
Comparing score trends on high-stakes and low-stakes tests using metric-free statistics and multidimensional item response models.

Abstract

The most widely interpreted large-scale educational statistic is the test score trend. Positive trends are interpreted as an improvement in the education of students, as an increase in student learning, and as evidence of educational policies functioning as intended. An implicit assumption of this attention to test score trends is that they can be generalized to trends for other tests that measure the "same" desired learning outcomes. However, comparing trends across testing programs is not straightforward, nor are discrepancies readily interpretable when they are found.

The first half of this dissertation develops methodology for comparing trends across tests with different score scales. These chapters present and implement a "metric-free" framework that provides graphs and statistics that are independent of the test score scale. These methods allow comparisons of "high-stakes" state test score trends with trends for "low-stakes" tests such as the National Assessment of Educational Progress (NAEP). Results show that score trend discrepancies are widespread, and that average high-stakes test score trends are significantly more positive than their NAEP counterparts for the same state, subject, and grade combinations. These results cast doubt on common interpretations of high-stakes test score trends without offering any footholds for further interpretations.

The second half of this dissertation develops methodology to explain score trend discrepancies as a consequence of overlapping but not identical test content. In other words, where trend discrepancies arise, trends for overlapping content strands should be similar, while trends for nonoverlapping content areas should account for observed discrepancies. Multidimensional item response models include ability or proficiency parameters for multiple dimensions or cognitive skills, allowing detailed descriptions of proficiency that may be glossed over by unidimensional models. These chapters develop a Markov chain Monte Carlo-based estimation procedure for a confirmatory, 3-parameter logistic model. This model is used to estimate subscale trends for a high-stakes Reading test in a mid-sized state. Results suggest that the model estimation procedures are sound, but that the model cannot account for score trend discrepancies in this state. However, these methods are shown to have great potential for resolving the dissonance that trend discrepancies present.
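To make "independent of the test score scale" concrete, the sketch below computes one rank-based trend statistic: the probability that a randomly drawn later-cohort score exceeds a randomly drawn earlier-cohort score, mapped to a normal-deviate effect size. Because it depends only on the ordering of scores, the statistic is unchanged by any monotone rescaling of a test's scale, so trends from differently scaled tests can be placed side by side. This is a minimal illustration in the spirit of a metric-free framework, not necessarily the specific graphs and statistics developed in the dissertation; the function names and the simulated scores are hypothetical.

    import numpy as np
    from scipy.stats import norm

    def pp_trend_statistic(scores_year1, scores_year2):
        """Probability that a randomly chosen year-2 score exceeds a
        randomly chosen year-1 score (ties counted as half). Depends
        only on score ranks, so it is invariant to any monotone
        rescaling of the test's score scale."""
        x = np.asarray(scores_year1, dtype=float)
        y = np.asarray(scores_year2, dtype=float)
        # All pairwise comparisons; O(n*m) memory, fine for illustration.
        greater = (y[:, None] > x[None, :]).mean()
        ties = (y[:, None] == x[None, :]).mean()
        return greater + 0.5 * ties

    def v_effect_size(p):
        """Map the exceedance probability to a normal-deviate effect
        size, V = sqrt(2) * Phi^{-1}(p)."""
        return np.sqrt(2.0) * norm.ppf(p)

    # Hypothetical, simulated data: a state test and NAEP on different scales.
    rng = np.random.default_rng(0)
    state_2003 = rng.normal(200, 25, size=2000)
    state_2005 = rng.normal(210, 25, size=2000)
    naep_2003 = rng.normal(260, 35, size=1500)
    naep_2005 = rng.normal(262, 35, size=1500)

    state_trend = v_effect_size(pp_trend_statistic(state_2003, state_2005))
    naep_trend = v_effect_size(pp_trend_statistic(naep_2003, naep_2005))
    print(f"state trend V = {state_trend:.2f}, NAEP trend V = {naep_trend:.2f}")

When both cohorts are normally distributed with equal variances, V equals the familiar standardized mean difference, which is one reason this kind of mapping is convenient for comparing trends across tests.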
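The confirmatory multidimensional 3-parameter logistic model mentioned in the second half is commonly written in the compensatory form below; this standard form is given only for orientation and may differ from the exact parameterization estimated in the dissertation. For item i with guessing parameter c_i, loading vector a_i, intercept d_i, and examinee ability vector theta_j:

    P(X_{ij} = 1 \mid \boldsymbol{\theta}_j)
      = c_i + (1 - c_i)\,
        \frac{\exp\!\left(\mathbf{a}_i^{\top} \boldsymbol{\theta}_j + d_i\right)}
             {1 + \exp\!\left(\mathbf{a}_i^{\top} \boldsymbol{\theta}_j + d_i\right)}

In a confirmatory specification, each loading vector a_i is constrained so that an item loads only on the content strands it is assigned to; Markov chain Monte Carlo estimation then yields posterior draws for each dimension, from which subscale trends can be summarized.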

Bibliographic record

  • Author: Ho, Andrew Dean
  • Author affiliation: Stanford University
  • Awarding institution: Stanford University
  • Subject: Educational tests & measurements
  • Degree: Ph.D.
  • Year: 2005
  • Pages: 121 p.
  • Total pages: 121
  • Format: PDF
  • Language: English
