
Exploring the item difficulty and other psychometric properties of the core perceptual, verbal, and working memory subtests of the WAIS-IV using item response theory


Abstract

The ceiling and basal rules of the Wechsler Adult Intelligence Scale – Fourth Edition (WAIS-IV; Wechsler, 2008) only function as intended if subtest items proceed in order of difficulty. While many aspects of the WAIS-IV have been researched, there is no literature on subtest item difficulty, and precise item difficulty values are not available. The WAIS-IV was developed within the framework of Classical Test Theory (CTT), in which item difficulty is most often determined using p-values. One limitation of this method is that item difficulty values are sample dependent: both the standard error of measurement, an important indicator of reliability, and p-values change when the sample changes. A different framework within which psychological tests can be created, analyzed and refined is Item Response Theory (IRT). IRT places items and person ability on the same scale using linear transformations and links item difficulty to person ability. As a result, IRT is said to produce sample-independent statistics. Rasch modeling, a form of IRT, is a one-parameter logistic model appropriate for items with only two response options. It assumes that the only factors affecting test performance are characteristics of the items, such as their difficulty level or their relationship to the construct being measured, and characteristics of the participants, such as their ability levels. The partial credit model is similar to the standard dichotomous Rasch model, except that it is appropriate for items with more than two response options.
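The two models described above can be made concrete with a minimal sketch. This is an illustrative Python implementation of the standard textbook response functions, not code from the study; the function names and threshold values are hypothetical:

```python
import math

def rasch_p(theta, b):
    """Dichotomous Rasch model: probability of a correct response,
    P(X=1 | theta, b) = exp(theta - b) / (1 + exp(theta - b)),
    where theta is person ability and b is item difficulty."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def pcm_probs(theta, thresholds):
    """Partial credit model: probabilities of each score category 0..m
    for a polytomous item with step difficulties `thresholds`."""
    # Cumulative sums of (theta - tau_k) give the unnormalized log-weights
    # for each category; category 0 has log-weight 0 by convention.
    logits = [0.0]
    for tau in thresholds:
        logits.append(logits[-1] + (theta - tau))
    weights = [math.exp(l) for l in logits]
    total = sum(weights)
    return [w / total for w in weights]

# A person whose ability equals the item's difficulty has a 50% chance:
# rasch_p(0.0, 0.0) == 0.5, and pcm_probs always sums to 1 across categories.
```

When an item has exactly one step, the partial credit model reduces to the dichotomous Rasch model, which is why the study could apply one or the other depending on each subtest's scoring.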
Proponents of the standard dichotomous Rasch model argue that it has distinct advantages over both CTT-based methods and other IRT models (Bond & Fox, 2007; Embretson & Reise, 2000; Furr & Bacharach, 2013; Hambleton & Jones, 1993) because of the principle of monotonicity, also referred to as specific objectivity or the principle of additivity (double cancellation), which “establishes that two parameters are additively related to a third variable” (Embretson & Reise, 2000, p. 148). In other words, because of the principle of monotonicity, the probability of correctly answering an item in Rasch modeling is an additive function of the individual’s ability, or trait level, and the item’s degree of difficulty. As ability increases, so does an individual’s probability of answering that item correctly. Because only item difficulty and person ability affect an individual’s chance of correctly answering an item, inter-individual comparisons can be made even if individuals did not receive identical items or items of the same difficulty level. This is why Rasch modeling is referred to as test-free measurement. The purpose of this study was to apply a standard dichotomous Rasch model or partial credit model to the individual items of the seven core perceptual, verbal and working memory subtests of the WAIS-IV: Block Design, Matrix Reasoning, Visual Puzzles, Similarities, Vocabulary, Information, Arithmetic, Digits Forward, Digits Backward and Digit Sequencing. Results revealed that WAIS-IV subtests fall into one of three categories: optimally ordered, near-optimally ordered and sub-optimally ordered. The optimally ordered subtests, Digits Forward and Digits Backward, had no disordered items. Near-optimally ordered subtests were those with one to three disordered items and included Digit Sequencing, Arithmetic, Similarities and Block Design.
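The test-free property described above follows directly from the model's additivity: on the log-odds scale, the gap between two people is the difference in their abilities, whatever item they face. A short sketch, with purely hypothetical ability and difficulty values, makes this concrete:

```python
import math

def rasch_p(theta, b):
    """Rasch probability of success; depends only on the difference theta - b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def log_odds(theta, b):
    """Log-odds of success, ln(P / (1 - P)), which reduces to theta - b."""
    p = rasch_p(theta, b)
    return math.log(p / (1.0 - p))

# Two hypothetical examinees and three items of very different difficulty.
theta_1, theta_2 = 1.2, 0.4
for b in (-1.0, 0.0, 2.5):
    gap = log_odds(theta_1, b) - log_odds(theta_2, b)
    # The log-odds gap equals theta_1 - theta_2 no matter which item is used,
    # so the two examinees can be compared without sharing any items.
    assert abs(gap - (theta_1 - theta_2)) < 1e-9
```

This invariance across items is exactly what allows inter-individual comparison even when examinees did not receive identical items.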
Sub-optimally ordered subtests consisted of Matrix Reasoning, Visual Puzzles, Information and Vocabulary, with the number of disordered items ranging from six to 16. Two major implications of these results were considered: the impact on individuals’ scores and the impact on overall test administration time. While the number of disordered items ranged from 0 to 16, the overall impact on raw scores was deemed minimal. Because of where the disordered items occur within each subtest, most individuals are administered all the items they would be expected to answer correctly. A one-point reduction in any one subtest is unlikely to significantly affect overall index scores, which are the scores most commonly interpreted in the WAIS-IV. However, if an individual received a one-point reduction across all subtests, the impact on index scores could be more noticeable. In cases where individuals discontinue before having a chance to answer easier items, clinicians may consider testing the limits. While this would have no impact on raw scores, it may give clinicians a better understanding of individuals’ true abilities. Based on the findings of this study, clinicians may consider administering only certain items, selected by difficulty value, in order to test the limits. This study found that the start point for most subtests is too easy for most individuals. For some subtests, most individuals may be administered more than 10 items that are too easy for them. Other than increasing overall administration time, it is not clear what impact, if any, this has. However, it does suggest the need to reevaluate current start items so that they serve as a true basal for most people. Future studies should break standard test administration by ignoring basal and ceiling rules to collect data on more items.
To help clarify why some items are more or less difficult than their ordinal rank would suggest, future studies should include a qualitative component in which, after each subtest, individuals are asked to describe what they found easy and difficult about each item. Finally, future research should examine the effects of item ordering on participant performance. While this study revealed that only minimal reductions in index scores are likely to result from prematurely stopping test administration, it is not known whether disordering has other impacts on performance, perhaps by increasing or decreasing an individual’s confidence.

Bibliographic Information

  • Author

    Schleicher-Dilks Sara Ann;

  • Affiliation
  • Year: 2015
  • Pages
  • Format: PDF
  • Language
  • CLC Classification

Similar Literature

  • Foreign-language literature
  • Chinese literature
  • Patents
