A comparison of constructed-response testing and multiple-choice testing reveals that the former tends to have high validity but poor reliability, while the latter yields good reliability but has dubious validity and negative backwash effects on teaching. The present research attempts to design short-answer questions (SAQs) for reading comprehension that can yield high reliability, validity, fairness, and positive backwash effects on teaching.

The research was conducted in two stages: (1) developing SAQs for two passages, and (2) evaluating these questions. In the initial stage, a model of reading comprehension (MRC) was developed based on think-aloud protocols by target readers (i.e., College English students) as well as control readers (i.e., College English teachers and native English speakers), and a macrostructural analysis was made for each passage. This model was then used to develop SAQs focused on both language and content that were not only appropriate for the target students but also covered the major content of the passages. In addition, scoring rubrics were developed based largely on core words (though, in some cases, supplemented by core concepts). After small-scale pilot testing, the SAQs and the rubrics were revised.

In the second stage of the research, the test material was administered to 380 students. Student responses to the SAQs were evaluated with respect to level of difficulty, the degree of language stability in correct responses, and the degree to which these responses can be differentiated from incorrect ones. In addition, a sample of these students, along with College English teachers, responded to a questionnaire about the SAQs.

The results show that there was a reasonably balanced spread of SAQs at easy, intermediate, and difficult levels, with a slightly greater number at the intermediate level (content SAQs, as anticipated, were somewhat more difficult than language SAQs).
For all but four of the 26 questions, the percentage of correct responses containing core words was above 90%, and for all but two, the percentage of incorrect responses containing core words was below 5%. Hence correct and incorrect responses are strongly differentiated, and this differentiation provides an effective basis for developing a computer-assisted scoring system.
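
The differentiation measure described above can be illustrated with a minimal sketch. It is not the study's actual procedure: the function names, the example core words, and the sample responses are all hypothetical, and it assumes that a response "contains core words" only when every core word appears in it as a whole word.

```python
import re


def contains_core_words(response, core_words):
    """Return True if every core word occurs as a whole word in the response.

    Assumption: "containing core words" means all of them, matched
    case-insensitively; the study's rubric may define this differently.
    """
    tokens = set(re.findall(r"[a-z']+", response.lower()))
    return all(word.lower() in tokens for word in core_words)


def core_word_rates(responses, core_words):
    """Compute the percentage of correct and of incorrect responses
    that contain the core words.

    responses: list of (text, is_correct) pairs, where is_correct is the
    judgment of a human rater.
    Returns (percent_of_correct, percent_of_incorrect).
    """
    hits = {True: 0, False: 0}
    totals = {True: 0, False: 0}
    for text, is_correct in responses:
        totals[is_correct] += 1
        if contains_core_words(text, core_words):
            hits[is_correct] += 1

    def pct(flag):
        # Avoid division by zero when one group is empty.
        return 100.0 * hits[flag] / totals[flag] if totals[flag] else 0.0

    return pct(True), pct(False)


# Hypothetical data for a single question (not taken from the study).
sample_core_words = ["flood", "village"]
sample_responses = [
    ("The flood destroyed the village.", True),
    ("A flood hit the village", True),
    ("The village was lost to a flood", True),
    ("Heavy rain ruined the town", True),    # correct paraphrase without core words
    ("The village held a festival", False),
    ("A fire burned the forest", False),
]

correct_pct, incorrect_pct = core_word_rates(sample_responses, sample_core_words)
print(correct_pct, incorrect_pct)
```

A question would be considered strongly differentiated, in the sense used above, when the first value is high and the second is low; the fourth sample response shows how a correct paraphrase that avoids the core words lowers the first value.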