Dissertation details
Prompt and Rater Effects in Second Language Writing Performance Assessment.
Lim, Gad S.; Johnson, Jeffrey S.
University of Michigan
Keywords: Language Testing; Writing Assessment; Performance Assessment; Educational Measurement; Education; Social Sciences
Full text: https://deepblue.lib.umich.edu/bitstream/handle/2027.42/64665/limgs_1.pdf?sequence=1&isAllowed=y
Switzerland | English
Source: The Illinois Digital Environment for Access to Learning and Scholarship
【 Abstract 】

Performance assessments have become the norm for evaluating language learners' writing abilities in international examinations of English proficiency. Two aspects of these assessments are usually systematically varied: test takers respond to different prompts, and their responses are read by different raters. This raises the possibility of undue prompt and rater effects on test takers' scores, which can affect the validity, reliability, and fairness of these tests. This study uses data from the Michigan English Language Assessment Battery (MELAB), including all official ratings given over a period of more than four years (n = 29,831), to examine these issues related to scoring validity. It uses the multi-facet extension of Rasch methodology to model these data, producing measures on a common, interval scale.

First, the study investigates the comparability of prompts that differ on topic domain, rhetorical task, prompt length, task constraint, expected grammatical person of response, and number of tasks. It also considers whether prompts are differentially difficult for test takers of different genders, language backgrounds, and proficiency levels. Second, the study investigates the quality of raters' ratings, and whether these are affected by time and by raters' experience and language background. It also considers whether raters alter their rating behavior depending on their perceptions of prompt difficulty and of test takers' prompt selection behavior.

The results show that test takers' scores reflect actual ability in the construct being measured as operationalized in the rating scale, and are generally not affected by a range of prompt dimensions, rater variables, or test taker characteristics. It can be concluded that scores on this test, and on others whose particulars are like it, have score validity and, assuming that other inferences in the validity argument are similarly warranted, can be used as a basis for making appropriate decisions. Further studies to develop a framework of task difficulty and a model of rater development are proposed.
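The multi-facet extension of Rasch methodology mentioned above is commonly written in Linacre's many-facet form; a sketch of that model, with test-taker ability, prompt difficulty, and rater severity as facets (the exact facet structure used in the dissertation is an assumption based on the abstract), is:

```latex
% Many-facet Rasch model (Linacre): log-odds of test taker n on prompt i,
% rated by rater j, receiving category k rather than category k-1
\ln\!\left(\frac{P_{nijk}}{P_{nij(k-1)}}\right) = B_n - D_i - C_j - F_k
% B_n : ability of test taker n
% D_i : difficulty of prompt i
% C_j : severity of rater j
% F_k : difficulty of rating-scale category k relative to k-1
```

Because all parameters sit on the same logit scale, prompt and rater effects can be compared directly with test-taker ability, which is what allows the study to place measures "on a common, interval scale."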

【 Preview 】
Attachments
File: Prompt and Rater Effects in Second Language Writing Performance Assessment. Size: 1488 KB. Format: PDF. View: download
Document metrics
Downloads: 24; Views: 78