Computerized adaptive testing (CAT) is a powerful and efficient approach in educational testing for both estimating ability and classifying examinees into groups. When the purpose is to classify students as either proficient or not proficient, an accurate ability estimate is not necessary, and the test can stop as soon as a satisfactory decision can be made. Therefore, the stopping rule is a critical element in variable-length adaptive testing. In this study, the efficiency of four stopping rules was compared in variable-length CAT (vl-CAT) designs: ability confidence interval (ACI), sequential probability ratio test (SPRT), generalized likelihood ratio (GLR), and the truncation rule. In addition, their application to a newly developed adaptive testing design, on-the-fly multistage testing (OMST), was examined and compared with vl-CAT. Two simulation studies were conducted. In study 1, since the truncation rule cannot be executed independently, ACI, SPRT, and GLR were each used alone and in combination with the truncation rule, which resulted in 6 CAT and 6 OMST designs in total. With classification accuracy (CA) controlled at the same level, the test lengths of the 12 variable-length designs were examined. All test designs in study 1 had a length between 10 and 30 items. In study 2, the lower and upper bounds of the test length were extended to 30 and 100, and only ACI- and GLR-based CAT designs were run to provide a more general comparison of these two stopping rules. In both studies, ability was estimated by maximum likelihood estimation or expected a posteriori (EAP) estimation. The next item or items were selected by the maximum priority index at the current ability estimate. A total of 1,000 theta values were simulated from a standard normal distribution, and 30 replications were conducted under each design. The results show that OMST produced results similar to those of CAT.
Regarding the efficiency of the four stopping rules, the truncated versions of ACI, SPRT, and GLR produced shorter test lengths than their non-truncated counterparts. Among ACI, SPRT, and GLR, SPRT yielded the longest test length with the highest estimation accuracy. The results of the GLR and ACI designs are similar, but ACI is more efficient for examinees whose ability is far from the cutoff point, and GLR is more efficient for examinees whose ability is near the cutoff point. It can be concluded that the stopping rules designed for CAT function in a similar way for OMST. When the item selection method is estimate-based rather than cutscore-based, SPRT performs less efficiently than ACI and GLR. The efficiency of GLR and ACI is comparable, and each has its own strengths. The truncation rule is useful because it prevents examinees from taking unnecessary items. These results suggest that variable-length OMST has good statistical properties, which supports its future application. The research also provides a direct comparison between different stopping rules, giving practitioners more information about the situations in which each rule is applicable. The studies also develop a new, simple truncation rule and indicate its feasibility for future adaptive testing contexts.
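To make the three likelihood-based stopping rules concrete, the following is a minimal Python sketch of how ACI, SPRT, and GLR classification decisions are commonly formulated. It is not the implementation used in these studies. The abstract does not state the item response model, the cutoff, the indifference region, or the error rates, so the two-parameter logistic model, the half-width `delta = 0.3`, the nominal rates `alpha = beta = 0.05`, the grid-search MLE, and all function names below are illustrative assumptions. The truncation rule is not sketched because the abstract does not define it.

```python
import math

def log_lik(theta, responses, items):
    """Log-likelihood under an ASSUMED 2PL model; items are (a, b) pairs."""
    total = 0.0
    for u, (a, b) in zip(responses, items):
        z = a * (theta - b)
        # log P = -log(1 + e^-z); log(1 - P) = -log(1 + e^z)
        total += -math.log1p(math.exp(-z)) if u == 1 else -math.log1p(math.exp(z))
    return total

def _classify(log_lr, alpha, beta):
    """Compare a log likelihood ratio with Wald's two decision bounds."""
    upper = math.log((1 - beta) / alpha)
    lower = math.log(beta / (1 - alpha))
    if log_lr >= upper:
        return "pass"
    if log_lr <= lower:
        return "fail"
    return None  # no decision yet: administer another item

def aci_decision(theta_hat, se, cut, z=1.96):
    """ACI: stop once the confidence interval around theta_hat excludes the cut."""
    if theta_hat - z * se > cut:
        return "pass"
    if theta_hat + z * se < cut:
        return "fail"
    return None

def sprt_decision(responses, items, cut, delta=0.3, alpha=0.05, beta=0.05):
    """SPRT: likelihood ratio at two fixed points, cut + delta vs. cut - delta."""
    log_lr = (log_lik(cut + delta, responses, items)
              - log_lik(cut - delta, responses, items))
    return _classify(log_lr, alpha, beta)

def glr_decision(responses, items, cut, delta=0.3, alpha=0.05, beta=0.05):
    """GLR: like SPRT, but each hypothesis uses its best-fitting theta."""
    # Hypothetical coarse grid MLE on [-4, 4]; a real system would use
    # a proper optimizer or the ability estimate it already maintains.
    grid = [i / 100 for i in range(-400, 401)]
    theta_mle = max(grid, key=lambda t: log_lik(t, responses, items))
    # The 2PL log-likelihood is concave in theta, so the maximum over each
    # region is the MLE if it lies inside, otherwise the region boundary.
    theta_hi = max(theta_mle, cut + delta)
    theta_lo = min(theta_mle, cut - delta)
    log_lr = log_lik(theta_hi, responses, items) - log_lik(theta_lo, responses, items)
    return _classify(log_lr, alpha, beta)
```

In a variable-length design, one of these functions would be called after each item (or each stage, for OMST) once the minimum test length is reached, and the test would end on the first non-`None` result or at the maximum length. Because GLR evaluates each hypothesis at its best-fitting theta rather than at the fixed points `cut ± delta`, its ratio is at least as extreme as the SPRT ratio for the same responses.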
Comparison of four stopping rules in computerized adaptive testing and examination of their application to on-the-fly multistage testing