Nonpartisan Education Review
Irrelevance of Reliability Coefficients to Accountability Systems: Statistical Disconnect in Kane-Staiger
Keywords: education; policy; Kane; Staiger; Rogosa; API; STAR; California state testing; test score volatility; reliability coefficients; accountability; assessment; school; AYP; accountability index
Source: DOAJ
[Abstract]
The body of this report consists of a fairly thorough effort to discredit the empirical assertions and methodological prescriptions of Kane and Staiger (KS). The four main sections of content that follow this (lengthy) Preamble are:

Section 1. Accuracy of Group Summaries. Exact results are obtained for the accuracy of grade-level scores (for n=68), which are then compared with the reliability-style calculations reported in KS for North Carolina data. Also, accuracy properties of California API school-level scores are presented, and, to compare with KS assertions, the reliability coefficients for these scores are calculated. KS find high volatility even when accuracy is very good, and KS find extreme absence of volatility even when accuracy is moderate to poor.

Section 2. Accuracy of Improvement. Precision of improvement is contrasted with KS-style reliability of improvement. Analytic and empirical examples for accuracy of improvement reinforce the basic message: reliability is not precision. Most importantly, precision, which is what matters, can be low while reliability is still high, and vice versa. Also, school-level California API data display no relation between amount of improvement and uncertainty in the scores (Figures 2.1-2.3), refuting a key KS assertion about school size.

Section 3. Persistence of Change. The KS correlation of consecutive changes, and thus the KS estimate of "proportion of variance in changes due to nonpersistent factors", is shown to be a function of the reliability of the difference score. KS determinations of persistence of change are shown to be without value in accountability systems. Common-sense definitions of consistency of improvement and empirical demonstrations using artificial data are presented.

Section 4. California Academic Performance Index Award Programs. Appropriate methods for describing the properties of Award Programs (e.g., determinations of false positives and false negatives) are contrasted with the incorrect empirical assertions and methodologies in KS. Counterexamples to each of the KS "Lessons" are presented in detail. The focus is on the effect of school size, to link with the accuracy results of previous sections.
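
The abstract's central claim, that reliability is not precision, can be illustrated with a minimal sketch. This sketch is not taken from the report. It assumes the textbook classical-test-theory definitions (reliability as true-score variance divided by observed variance, precision summarized by the standard error of a score), and every variance value and function name in it is hypothetical.

```python
import math


def reliability(true_var: float, error_var: float) -> float:
    """Classical reliability coefficient: the share of observed-score
    variance that is due to true-score variance (an assumed definition)."""
    return true_var / (true_var + error_var)


def standard_error(error_var: float) -> float:
    """Standard error of a single score: the square root of error variance."""
    return math.sqrt(error_var)


# Hypothetical case A: scores are precise (small error variance), but the
# schools barely differ from one another (small true-score variance).
case_a = (reliability(true_var=1.0, error_var=4.0), standard_error(4.0))

# Hypothetical case B: scores are imprecise (large error variance), but the
# schools differ widely (large true-score variance).
case_b = (reliability(true_var=400.0, error_var=100.0), standard_error(100.0))

for name, (rel, se) in (("A", case_a), ("B", case_b)):
    print(f"case {name}: reliability = {rel:.2f}, standard error = {se:.1f}")
```

In these made-up numbers, case B has the higher reliability coefficient even though its scores are five times less precise than those in case A, because the coefficient depends on how much the schools differ from one another as well as on the error in each score.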
[License]
Unknown