Brookings Papers on Education Policy 2002 (2002) 269-272

Comment by David Grissmer

[Volatility in School Test Scores:
Implications for Test-Based Accountability Systems]

The issue addressed by Thomas J. Kane and Douglas O. Staiger is whether schools can reliably be chosen for rewards or sanctions on the basis of year-to-year test score gains; that is, whether picking schools based on gains identifies good and bad schools or merely lucky and unlucky ones. The authors' analysis convincingly concludes that methods relying on gain scores at a given grade mostly identify lucky and unlucky schools, not good and bad ones. The reason for the misidentification is that the variance due to sampling and other sources of noise can be a significant portion of the variance in gains across schools. In this area, standards-based reform is far ahead of statistical reliability.

This paper required several readings to extract the nub of the argument and analysis, and I think the exposition can be improved. Basically, the focus is on five quantities, their relationships, and their relative sizes: between-school variance in annual scores, between-school variance in score gains from grade to grade, between-school variance in year-to-year score changes at a given grade, sampling variance, and variance from other sources of noise.27 The basic argument is that reliably identifying good and bad schools by any criterion requires that the signal be much greater than the noise. The signal in this case is the portion of a score or score gain that can be attributed to school effort. The noise arises from sampling variability relative to a hypothetical student population and from other random sources.
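To make the signal-to-noise argument concrete, one can write a standard reliability decomposition. The notation below is mine, not the authors', and it collapses their several variance components into a single simplified expression: the observed mean gain for a school is treated as a persistent school effect, plus sampling noise that shrinks with cohort size, plus other nonpersistent noise.

```latex
% Hypothetical notation (mine, not the authors'):
%   G_s       observed mean gain for school s
%   \theta_s  persistent school effect (the signal), variance \sigma^2_\theta
%   e_s       sampling noise, variance \sigma^2_e / n_s for cohort size n_s
%   u_s       other nonpersistent noise, variance \sigma^2_u
\[
  G_s = \theta_s + e_s + u_s,
  \qquad
  \operatorname{reliability}(G_s)
  = \frac{\sigma^2_\theta}{\sigma^2_\theta + \sigma^2_e / n_s + \sigma^2_u}.
\]
```

Unless the first term dominates the denominator, a ranking of schools by G_s is largely a reordering of noise.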

The paper estimates these parameters using data from North Carolina and California, and it shows that the sources of noise are too large relative to the signal to allow reliable identification of good or bad schools. More often than not, a school's landing near the top or bottom of a ranking can be attributed to random factors rather than real improvement. Also, because sampling variation decreases with school size, small schools are disproportionately represented at both the top and the bottom of rankings. Perhaps as important, systems whose criteria involve separate consideration of scores by race or ethnicity are likely to identify schools even more poorly.
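The small-schools result can be illustrated with a quick simulation. Every enrollment range and variance component below is a placeholder of my own choosing, not an estimate from the paper's North Carolina or California data; the point is only that, with sampling noise shrinking as the square root of cohort size, small schools crowd both tails of the ranking.

```python
import numpy as np

rng = np.random.default_rng(0)
n_schools = 2000

# Placeholder enrollments and variance components (illustrative only).
cohort_size = rng.integers(20, 200, size=n_schools)           # tested students per school
true_effect = rng.normal(0.0, 0.05, size=n_schools)           # persistent school effect on gains
other_noise = rng.normal(0.0, 0.05, size=n_schools)           # nonpersistent, non-sampling noise
sampling_noise = rng.normal(0.0, 0.7 / np.sqrt(cohort_size))  # shrinks with cohort size

observed_gain = true_effect + other_noise + sampling_noise

# Which schools land in the top or bottom 5 percent of observed gains?
lo, hi = np.quantile(observed_gain, [0.05, 0.95])
in_tails = (observed_gain <= lo) | (observed_gain >= hi)

small = cohort_size < np.median(cohort_size)
print("share of small schools overall:      ", round(small.mean(), 2))
print("share of small schools in the tails: ", round(small[in_tails].mean(), 2))
```

With these made-up numbers the smaller half of schools supplies most of the tail positions, which is the pattern the paper documents in the real data.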

I like the way the paper is designed. The authors develop a statistical model, estimate its parameters, make predictions from the model, and use the data to verify those predictions. They draw out the important policy implications and provide guidance on how to improve the identification process.

The parameters are estimated using third- and fourth-grade data from North Carolina. The method used to estimate random sources of noise other than sampling variability is neat. The model and parameter estimates predict that the use of year-to-year gain scores will disproportionately identify small schools as good or bad schools. This prediction is verified in two ways. First, small schools in North Carolina have more than twenty times the probability of being identified among the top rewarded schools. Second, schools that are rewarded in one year are rarely rewarded in the following years. This nonpersistence of performance implies that nonpersistent sources of error are probably a major component of the measured gains.
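The nonpersistence check follows the same logic. In the sketch below (again with placeholder variance components of my own, not the authors' estimates), only the school effect carries over from one year to the next, so a school that lands in the rewarded top 5 percent in one year rarely stays there the following year.

```python
import numpy as np

rng = np.random.default_rng(1)
n_schools = 2000
cohort = rng.integers(20, 200, size=n_schools)     # tested students per school
theta = rng.normal(0.0, 0.05, size=n_schools)      # persistent school effect

def observed_gain():
    # Only theta persists across years; sampling and other noise are redrawn each year
    # (all variance components are illustrative placeholders).
    return (theta
            + rng.normal(0.0, 0.05, size=n_schools)    # other nonpersistent noise
            + rng.normal(0.0, 0.7 / np.sqrt(cohort)))  # sampling noise

year1, year2 = observed_gain(), observed_gain()
top1 = year1 >= np.quantile(year1, 0.95)           # "rewarded" in year 1
top2 = year2 >= np.quantile(year2, 0.95)           # "rewarded" in year 2
print("P(rewarded again in year 2 | rewarded in year 1):", round(top2[top1].mean(), 2))
```

When the persistent signal is weak, the repeat probability stays far below what a stable ranking of genuinely good schools would produce.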

The situation gets even worse if rewards or sanctions depend on score gains by racial or ethnic groups within grades, which is an increasingly common practice. The identification is then based on even smaller sample sizes, and whether all racial or ethnic groups show high gains becomes even more a matter of chance. The policy implications cited by the authors range from the morale problems that arise when rewards or sanctions turn on factors other than real performance to misidentifying the reasons schools are improving by focusing on the wrong schools.
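The arithmetic behind the subgroup requirement is simple compounding. A minimal sketch, assuming subgroup outcomes are roughly independent and using made-up probabilities:

```python
# If, largely because of noise, each reported subgroup clears the gain target with
# probability p, and subgroup results are roughly independent, a school must beat the
# odds k times over. Both p and k below are made-up illustrative values.
for p in (0.8, 0.6):
    for k in (1, 2, 4):
        print(f"p = {p}, subgroups = {k}: P(all subgroups clear the target) = {p ** k:.2f}")
```

Disaggregation also shrinks each group's sample, inflating its sampling variance, so each group's result is itself more noise-driven.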

The authors analyze possible ways to increase the reliability of the identification process. They consider pooling scores schoolwide instead of using individual grade scores...
