Brookings-Wharton Papers on Urban Affairs 2003 (2003) 185-220
Catching Cheating Teachers:
The Results of an Unusual Experiment in Implementing Theory
Brian A. Jacob
John F. Kennedy School of Government, Harvard University
Steven D. Levitt
University of Chicago and American Bar Foundation
EDUCATIONAL REFORM is a critical issue in urban areas. Most large urban school districts in the United States suffer from low test scores, high dropout rates, and frequent teacher turnover. The poor performance of city schools drives affluent families with children to the suburbs, eroding the urban tax base. In response to these concerns, the past decade has seen an increasing emphasis on high-stakes testing. While there is evidence that such testing has been associated with impressive test score gains in some instances, critics have argued that these gains are artificially induced by "teaching to the test." 1 Indeed, much of the observed test score gain has been shown to be test-specific, not generalizing to other standardized tests that seemingly measure the same skills. 2 Even more ominous is the possibility that the emphasis on high-stakes testing induces cheating by students, teachers, and administrators.
We have developed a method for detecting cheating by teachers and administrators on standardized tests. 3 The basic idea underlying the method (described in greater detail later) is that cheating classrooms will systematically differ from other classrooms along a number of dimensions. For instance, students in cheating classrooms are likely to experience unusually large test score gains in the year of the cheating, followed by unusually small gains or even declines in the following year, when the boost attributable to cheating disappears. Suspicious answer strings, however, are just as important an indicator of cheating as test score fluctuations: identical blocks of answers shared by many students in a classroom, or students who cannot answer easy questions correctly yet do exceptionally well on the most difficult ones. We have concluded that cheating occurs in 3 to 5 percent of elementary school classrooms each year in the Chicago Public Schools (CPS).
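The two kinds of indicators described above can be illustrated with a minimal sketch. This is not the authors' detection algorithm, which is described later in the paper; the function names, the use of the district-wide share of students answering an item correctly as its difficulty, the block length `k`, and the split of items into thirds are all hypothetical simplifications chosen for illustration.

```python
from collections import Counter


def score_fluctuation(gain_this_year: float, gain_next_year: float) -> float:
    """Hypothetical indicator: a large gain in the suspect year followed by a
    small gain (or a decline) the next year yields a large positive value."""
    return gain_this_year - gain_next_year


def most_shared_block(answers: list[str], k: int) -> float:
    """Hypothetical indicator: the largest fraction of students in a classroom
    who give the identical k consecutive answers at the same item positions."""
    n_items = min(len(a) for a in answers)
    best = 0.0
    for start in range(n_items - k + 1):
        # Count how many students share each k-answer block at this position.
        counts = Counter(a[start:start + k] for a in answers)
        _, top = counts.most_common(1)[0]
        best = max(best, top / len(answers))
    return best


def hard_minus_easy(correct: list[bool], p_correct: list[float], groups: int = 3) -> float:
    """Hypothetical indicator: a student's accuracy on the hardest items minus
    accuracy on the easiest items. p_correct[i] is the share of all test takers
    answering item i correctly (assumed difficulty measure). Typical answer
    patterns give a negative value; a positive value is unusual."""
    n = len(correct)
    m = max(1, n // groups)
    # Sort items from easiest (highest p_correct) to hardest.
    order = sorted(range(n), key=lambda i: p_correct[i], reverse=True)
    easy, hard = order[:m], order[-m:]

    def accuracy(idx: list[int]) -> float:
        return sum(correct[i] for i in idx) / len(idx)

    return accuracy(hard) - accuracy(easy)


if __name__ == "__main__":
    print(score_fluctuation(12.0, -3.0))  # → 15.0
    print(most_shared_block(["AAAAA", "AAAAB", "CCCCC", "AAAAA"], 4))  # → 0.75
    print(hard_minus_easy([False, False, True, True, True, True],
                          [0.9, 0.8, 0.5, 0.5, 0.2, 0.1]))  # → 1.0
```

The shared-block measure here is deliberately crude: it does not distinguish blocks of identical correct answers, which are common among strong students, from identical wrong answers, so any practical version would need to adjust for how likely each shared answer is.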
Most academic theories, regardless of their inherent merit, fail to influence policy or do so only indirectly and with a long lag. In this paper we report the results of a rare counterexample to this familiar pattern: a collaboration between the CPS and the authors. At the invitation of Arne Duncan, CEO of the Chicago Public Schools, we were granted the opportunity to work with the CPS administration to design and implement auditing and retesting procedures using the tools we developed. Using our cheating-detection algorithm, we selected roughly 120 classrooms to be retested on the spring 2002 Iowa Test of Basic Skills (ITBS), which was administered to students in grades three through eight. The retested classrooms included not only those suspected of cheating, but also classrooms that had achieved large gains without being suspected of cheating, as well as a randomly selected control group. As a consequence, the implementation also provided a prospective test of the validity of the tools we developed.
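The three-way retest design described above (suspected classrooms, high-gain classrooms not suspected of cheating, and a random control group) can be sketched as a simple selection routine. This is an illustrative assumption, not the procedure actually used with the CPS: the `suspicion_score` input, the `threshold` cutoff, the group sizes, and the function name are all hypothetical.

```python
import random


def select_retest_groups(classrooms, threshold, n_suspected, n_high_gain, n_control, seed=0):
    """Hypothetical sketch of a three-group retest sample.

    classrooms maps classroom_id -> (suspicion_score, test_score_gain).
    Classrooms at or above `threshold` count as suspected of cheating.
    """
    flagged = [c for c, (s, _) in classrooms.items() if s >= threshold]
    unflagged = [c for c, (s, _) in classrooms.items() if s < threshold]

    # Most suspicious classrooms first.
    suspected = sorted(flagged, key=lambda c: classrooms[c][0], reverse=True)[:n_suspected]
    # Largest gains among classrooms NOT suspected of cheating.
    high_gain = sorted(unflagged, key=lambda c: classrooms[c][1], reverse=True)[:n_high_gain]

    # Random control group drawn from every classroom not already selected.
    chosen = set(suspected) | set(high_gain)
    remaining = sorted(c for c in classrooms if c not in chosen)
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    control = rng.sample(remaining, min(n_control, len(remaining)))

    return {"suspected": suspected, "high_gain": high_gain, "control": control}
```

Drawing the control group at random, rather than by hand, is what lets the retest serve as a prospective check on the detection tools: the controls show how much scores move on a retest when nothing unusual is expected.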
The results of the retesting provided strong support for the effectiveness of the cheating-detection algorithm. Classrooms suspected of cheating experienced large declines in test scores when retested under controlled conditions. Classrooms not previously suspected of cheating maintained almost all of their gains on the retest. The results of the retests were used to launch investigations of twenty-nine classrooms. While these investigations have not yet been completed, disciplinary action is expected against a substantial number of teachers, test administrators, and principals.
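One way to make "maintained almost all of their gains" concrete is the share of the original gain over a baseline score that survives the retest. The sketch below assumes that definition; the baseline (for example, the prior year's score), the function names, and the grouping labels are illustrative assumptions rather than the measure reported in the paper.

```python
from statistics import mean


def gain_retained(baseline: float, original: float, retest: float) -> float:
    """Share of the original-test gain over `baseline` that survives the retest.

    1.0 means the retest reproduced the full gain, 0.0 means the gain vanished,
    and a negative value means the retest fell below the baseline.
    """
    gain = original - baseline
    if gain == 0:
        raise ValueError("original score shows no gain over the baseline")
    return (retest - baseline) / gain


def mean_retained_by_group(rows):
    """rows: iterable of (group_label, baseline, original, retest) tuples.

    Returns the average share of gain retained for each group label.
    """
    by_group = {}
    for group, baseline, original, retest in rows:
        by_group.setdefault(group, []).append(gain_retained(baseline, original, retest))
    return {g: mean(v) for g, v in by_group.items()}
```

Under this definition, suspected classrooms whose scores collapse on the retest would have values near zero, while classrooms with genuine gains would have values near one.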
Finally, the data generated by the auditing experiment provided a unique opportunity for evaluating and improving the techniques for detecting cheating. The cheating algorithm was developed without access to multiple observations for the same classrooms. By observing two sets of results from the same classrooms (one from the original test and a second from the retest), we...