Brookings Papers on Education Policy 2005.1 (2005) 67-80
Comment by Robert Boruch
Brian Jacob and Jens Ludwig have developed a fine paper. I have no major disagreements with what they have said. I do, however, have suggestions that amplify their paper's virtues and may help to reduce its vulnerabilities.
Randomized Controlled Trials versus Quasi Experiments
Jacob and Ludwig declare that it is important, for scientific reasons, to mount comparative empirical studies to explore whether and when the results of randomized controlled trials (RCTs) differ from the results of nonrandomized trials, also called quasi-experimental designs (QEDs). They argue further that such comparative studies will drive up the demand for higher-quality education research, especially randomized trials. Let me suggest ways to build on their declarations.
I agree with Jacob and Ludwig that the 2003 work of Steven Glazerman, Dan Levy, and David Myers is important.65 But Glazerman and his colleagues do not consider the direction of biases; they examine only the absolute value of the bias. They do not consider scenarios in which the direction of bias ought to be taken into account and in which that direction may be domain specific. That programs can be made to look harmful when they are merely useless matters to some policy people and certainly to some scientists. That some programs can be made to look as if their effects are positive when their actual effects are negligible matters to other policy people and scientists.
First, we need to recognize that the direction of statistical biases in QEDs can be important and is likely to be domain specific. For example, D. T. Campbell and R. F. Boruch give plausible scenarios of the different ways in which compensatory education programs can be made to look harmful when, in fact, they are merely useless.66 One obvious possible reason for a negative bias in ordinary least squares regression estimates based on observational survey data is that the adjustment variables (the covariates on the right-hand side of the equation) are measured imperfectly. Since the 1960s, this arguably has produced biased estimates of the effects of Head Start, the Comprehensive Employment and Training Act, the Youth Employment and Demonstration Projects Act, and other programs whenever analyses were based on linear models applied to data from nonrandomized trials.66 A second basic source of bias, of course, is omitted variables.
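The measurement-error scenario can be sketched in a few lines of simulation. The setup below is purely hypothetical and is not drawn from any of the studies cited: a compensatory program with a true effect of exactly zero is assigned to low scorers, and adjusting for a noisy pretest rather than the true one makes the useless program look harmful.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

ability = rng.normal(0, 1, n)             # true confounder (unobserved in practice)
treated = (ability < 0).astype(float)     # compensatory assignment: low scorers get the program
outcome = ability + rng.normal(0, 1, n)   # true program effect is exactly zero
observed = ability + rng.normal(0, 1, n)  # pretest measured with error (reliability 0.5)

def ols_effect(covariate):
    """OLS coefficient on treatment, adjusting for the given covariate."""
    X = np.column_stack([np.ones(n), treated, covariate])
    beta, *_ = np.linalg.lstsq(X, outcome, rcond=None)
    return beta[1]

print(ols_effect(ability))   # near zero: adjusting on the true score recovers the null effect
print(ols_effect(observed))  # clearly negative: the useless program looks harmful
```

The noisy covariate removes only part of the confounding between prior ability and treatment, so the residual difference between groups is attributed to the program, in the negative direction.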
Think now about the uncertainty and magnitude of bias and how this might be domain specific. For instance, Larry Hedges and Amy Nowell, basing their work on within-country studies, and Aubrey Wang, basing hers on cross-country studies in the Third International Mathematics and Science Study (TIMSS), find that boys' performance on achievement tests is more variable than girls'.67 These studies are based on probability sample surveys.
We do not know why this holds for math across most of the thirty countries in the TIMSS. But this domain specificity, gender being the domain, suggests that our ability to forecast achievement for boys is inferior, at least at times, to our ability to forecast for girls. The variability in bias, when an estimator is in fact biased, may then also be larger when the target is boys rather than girls.
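The link between group variance and forecast precision is simple arithmetic, which the toy figures below make concrete (the means, scale, and 20 percent variance gap are hypothetical, not the TIMSS estimates): the width of a normal prediction interval scales one-for-one with a group's standard deviation.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50_000

girls = rng.normal(500, 100, n)        # hypothetical score distribution
boys = rng.normal(500, 120, n)         # same mean, 20% more variable

def interval_width(scores):
    """Width of a 95% normal prediction interval for a new draw from the group."""
    return 2 * 1.96 * scores.std()

print(interval_width(boys) / interval_width(girls))  # about 1.2
```

A forecast for a member of the more variable group is proportionally less precise, which is the sense in which prediction for boys would be harder here.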
Thomas Fraker and Rebecca Maynard's comparative study of QEDs versus RCTs is among the few that attend to domain differences.68 Their work invites an exploration of the prospect that the bias in nonrandomized trials may be lower when the target is mainly women (in the Fraker and Maynard study, those who were receiving Aid to Families with Dependent Children) than when it is mainly young males (in the same study, youth involved in Supported Work programs). The complications are obvious. Nonetheless, one implication is that, if we are interested in whether estimators are biased and, if so, by how much, domains may count.
Coupling Randomized Controlled Trials and Quasi Experiments
Jacob and Ludwig further suggest that randomized trials might routinely be yoked with nonrandomized studies (quasi experiments) so...