
12 Psychometrics Lacks Power

Professor Savin: "What do you want, students?"
Iowa Graduate Students: "Power!"
Professor Savin: "What do you lack, students?"
Iowa Graduate Students: "Power!"
Eugene Savin's econometrics course, University of Iowa, 1993

The cost of the psychological addiction to statistical significance can be measured by the "power function." Power asks, "What in the proffered experiment is the probability of correctly rejecting the null hypothesis, concluding that the null hypothesis is indeed false when it is false?" If the null hypothesis is false, perhaps the other hypothesis—some other effect size—is true. A power function graphs the probability of rejecting the null hypothesis as a function of various assumed-to-be-true effect sizes. Obviously, the farther the actually true effect size is from the null, the easier it is going to be in an irritatingly random world to reject the null, and the higher the power of the test is going to be.

Suppose a pill does in fact work to the patient's benefit. And suppose this efficacy is what the experiment reveals, though with sampling uncertainty. What you want to know—and are able in almost any testing situation to discover—is with how much power you can reject the null of "no efficacy" when the pill (or whatever it is you are studying) is in truth efficacious to such and such a degree. In general, the more power you have the better. You do not want to be led by the vagaries of sampling to reject what is actually a good pill.

There are reasons to quibble about this notion of power, as descended intuitively from Gosset and formally from Neyman and Pearson. Sophisticates in the foundations of probability such as Savage and now Edward Leamer at UCLA have complained about its alleged objective certitude. Said Leamer to a 2004 assembly of economists, "[H]ypotheses and models are neither true nor false. They are sometimes useful and sometimes misleading" (Leamer 2004, 556).
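The power function just described can be sketched numerically. Here is a minimal illustration for a two-sided z-test of a zero null, using only the Python standard library; the standard error of 1.5 hours and the grid of effect sizes are hypothetical numbers chosen for illustration, not figures from the text:

```python
from statistics import NormalDist

def power(effect, se, alpha=0.05):
    """Two-sided z-test of H0: effect = 0.
    Returns P(reject H0 | the true effect is `effect`)."""
    norm = NormalDist()
    z_crit = norm.inv_cdf(1 - alpha / 2)   # ~1.96 at alpha = 0.05
    z = effect / se
    # Reject when |estimate / se| exceeds z_crit; sum both tails.
    return (1 - norm.cdf(z_crit - z)) + norm.cdf(-z_crit - z)

# The power function: rejection probability at each assumed true effect.
for hours in [0.0, 1.0, 2.0, 3.0, 4.0]:
    print(f"true effect {hours:.1f} h -> power {power(hours, se=1.5):.2f}")
```

At a true effect of zero the function returns exactly the significance level (0.05), and it rises toward 1.0 as the assumed true effect moves away from the null, which is the point of the paragraph above.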
But this and other sophisticated complaints aside, power is considered by most statisticians—including Gosset and maybe Leamer—to provide a useful protection against unexamined null-hypothesis testing. Power is, so to speak, "powerful" because hypotheses are plural, and the plurality of hypotheses entails overlapping probability distributions. In a random sample the sleeping pill Napper may on average induce three extra hours of sleep, plus or minus three. But in another sample the same scientist may find that the same sleeping pill, Napper, induces two extra hours of sleep, plus or minus four (after all, some sleeping pills contain stimulants, causing negative sleep). The traveler would like to know from her doctor, before she takes the pill, exactly how much confidence she should have in it. "With what probability can I expect to get the additional two or three hours of rest?" she reasonably wants to know. "And with what probability might I actually get less rest?" Without a calculation of power, to be provided by the psychometricians, she can't say.

Calculators of Type I error pretend otherwise: following the practice of R. A. Fisher, they act as if the null hypothesis of "no, zero, nada additional rest" is the only hypothesis worthy of probabilistic assessment. They ignore the other hypotheses. They tell the business traveler and other patients: "Pill Napper is statistically significantly different from zero at the 5 percent level." To which their better judgment—their Gosset judgment—should say, "So what?"

Power is, mathematically speaking, a number between zero and 1.0. It is the difference between 1.0 (an extremely high amount of power, a good thing) and the probability of an error of the second kind (a bad thing). The error of the second kind is the error of accepting the null hypothesis of (say) zero effect when the null is in fact false, that is, when (say) such and such a positive effect is true.
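The arithmetic of the last paragraph (power equals 1.0 minus the probability of an error of the second kind) also yields the standard back-of-the-envelope sample-size rule for a two-sided z-test. A sketch under the normal approximation; the 2-hour effect and 4-hour per-patient standard deviation below are hypothetical numbers, not figures from the text:

```python
from math import ceil
from statistics import NormalDist

def type_ii_error(power):
    """Probability of an error of the second kind: beta = 1 - power."""
    return 1.0 - power

def min_n_for_power(effect, sd, power=0.85, alpha=0.05):
    """Smallest sample size giving a two-sided z-test at least the stated
    power against a true effect of the given size, when each observation
    has standard deviation sd (normal approximation)."""
    norm = NormalDist()
    z_alpha = norm.inv_cdf(1 - alpha / 2)   # ~1.96 at alpha = 0.05
    z_beta = norm.inv_cdf(power)            # ~1.04 at power = 0.85
    # Standard approximation: n >= ((z_alpha + z_beta) * sd / effect)^2.
    return ceil(((z_alpha + z_beta) * sd / effect) ** 2)

# E.g., detecting a true 2-hour gain when sleep varies with sd = 4 hours:
n = min_n_for_power(effect=2.0, sd=4.0)     # 36 patients
```

Halving the effect one hopes to detect roughly quadruples the required sample, which is one reason underpowered studies are so common.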
Typically the power of psychological research is called “high” if it attains a level of .85 or better. (This, too, is arbitrary, of course. A serious study with a loss function may not accept a hard and fast rule.) High power is one element of a good rejection. If the power of a test is low, say, .33, then the scientist will two times in three accept the null and mistakenly conclude that another hypothesis is false. If on the other hand the power of a test is high, say, .85 or higher, then the...
