In lieu of an abstract, here is a brief excerpt of the content:

24 What to Do

Not where it comes from but what it leads to is to decide.
William James 1896, 98

Education

Our first suggestion is mild and educational. Scientists in the sizeless sciences need to start telling each other to seek substance. They need to stop believing that the translation of a problem into probability space relieves them of the need to consider oomph and loss functions. The ritualism of significance testing needs to be challenged at the level of paradigm at every seminar, in every referee report, in every classroom.1 The physicist and econometrician Joel Horowitz claims to us that such challenges have become common in economics, partly because of our earlier complaints. We think Horowitz is mistaken. The Fisherian ritual goes on and on. Follow then Horowitz’s own practice: What’s your oomph? How do you know? Listen to Gosset explaining in his last year of life his thoughts on “significance” to Egon Pearson, who was then the chief editor of Biometrika.

[O]bviously the important thing in such is to have a low real error, not to have a “significant” result at a particular station. The latter seems to me to be nearly valueless in itself. . . . Experiments at a single station [i.e., tests of statistical significance on a single set of data] are almost valueless . . . . What you really want is a low real error. You want to be able to say not only “We have significant evidence that if farmers in general do this they will make money by it,” but also “we have found it so in nineteen cases out of twenty and we are finding out why it doesn’t work in the twentieth.” To do that you have to be as sure as possible which is the 20th – your real error must be small. (Gosset to E. S. Pearson, 1937, in E. Pearson 1939, 244)

Gosset-speak is what we need. Undergraduates need to hear from the beginning that size matters, measured in units of money or justice or life or persuasiveness.
They need to acquire the virtues necessary for performing repeated experiments on the same material. They need to hear that random error is one out of many dozens of errors and seldom the biggest. They need to learn that “the real error must be small.” After all, reconciling differences of effect, finding the common ground, is the point of statistics.

Professors should show students why they need to attend to substance, as does for example the epidemiologist Kenneth Rothman. The point about the insignificance of significance should not be shunted off to one obscure paragraph mentioning that large samples yield “significance” everywhere. As in the Freedman, Pisani, and Purves text, or in the old text by Wallis and Roberts (1956), and in the careers of Gosset and W. Edwards Deming, the size-matters/how-much should be the substance of most of the paragraphs.

Controlling for the second kind of sampling error is necessary and important. But it is not most important. Most important is to minimize Error of the Third Kind, “the error of undue inattention,” which is caused by trying to solve a scientific problem using statistical significance or insignificance only. In science, as against careerism or pure mathematics, it is better to be approximately correct and scientifically relevant than it is to be precisely correct but humanly irrelevant. Not even the fully specified power function, balancing the risk of errors from random sampling, provides a full solution to a scientific problem. In truth, as Kruskal never tired of remarking, statistical “significance” poses no scientific problem at all. With the aid of a personal computer and a grant such significance is easy to achieve.

Graduate students today are oversupplied with analytic proofs of asymptotic results. They are not being taught how to control for the third kind of error. Formalities are privileged in textbooks over substantive thinking about what a test can yield.
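The remark that large samples yield “significance” everywhere can be illustrated with a minimal sketch. It is not drawn from the text: the function name `two_sided_p`, the effect of 0.01 units, and the spread of 1 unit are hypothetical, and the observed difference is assumed to stay fixed while the sample grows (in real sampling the estimate itself would vary). The test used is a plain two-sample z-test with a known, common standard deviation.

```python
import math


def two_sided_p(effect, sd, n_per_group):
    """Two-sided p-value of a z-test for a difference between two group means.

    Assumes equal group sizes and a known, common standard deviation.
    """
    standard_error = sd * math.sqrt(2.0 / n_per_group)
    z = effect / standard_error
    # For a standard normal Z, P(|Z| > |z|) equals erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2.0))


# Hypothetical numbers: a difference of 0.01 units against a spread of 1 unit.
effect, sd = 0.01, 1.0
for n in (100, 10_000, 1_000_000):
    p = two_sided_p(effect, sd, n)
    print(f"n per group = {n:>9,}   effect = {effect}   p = {p:.3g}")
```

The effect, the oomph, is the same in every row; only the sample size changes. The p-value falls as the sample grows and says nothing about whether a difference of 0.01 units matters in money or justice or life.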
We have met too many well-trained young economists who say to us, “I didn’t know I was supposed to look at the column of coefficients. I just look at the p-values.” And “I thought Fisher’s test told me about the likelihood of the hypothesis. Doesn’t it?” No. If Gosset could teach these points to the elder Beaven, an experimental farmer of barley wisely suspicious of the logic of Latin squares and the importance of degrees of freedom in tests of statistical significance – Beaven protested wittily to Gosset...
