3 An Overview of Current Sabermetric Thought I: Offense

In the next two chapters we will present an overview of the current state of baseball analytics, while making careful attempts to compare the current results to those that were mentioned in Moneyball. Our emphasis is on exposition, in that we will attempt to explain and justify the basics of sabermetric theory to the reader. Although much lies beyond the scope of what we can accomplish here, a thorough reading should give the interested reader a firm grasp of how sabermetricians think about the game, and demystify some important results that are mentioned in passing in both Moneyball and the popular media.

Why Do Teams Win Games?

For those new to sabermetrics, one of the most eye-opening passages in Moneyball, the book, begins with Paul DePodesta “reducing the coming six months to a math problem.”¹ To accomplish this, DePodesta estimates four quantities:

1. the number of wins likely necessary to make the playoffs (about 95);
2. the number of runs by which the A’s need to outscore their opponents over the course of the season in order to win that many games (about 135);
3. the number of runs that the A’s, as currently constituted, are likely to score (810 ± 10); and,
4. the number of runs that the A’s, as currently constituted, are likely to allow (660 ± 10).

The answers to the last two questions allow DePodesta to determine whether the A’s will reach the threshold in the first question. Lewis mentions parenthetically the missing piece of the equation: a strong relationship between the number of runs that a team scores and allows over the course of the season, and the number of games that they win. This may seem obvious, but keep in mind that we are only talking about the cumulative number of runs scored and allowed over the course of a season, with no information about the distribution of how those runs are scored in any particular game.
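As a rough illustration of the arithmetic behind these four estimates, the following sketch compares the projected run differential with the target. The variable and function names are hypothetical; only the numbers come from the passage. Treating the ± 10 figures as simple worst-case and best-case bounds is a simplifying assumption made for this sketch, not something the text states.

```python
# Figures quoted in the passage; all names here are hypothetical.
TARGET_RUN_DIFF = 135   # runs by which the A's need to outscore opponents
RUNS_SCORED = 810       # projected runs scored
RUNS_ALLOWED = 660      # projected runs allowed
MARGIN = 10             # the "± 10" attached to each projection


def run_differential_range(scored, allowed, margin):
    """Return (worst, central, best) run differentials.

    Assumption: the margins are treated as hard bounds, so the worst case
    pairs the fewest runs scored with the most runs allowed.
    """
    worst = (scored - margin) - (allowed + margin)
    central = scored - allowed
    best = (scored + margin) - (allowed - margin)
    return worst, central, best


worst, central, best = run_differential_range(RUNS_SCORED, RUNS_ALLOWED, MARGIN)
for label, diff in (("worst", worst), ("central", central), ("best", best)):
    verdict = "meets" if diff >= TARGET_RUN_DIFF else "falls short of"
    print(f"{label}: {diff:+d} runs, {verdict} the {TARGET_RUN_DIFF}-run target")
```

Under these assumptions the central estimate (150 runs) clears the 135-run threshold, while the most pessimistic combination (130 runs) falls just short of it.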
The relationship to which Lewis alludes is known, somewhat misleadingly, as the Pythagorean Expectation, and it is one of Bill James’s more enduring contributions to the field of sabermetrics. James created a simple but nonlinear statistical model that relates runs scored (RS) and runs allowed (RA) to a team’s expected winning percentage (WPCT):

\[
\mathrm{WPCT} = \frac{\mathrm{RS}^2}{\mathrm{RS}^2 + \mathrm{RA}^2} = \frac{1}{1 + (\mathrm{RA}/\mathrm{RS})^2}
\]

James described his formula as Pythagorean because the sum of squared terms reminded him of the Pythagorean Theorem ($a^2 + b^2 = c^2$, where $a$ and $b$ are the lengths of the shorter sides of a right triangle, and $c$ is the length of the hypotenuse). But this similarity was largely a coincidence. While James undoubtedly used the exponent of 2 (the solid line in Figure 1) for convenience and simplicity, later sabermetricians sought a more precise, less arbitrary value, and found that as the game has changed over the years, the value of the exponent that best fits the data has changed with it. For clarity, we show (dotted line) that the exponent that best fits the data from all team-seasons since 1954 is about 1.85.²

Naturally, deviations from this expected winning percentage are the subject of some debate. The standard deviation between the expected and actual wins is about four games, and it is rare for teams to underperform or overperform their expected winning percentage by more than ten games. When that happens, is it pure luck? Is it the team’s performance in one-run games? Is it the presence of a spectacular bullpen or closer? Is it clutch hitting? Theories abound, but compelling explanations are elusive.

The notion of expected winning percentage has caught on in other sports, each having a different exponent.
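The expected winning percentage described above is simple enough to sketch in a few lines. The following is a minimal illustration rather than the authors' code: the function name `pythagorean_wpct`, the 162-game season length, and the choice of DePodesta's run estimates as inputs are assumptions made for this example. The exponent is left as a parameter so that the classic value of 2 can be compared with the best-fit value of about 1.85.

```python
def pythagorean_wpct(runs_scored, runs_allowed, exponent=2.0):
    """Expected winning percentage, 1 / (1 + (RA / RS) ** exponent).

    This is algebraically the same as RS**k / (RS**k + RA**k).
    The function name and signature are hypothetical.
    """
    if runs_scored <= 0 or runs_allowed < 0:
        raise ValueError("runs_scored must be positive and runs_allowed non-negative")
    return 1.0 / (1.0 + (runs_allowed / runs_scored) ** exponent)


GAMES = 162  # assumed length of a full regular season

# Inputs: the 810 runs scored and 660 runs allowed quoted earlier.
for exponent in (2.0, 1.85):
    wpct = pythagorean_wpct(810, 660, exponent)
    print(f"exponent {exponent}: WPCT {wpct:.3f}, about {wpct * GAMES:.1f} wins")
```

With these inputs both exponents give an expected winning percentage of roughly .600, or about 96 to 97 wins over 162 games, which is in line with the 95-win target.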
In basketball, the exponent is much higher (somewhere between 14 and 17), while in football, it is about 2.4.³ Nevertheless, an analytic explanation of why James’s model was so successful eluded researchers until 2005, when Steven Miller proved that James’s model, with an unknown exponent, could be derived by assuming that a team’s runs scored and runs allowed were independent, and each followed a well-known statistical distribution.⁴ The fact that James’s expected winning percentage hewed so closely to a team’s actual winning percentage over the course of the...
