Branch Rickey’s Equation Fifty Years Later
Ray C. Fair and Danielle Catambay

Introduction

In a 1954 Life magazine article, Branch Rickey introduced an equation relating a baseball team’s performance in a season to various measures of offense and defense. One of his findings was that on-base percentage dominates batting average as a measure of offense, which, as Schwarz notes, was way ahead of its time.1 Rickey’s analysis is quite interesting, and it is probably due largely to Allan Roth, whom he mentions in the article. Rickey and Roth were not mathematical statisticians, and they took their figures to “mathematicians at a famous research institute” (alas, Princeton, not Yale).2 They got their results back in six weeks, “which constituted a framework around which to build a formula.”3 Rickey does not derive his formula in a mathematically rigorous way, but the article discusses technique in enough detail to see roughly what he did.

In this paper, Rickey’s equation is examined using a more formal statistical technique, regression analysis, which is often used in the social sciences. The equation is first examined using data from Rickey’s own period, 1934 to 1953, and then it is extended to the present to see how it performs on data from the modern era. The results from 1934 to 1953 support Rickey’s conclusions, and the equation holds up well when extended fifty-one years through 2004. Although Rickey’s equation was largely ignored at the time, the results in this paper suggest that perhaps it should not have been.
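As a rough illustration of what regression analysis does, the sketch below fits a least-squares line relating one explanatory variable to one outcome. The function name `ols_fit`, the variable names, and the numbers are all hypothetical and made up for illustration; they are not Rickey's data, and the paper's own regressions may use more variables and a different specification.

```python
from statistics import mean


def ols_fit(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b * x."""
    x_bar, y_bar = mean(x), mean(y)
    # Slope = sample covariance of x and y divided by sample variance of x.
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Made-up toy numbers (NOT real baseball data): a hypothetical team rating
# and the games behind the league leader that go with it.
rating = [0.10, 0.05, 0.00, -0.05, -0.10]
games_behind = [0.0, 10.0, 20.0, 30.0, 40.0]

intercept, slope = ols_fit(rating, games_behind)
print(f"intercept = {intercept:.2f}, slope = {slope:.2f}")
```

In this toy example the fitted slope is negative only because the made-up numbers were chosen so that a higher rating goes with fewer games behind.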

The Equation

Rickey said he used the last twenty years’ worth of data to build his formula; we assume that 1934 to 1953 were the twenty years in question. The data are yearly and by team.4 In this period, there were sixteen teams, eight per league, so the number of observations we can use is 320. Rickey used as the measure of team performance the number of games behind the league leader for the season, denoted G. He was also interested in a team’s average runs per game in a season relative to the average runs per game of the team’s opponents. Rickey first noted that this variable and G are strongly correlated. This is not surprising. The more runs a team scores relative to its opponents, the more games it is likely to win and the fewer games behind the leader it is likely to finish. Rickey’s aim was then to see if he could find measures of offense that were highly correlated with a team’s average runs per game and measures of defense that were highly correlated with the average runs per game of the team’s opponents. Such measures would then be highly correlated with G and would give one an idea of the kinds of offense and defense that are most effective. In the end (after getting back the results from the mathematical experts), he came up with three measures of offense and four measures of defense.
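The observation count and the games-behind measure can be sketched as follows. The helper `games_behind` uses the conventional standings definition (half the sum of the win and loss differences from the leader); whether this matches the exact construction of G used in the paper is an assumption, and the win-loss records are made up for illustration.

```python
def games_behind(team_wins, team_losses, leader_wins, leader_losses):
    """Conventional games-behind figure relative to the league leader."""
    return ((leader_wins - team_wins) + (team_losses - leader_losses)) / 2


# Panel size: eight teams in each of two leagues, seasons 1934 through 1953.
TEAMS_PER_LEAGUE = 8
LEAGUES = 2
SEASONS = range(1934, 1954)  # 1954 is excluded, so this covers 20 seasons

n_obs = TEAMS_PER_LEAGUE * LEAGUES * len(SEASONS)
print(n_obs)

# Hypothetical records: a 90-64 team chasing a 98-56 league leader.
print(games_behind(90, 64, 98, 56))
```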

The first measure of offense is on-base percentage

(H + BB + HP) / (AB + BB + HP)    (1)

where H is hits, BB is bases on balls, HP is hit by pitch, and AB is at bats.5 These variables are all a team’s totals for the season. The second measure is a measure of extra base power

0.75 (TB - H) / AB    (2)

where TB is total bases (calculated by multiplying the number of home runs by 4, the number of triples by 3, the number of doubles by 2, and adding those figures to the number of singles). Rickey said that (TB-H)/AB had a lower correlation with a team’s average runs per game than did the other two measures (equations 1 and 3), and he adjusted for this by multiplying it by 0.75, a point we return to later. The third measure is what Rickey calls “clutch”

R / (H + BB + HP)    (3)

where R is runs scored. This variable is the percentage of players on base who score.6 The total offense measure is then the sum of these first three formulas:

(H + BB + HP) / (AB + BB + HP) + 0.75 (TB - H) / AB + R / (H + BB + HP)
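A minimal sketch of the three offense measures and their sum, computed from a team's season totals. It assumes on-base percentage is (H + BB + HP) / (AB + BB + HP), extra base power is 0.75 (TB - H) / AB, and clutch is R / (H + BB + HP), which is how the definitions in the text read. The class name `TeamSeason`, its field names, and the season totals are hypothetical and are not taken from the paper's data.

```python
from dataclasses import dataclass


@dataclass
class TeamSeason:
    """Season batting totals for one team (hypothetical container)."""
    ab: int        # at bats
    h: int         # hits
    doubles: int
    triples: int
    hr: int        # home runs
    bb: int        # bases on balls
    hp: int        # hit by pitch
    r: int         # runs scored

    @property
    def total_bases(self) -> int:
        singles = self.h - self.doubles - self.triples - self.hr
        return singles + 2 * self.doubles + 3 * self.triples + 4 * self.hr

    @property
    def on_base(self) -> float:
        # Assumed form of the first measure.
        return (self.h + self.bb + self.hp) / (self.ab + self.bb + self.hp)

    @property
    def extra_base_power(self) -> float:
        # (TB - H) / AB scaled by the 0.75 adjustment described in the text.
        return 0.75 * (self.total_bases - self.h) / self.ab

    @property
    def clutch(self) -> float:
        # Assumed form of the third measure: share of base runners who score.
        return self.r / (self.h + self.bb + self.hp)

    @property
    def offense(self) -> float:
        return self.on_base + self.extra_base_power + self.clutch


# Made-up season totals, for illustration only.
team = TeamSeason(ab=5000, h=1400, doubles=250, triples=50, hr=100,
                  bb=550, hp=50, r=800)
print(round(team.on_base, 4), round(team.extra_base_power, 4),
      round(team.clutch, 4), round(team.offense, 4))
```

Keeping each measure as its own property makes it easy to check the correlation of each component with runs per game separately before summing them.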

Rickey used four measures to calculate defense. These are measures that are meant to...
