Presidential Address: Macroeconomics and Online Prices
The availability of microeconomic pricing data has produced an empirical revolution in macroeconomics. Two developments were mainly responsible. First, national statistical offices allowed economists to study the data underlying the construction of the consumer price index (CPI), which gave our profession the chance to tackle many questions that had been open for decades. Second, scanner data from several supermarkets and merchandising companies were made available, offering another great opportunity for research. The earlier literature used aggregate price indexes to address questions of price rigidity, the law of one price, cost and exchange rate pass-through, international market segmentation, and so on. The aggregation and the procedures behind the construction of those indexes mask several economic phenomena. The availability of more detailed data has allowed the profession to take a closer look at old questions.1
The microeconomic CPI data have several advantages. The first and most important is their representativeness. Statistical offices invest in the design of the data to make sure they include a representative set of prices for the consumption basket. The second advantage is the long history. When the microeconomic price data are released, researchers generally have access to several years, and sometimes decades, of observations. This feature is quite important for evaluating pass-through and deviations from relative price equilibrium. The disadvantages are many, including the one indicated by Alberto Cavallo in his thesis: microeconomic CPI prices are plagued with unit values.2 Although unit values are conceivably a very good piece of information for the computation of inflation, they are terrible for understanding price-setting dynamics, especially when evaluating price stickiness. The second disadvantage is that even though the data are representative, CPIs tend to have very few items in each sector, which means that the heterogeneity within sectors is disguised by small samples.
The scanner data resolved some of the issues in the CPI data and worsened others. The scope was clearly much bigger, and the extensive product variety allowed researchers to deal with heterogeneity much better. In addition, several scanner data sets have information on costs and quantities, as well as on prices, which facilitates addressing questions on pass-through. The disadvantages, however, are several. First, scanner data are not representative. The data characteristics vary greatly depending on the data provider, the location, and the time period when the data were collected. Databases available for research are usually from a single retailer, which makes generalization even harder. Furthermore, the quantity data captured by scanner data can be biased. The reason is simple: a supermarket that sells milk at an unusually low price will experience unusually large sales of milk. Using those quantities to draw aggregate implications produces a massive bias. Second, scanner data also have unit values, with prices reported as a ratio of sales over quantities and averaged over a week. Not all scanner data suffer from this problem, but most of the data sets that have been used in the macroeconomic and international literature do. Observing the average price is not necessarily wrong if the question asked is about pass-through or inflation. It is, however, the wrong measure when addressing questions of price stickiness and price dynamics.3
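The unit value problem can be illustrated with a small numerical sketch. The Python code below is only an illustration under stated assumptions: the prices, quantities, and function names are hypothetical and do not come from any actual scanner or CPI data set. It takes a posted price that changes exactly once, in the middle of a week, and computes the weekly unit value as sales revenue divided by units sold.

```python
def weekly_unit_values(daily_prices, daily_quantities, days_per_week=7):
    """Weekly unit value: total sales revenue divided by total units sold."""
    unit_values = []
    for start in range(0, len(daily_prices), days_per_week):
        prices = daily_prices[start:start + days_per_week]
        quantities = daily_quantities[start:start + days_per_week]
        revenue = sum(p * q for p, q in zip(prices, quantities))
        unit_values.append(revenue / sum(quantities))
    return unit_values


def count_changes(series):
    """Number of times consecutive observations differ."""
    return sum(1 for prev, curr in zip(series, series[1:]) if prev != curr)


# Hypothetical posted price in cents over three weeks (21 days):
# 200 for the first 10 days, then a single permanent change to 220.
posted = [200] * 10 + [220] * 11
# Hypothetical constant daily sales of 10 units.
quantities = [10] * 21

unit_values = weekly_unit_values(posted, quantities)

print("posted price changes:", count_changes(posted))
print("weekly unit values:", [round(v, 2) for v in unit_values])
print("unit value changes:", count_changes(unit_values))
```

In this example the posted series changes once, by 20 cents, while the weekly unit value series changes twice, each time by less than 20 cents, because the middle week averages the old and the new price. A stickiness measure built on the unit values would therefore report more frequent and smaller price changes than actually occurred.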
Alberto Cavallo and I started the Billion Prices Project at MIT (BPP) almost a decade ago to explore how web methods can improve the collection of prices.4 We use web scraping to download millions of prices every day, from hundreds of retailers in more than seventy countries. The purpose is to collect all the products sold by a store, identify the posted prices every day, and also detect information about sales, promotions, and the like, in which case we collect all prices available. These data have advantages and disadvantages. First, they are not as representative as census data, but they are more representative than data from a single store, which is the typical source of scanner data. In particular, every U.S. supermarket selling products through a website is probably in our data set, not just Safeway. Second, the data are daily and have no...