  • The Limits of Big Data
  • Christine Croft

More Data, More Questions

Chris Poulin’s essay, “Big Data Custodianship in a Global Society,” establishes a foundation for the exploration of fascinating and difficult questions surrounding the use and influence of big data. While Poulin’s discussion ranges from the basics of big data to the implications of its use, three main questions beg greater consideration. First, one must consider in greater depth the “how” of big data: how it is collected, stored, analyzed, and reported. This “how” becomes essential as policymakers and technologists interact to create and inform policy. Second, one must also confront the uncomfortable question of whether big data is actually capable of changing the world. Will big data become the dominant force in helping policymakers predict protests or doctors save lives, or should big data be recognized as one of many metrics to consider? Finally, the question of whether big data will serve to modify behavior based on the knowledge of its collection presents a particularly troubling view of the future.

The Big Data Buzz

In recent years, big data has become a buzzword in the tech-savvy community, simultaneously holding promise and mystery. The “what” of big data—what it can accomplish and the consequences of its use—has been well documented. Yet the “how” of big data is less easily understood, protected by technologists as if it were an industry secret. While the rise of big and open data is purported to offer transparency and greater access, the process of how to analyze raw data to create actionable insights is still a mystery to many in the business and policy communities.

In his essay, Poulin outlines the process of utilizing big data: collection, storage, processing, query, reporting, and action. Though he appropriately raises the issues of data privacy during the collection, storage, and reporting phases, he skims over the processing and query phases, which are arguably the most important to the analysis of big data.1 The lack of a clear, scientific understanding of how massive amounts of data are sorted and analyzed, and the resulting failure to communicate that process, presents a major challenge to policymakers.

David Weinberger, a senior researcher at Harvard University’s Berkman Center for Internet and Society and co-director of the Harvard Library Innovation Laboratory, argues that “it is not at all clear that human brains will be capable of understanding why the supercomputers have come up with the answers that they have.”2 The risk, he explains, is that we create “knowledge but no understanding.”3 If technologists cannot accurately explain to policymakers and business leaders why the data has produced a certain result, particularly in the case of an outlier, there is a chance the results of the analysis will be discarded as illogical or unsound.

As industry leaders, consumers, and policymakers become increasingly comfortable with the use of big data, we must increase our knowledge about how it is queried, analyzed, and reported. Otherwise, we risk creating statistical results that appear to lack a causal link, which only increases the risk of missing the correlated outcomes that big data is hailed as capable of predicting.

Should Big Data Analyze Everything?

The two case studies (the Arab Spring and the medical industry) discussed by Poulin reflect the diversity of big data’s applications. While some may laud the breadth of topics that big data can touch, one should be equally cautious about the applicability of big data to topics that require more nuanced analysis and understanding than big data alone can deliver. The ease with which big data can be acquired and analyzed inclines us to look for answers where perhaps none exist.

The Arab Spring case is an excellent example of searching for answers where none exist. Ambiguity exists in the world, and big data cannot eliminate it; thus, policymakers must be comfortable operating within the bounds of the known and unknown. However, Poulin offers the Arab Spring as an example of a future in which data from Twitter feeds could be used to predict dissent and anticipate regime change. Such an idea stretches the limits of big data analysis; the events in Egypt cannot be reduced to...
