
Trending: The Promises and the Challenges of Big Social Data

Lev Manovich

Today, the term "big data" is often used in popular media, business, computer science, and the computer industry. For instance, in June 2008, Wired magazine opened its special section on "The Petabyte Age" by stating, "Our ability to capture, warehouse, and understand massive amounts of data is changing science, medicine, business, and technology. As our collection of facts and figures grows, so will the opportunity to find answers to fundamental questions." In February 2010, The Economist started its special report "Data, Data Everywhere" with the phrase "the industrial revolution of data" (coined by computer scientist Joe Hellerstein) and then went on to note that "the effect is being felt everywhere, from business to science, from government to the arts."

Discussions in popular media usually do not define big data in quantitative terms. However, in the computer industry, the term has a more precise meaning: "Big Data is a term applied to data sets whose size is beyond the ability of commonly used software tools to capture, manage, and process the data within a tolerable elapsed time. Big data sizes are a constantly moving target currently ranging from a few dozen terabytes to many petabytes of data in a single data set" ("Big Data").

Since its formation in 2008, the Office of Digital Humanities at the National Endowment for the Humanities (NEH) has been systematically creating grant opportunities to help humanists work with large data sets. The following statement from a 2011 grant competition organized by the NEH together with a number of other research agencies in the United States, Canada, the UK, and the Netherlands provides an excellent description of what is at stake: "The idea behind the Digging into Data Challenge is to address how 'big data' changes the research landscape for the humanities and social sciences.
Now that we have massive databases of materials used by scholars in the humanities and social sciences—ranging from digitized books, newspapers, and music to transactional data like web searches, sensor data or cell phone records—what new, computationally based research methods might we apply? As the world becomes increasingly digital, new techniques will be needed to search, analyze, and understand these everyday materials" ("Digging into Data Challenge").

The projects funded by the 2009 Digging into Data Challenge and the earlier NEH 2008 Humanities High Performance Computing Grant Program have begun to map the landscape of data-intensive humanities. They include analysis of the correspondence of European thinkers between 1500 and 1800; maps, texts, and images associated with nineteenth-century railroads in the United States; criminal trial accounts (data size: 127 million words); ancient texts; detailed 3-D maps of ancient Rome; and the research by my lab to develop tools for the analysis and visualization of large image and video data sets.

At the moment of this writing, the largest data sets being used in digital humanities projects are much smaller than the big data used by scientists; in fact, if we use the industry's definition, almost none of them qualify as big data (i.e., the work can be done on desktop computers using standard software, as opposed to supercomputers). But this gap will eventually disappear when humanists start working with born-digital user-generated content (such as billions of photos on Flickr), online user communication (comments about photos), user-created metadata (tags), and transaction data (when and from where the photos were uploaded).
This web content and data is infinitely larger than all already digitized cultural heritage; and, in contrast to the fixed number of historical artifacts, it grows constantly. (I expect that the number of photos uploaded to Facebook daily is larger than the number of artifacts stored in all the world's museums.)

In this chapter, I want to address some of the theoretical and practical issues raised by the possibility of using massive amounts of such social and cultural data in the humanities and social sciences. My observations are based on my own experience working since 2007 with large cultural data sets at the Software Studies Initiative (softwarestudies.com) at the University of California, San Diego (UCSD). The issues that I will discuss include the differences between "deep data" about a few people and "surface data" about many people, getting access to transactional data, and the new "data analysis divide" between data experts and researchers without training in computer science.

The emergence of social media in the middle of the 2000s created opportunities to study social and cultural processes and dynamics in new ways. For the first time, we can follow imaginations, opinions, ideas, and feelings of...
