Big Folklore: A Special Issue on Computational Folkloristics
Keywords (AFS Ethnographic Thesaurus): technology, computers, web portals, data
Any fact becomes important when it’s connected to another.—Umberto Eco (1989:385)
Over the course of the past decade, a revolution has occurred in the materials available for the study of folklore. The size and scope of digital archives of folklore have exploded, and the magnitude of digital materials available for folkloristic consideration has increased exponentially. Around the world, national archives have made considerable efforts to make their resources machine-readable, while other initiatives have focused on digitizing resources related to smaller regions, single collectors, or single genres. Simultaneously, the explosive growth of social media, weblogs (blogs), and other Internet resources has made previously hard-to-access records of traditional expressive culture available at a scale so enormous that it is hard to fathom. These developments, coupled with the emergence of algorithmic approaches to the analysis of large, unstructured datasets and of new methods for visualizing and navigating the relationships these approaches discover (from mapping to 3-D embedding, from time lines to navigable visualizations), offer folklorists new opportunities for the analysis of traditional expression. Folklore studies that leverage the power of these algorithmic approaches fall under the rubric of "computational folkloristics" (Abello, Broadwell, and Tangherlini 2012).
Certain challenges attach to work in the digital realm, particularly if one believes that folklore emerges from the productive dialectic that exists between individuals and tradition. From this perspective, the study of folklore is predicated on the retrospective analysis of recordings of culturally expressive forms and their performance or articulation. In earlier work, I outlined four main areas that will, in the coming years, require significant attention as the materials of study are increasingly represented as digital objects, whether the materials be oral performances or aspects of material culture, or any of the other numerous types of expression that folklorists work with (Tangherlini 2013c). Broadly speaking, these four areas are (1) collecting and archiving, (2) indexing and classifying, (3) visualization and navigation, and (4) analysis (Tangherlini 2013c:8).
Collecting and Archiving
Prior to the advent of the digital age, fieldwork was an individual or small-group endeavor that frequently generated idiosyncratic but simple data that could be easily incorporated into the analog archives of either the researcher or larger institutions. By way of contrast, contemporary digitally based fieldwork often generates huge amounts of data in formats that are not always stable or easily accessed by future researchers.1 Inevitably, efforts to transform archival materials from mostly analog formats that inadvertently regulate access (one must travel to use them) to digital formats that can theoretically be made freely available throughout the world raise complicated intellectual property issues. Although these issues lie outside the realm of the computational, they necessarily affect how folklorists work.2
The World Wide Web (WWW) has also led to the development of new collecting methodologies. Whereas most folklore fieldwork in the past was carried out either through face-to-face interaction or through surveys, fieldwork can now be carried out on and among (as opposed to with) groups and individuals who are not necessarily aware that they are participating in an ethnographic project. Predicated on the conceptualization of the Web as a dynamic, self-organizing folklore archive, these new collecting efforts rely on several algorithmic approaches to data collection, from simple scraping to Web crawling.3 One can, for example, fairly easily create a "spider" that crawls the Web, storing not only content related to the parameters of the crawl, but also metadata about the sites visited and the users of those sites.4 Consequently, these collection methods raise profound ethical concerns related to privacy and surveillance.5 Yet crawled data have obvious benefits, allowing researchers to consider research questions at a much greater scale than previously possible.
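The kind of "spider" described above can be sketched in a few lines of Python. This is a minimal, hypothetical illustration rather than a method taken from this article: the names `crawl`, `PageParser`, and `CrawlRecord`, the keyword filter standing in for "the parameters of the crawl," and the use of a `<meta name="author">` tag as a stand-in for user metadata are all assumptions. To stay self-contained, it fetches pages through an injected `fetch` function (here a small dictionary standing in for the Web) instead of making network requests.

```python
from collections import deque
from dataclasses import dataclass, field
from datetime import datetime, timezone
from html.parser import HTMLParser
from typing import Callable, List, Optional, Set
from urllib.parse import urldefrag, urljoin


class PageParser(HTMLParser):
    """Collects links, text content, and a <meta name="author"> value from one page."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[str] = []
        self.text_parts: List[str] = []
        self.author: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            self.links.append(attrs["href"])
        elif tag == "meta" and attrs.get("name") == "author":
            # Hypothetical proxy for "metadata about the users of those sites".
            self.author = attrs.get("content")

    def handle_data(self, data):
        if data.strip():
            self.text_parts.append(data.strip())


@dataclass
class CrawlRecord:
    url: str
    fetched_at: str        # ISO 8601 timestamp of retrieval (site metadata)
    author: Optional[str]  # author declared by the page, if any (user metadata)
    text: str              # the content itself
    outlinks: List[str] = field(default_factory=list)


def crawl(seed: str, fetch: Callable[[str], Optional[str]],
          keywords: Set[str], max_pages: int = 50) -> List[CrawlRecord]:
    """Breadth-first crawl from `seed`, keeping pages that mention any keyword."""
    queue = deque([seed])
    seen: Set[str] = {seed}
    records: List[CrawlRecord] = []
    visited = 0
    while queue and visited < max_pages:
        url = queue.popleft()
        html = fetch(url)
        visited += 1
        if html is None:  # unreachable page: skip it
            continue
        parser = PageParser()
        parser.feed(html)
        # Resolve relative links and drop #fragments so each page is queued once.
        outlinks = [urldefrag(urljoin(url, href))[0] for href in parser.links]
        for link in outlinks:
            if link not in seen:
                seen.add(link)
                queue.append(link)
        text = " ".join(parser.text_parts)
        if any(k.lower() in text.lower() for k in keywords):
            records.append(CrawlRecord(
                url=url,
                fetched_at=datetime.now(timezone.utc).isoformat(),
                author=parser.author,
                text=text,
                outlinks=outlinks,
            ))
    return records


# Usage: a made-up three-page "Web" served from a dictionary.
FAKE_WEB = {
    "http://example.org/": '<html><head><meta name="author" content="Collector A">'
                           '</head><body><p>Legends of the nisse</p>'
                           '<a href="tales.html">Tales</a><a href="recipes.html">Recipes</a>'
                           '</body></html>',
    "http://example.org/tales.html": '<p>A ghost story told at the farm.</p><a href="/">Home</a>',
    "http://example.org/recipes.html": "<p>Bread recipes.</p>",
}
records = crawl("http://example.org/", FAKE_WEB.get, {"legend", "ghost"})
for r in records:
    print(r.url, r.author)
```

Replacing `FAKE_WEB.get` with a function that performs real HTTP requests and returns decoded HTML would turn this into a live crawler. Such a crawler should at minimum honor `robots.txt` (Python's `urllib.robotparser` module can check it), and because it stores author metadata alongside content, it is exactly the kind of tool to which the privacy and surveillance concerns above apply.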
Archiving these increasingly large collections of heterogeneous digital data poses another considerable challenge. Although archiving is not a computational problem per se, the design of its data structures shapes the types of computational methods that can be deployed for data navigation, retrieval, and analysis. Ultimately, the methods of collection and the methods of storage...