Who’s Trending in 1910s American Cinema? Exploring ECHO and MHDL at Scale with Arclight
Over the last decade, digital humanities scholars have challenged a diverse set of academic disciplines to reevaluate traditional canons and histories through the use of computational methods. Whether we choose to call such approaches “distant reading” (as Franco Moretti does), “macroanalysis” (as per Matthew Jockers), or simply “text mining,” the basic intervention of these digital methods remains the same: presenting opportunities to harness the power of scale, shifting our attention away from a small number of canonical texts and testing long-held historical assumptions.1 What might such a “distant reading” approach look like for the history of American cinema in the 1910s? Given that so many of the most revealing original texts of that history—the films themselves—cannot be read at all, how might we perform a distant reading of early cinema texts that we do have in abundance, such as industry trade papers and fan magazines?2
In this article, we propose and test two interrelated methods of distant reading for the history of silent cinema: Scaled Entity Search (SES) and query comparison. We developed these approaches as part of Project Arclight, an initiative to create a web-based application allowing users to analyze millions of pages of digitally scanned film industry trade journals and magazines.3 Our initial inspiration came from a twenty-first-century platform: Twitter. If researchers can use algorithms to track celebrities “trending” on Twitter—becoming suddenly high profile and gaining an explosive increase in mentions on the service—could we imagine the 2 million digitized pages of the Media History Digital Library (MHDL) as a giant social media stream, itself rich with trending stars, directors, and other personnel? Given a large enough list of such personnel—all of the twenty thousand credited, named entities from more than thirty-five thousand filmographic records in the Early Cinema History Online (ECHO) data set, for example—we realized that it was absolutely possible to measure the “top trending” names of the film industry trade press in the 1910s. Even beyond trends, we realized that this work allowed us to analyze relationships—in particular, the relationships between filmworkers and the trade press and between workers’ representations in the press and their prolificacy (measured by number of films worked on). Our subsequent analysis revealed the outsized influence of certain entities and the curious underinfluence of others.
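As a rough illustration of the idea, and not the Arclight implementation, the sketch below treats SES as counting the pages that mention each name on a list, then divides each count by a number of credited films to relate press presence to prolificacy. The function names, the placeholder entity names, and the toy pages are all assumptions made for illustration; plain substring matching stands in for whatever search backend Arclight actually uses.

```python
from collections import Counter


def scaled_entity_search(entities, pages):
    """Count, for each entity name, how many pages mention it.

    `entities` is a list of name strings (hypothetical stand-in for an
    ECHO-derived list); `pages` is an iterable of (year, text) pairs
    (hypothetical stand-in for MHDL page scans).
    """
    counts = Counter()
    for _year, text in pages:
        lowered = text.lower()
        for name in entities:
            # Case-insensitive substring match: an assumption, not Arclight's method.
            if name.lower() in lowered:
                counts[name] += 1
    return counts


def mentions_per_film(page_counts, film_counts):
    """Divide page mentions by credited films for each entity with at least one film."""
    return {
        name: page_counts.get(name, 0) / films
        for name, films in film_counts.items()
        if films > 0
    }


# Toy data with placeholder names; none of these figures come from ECHO or MHDL.
toy_pages = [
    (1914, "Performer A signs a new contract."),
    (1915, "Director B and Performer A at work."),
    (1916, "No names on this page."),
]
toy_counts = scaled_entity_search(["Performer A", "Director B"], toy_pages)
toy_ratios = mentions_per_film(toy_counts, {"Performer A": 4, "Director B": 10})
```

A high ratio would flag an entity whose press presence outstrips its credited output, and a low ratio the reverse; real use would also need name disambiguation and OCR error handling, which this sketch ignores.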
The article is divided into three parts. First, we examine ECHO as a data set, attending to its strengths and limitations for historical research. We also describe the method by which it was processed and transformed to generate lists of unique historical entities (e.g., cinematographers’ names) to be searched at scale in the MHDL. In the second section, we present the results of our SES of those entities using Arclight and offer case studies that highlight both the affordances and the drawbacks of SES as a historiographical method to explore the trending of, and relationships between, entities. Finally, we discuss the revelations of query comparison between ECHO and MHDL, showing how the interplay between these two large-scale corpora allows us to expand the canon of what Charlie Keil and Shelley Stamp have called the “transitional era” of film history.4 By shedding light on forgotten personnel and recontextualizing well-known stars, quantitative and computational approaches such as those offered by SES and the Arclight app offer a significant avenue for scaled research on the history of a period whose study has long been faced with the challenge of access to primary materials. Ultimately, we hope these approaches offer a roadmap for future research using these rich data sets.
Our work has been possible thanks to the formal support of academic institutions and funding agencies as well as the informal support of film libraries, archives, and collecting communities. Arclight’s development and our research were enabled by a Digging into Data grant,5 sponsored by the US Institute for Museum and Library Services6 and Canada’s Social Sciences and Humanities Research Council.7 Our research...