Introduction: Searching Engines, Reading Machines
On 15 September 1869, the London Daily News raised an alarm concerning the growth of the library collections at the British Museum, and expressed a curiously prescient longing for “machinery” that could help us read the “illimitable mass” of texts (5). In that same city, the aged Charles Babbage was still tinkering toward a prototype for his punched-card Analytical Engine, but the writing was on the wall—indeed, it was overflowing the shelves:
Must we not pity the historians of the future if they should at any time be so conscientious as to turn over the mountains of waste paper which are now being shot by cartloads into the Museum? Human eyes and human hands cannot possibly work through a century of such agglomeration. The human mind will despair, perhaps, of power to deal with the illimitable mass. May we hope that when things come to such a crisis, human labor of the literary sort may be in part superseded by machinery? Machinery has done wonders, and when we think of what literature is becoming, it is certainly to be wished that we could read it by machinery, and by machinery digest it. (5)
The idea that “human labor of the literary sort may be in part superseded by machinery” provokes a wide range of feeling now, from eager enthusiasm to downright dread, as more and more of our textual encounters are mediated by digital technologies. Virtually every day, in examining the literary products of that “century of . . . agglomeration” known as the nineteenth, we have occasion to “read it by machinery.” This basic change to our discipline, and to the library at its heart, has thus far been incompletely theorized; its practical consequences are still emerging. We all know that digital cataloging, representation, storage, and searching of Victorian texts as well as their attendant scholarship are altering the vernacular and methodologies of the tribe. It remains for us to take an active, experimental, and critical role in this ongoing information revolution.
The essays presented here do precisely that, focusing on the kinds of evidence that emerge from algorithmic searching of large bodies of digitized text and on the modes of interpretation such evidence evokes. The Daily News article makes clear that, even as it was being produced, the nineteenth-century printed record overwhelmed readers with its scope. Ever since, we have of necessity relied on partial reading, specialized attention, and representative sampling for our interpretations of the cultures that produced it. Yet now, with the advent of multimillion-text repositories such as the Google Books corpus and the HathiTrust digital library, or more specifically (and for a fee) ProQuest’s C19 database of 23 million nineteenth-century items and Gale/Cengage’s ambitious new Nineteenth Century Collections Online, we are in a position to revise exponentially upward the number of texts that we bring to bear on our hypotheses. Each of these forum essays suggests that complex digital search techniques (sometimes called “text mining”) can guide us toward new patterns and connections that are only visible through the power of digital processing: reading by machinery. Yet each also recognizes some of the challenges and even pitfalls associated with such work, thereby guiding us to more robust practice.
Ryan Heuser and Long Le-Khac’s essay, “Learning to Read Data,” offers the largest claims for the empirical nature of text mining, as conducted with their special corpus of almost 3,000 digitized nineteenth-century British novels. Working in the Stanford Literary Lab directed by Matthew Jockers and Franco Moretti, Heuser and Le-Khac present their work as fully experimental in nature, wherein data is taken and visualized and only then analyzed with an eye toward interpretation. Emergent semantic fields in the text corpus suggest an overall decline in “abstract value” terms and an overall rise in words that are “concrete, physical, specific, and non-evaluative” (83). In the essay’s terms, this data (visualized in the accompanying graphs) is the signal. But what concepts can be drawn from it? Heuser and Le-Khac note our tendency to “read data in terms of concepts we already have at hand,” thus...