Learning to Read Data: Bringing out the Humanistic in the Digital Humanities
As humanists, we believe we are trained as expert readers, able to read almost any kind of text closely, deeply, and critically. But on 21 February 2010, the two of us sat staring at a computer screen dumbfounded by a kind of text that for once we had no idea how to read. Working with a corpus of thousands of digitized nineteenth-century British novels, we had just produced a plot, similar to figure 1, showing the usage trends of a massive group of abstract words relating to social values. The plot seemed to show all these words disappearing over the nineteenth century. What exactly were we seeing here? A heated discussion of principles in the fallout of the French Revolution? A death of values in the Victorian period?
The moment was emblematic of how it can feel to encounter digital humanities work. In facing a radically new kind of text, a different kind of evidence, tremendous excitement and real anxiety mix. It’s easy to understand the excitement. These emerging methods promise ways to pursue big questions we’ve always wanted to ask with evidence not from a selection of texts, but from something approaching the entire literary or cultural record. Moreover, the answers produced could have the authoritative backing of empirical data. But it’s also understandable how these possibilities could be unsettling. By offering an entirely different model of humanities scholarship, the digital humanities raise many questions. What do we do with this kind of evidence? Can we leverage quantitative methods in ways that respect the nuance and complexity we value in the humanities? Behind these questions is perhaps a deeper concern. Under the flag of interdisciplinarity, are the digital humanities no more than the colonization of the humanities by the sciences?
While doing digital humanities research over the past two years, we have constantly wrestled with these questions. We have learned, happily, that the answers may not be as troubling as they can seem. We are convinced that, when done well, such research can deliver scale, empirical rigor, and the nuance the humanities value. Yet this will require deep reflection as these methods develop. More importantly, this work will depend on humanistic methods. We hope to substantiate these claims by addressing some key methodological questions and presenting a case study from our own research.
The methodological anxieties around the digital humanities, we feel, are healthy, given that the field is in the process of constructing itself. Indeed, when we look closely at how current digital humanities work pursues the promises of scale and empiricism, there remain many problems to work out if we are to deliver on those promises. We’ll focus here on three pervasive problems: anecdotal evidence, validation, and interpretation.
Moving from anecdotal to large-scale evidence is not as straightforward as it seems. Even with millions of texts, the evidence generated can still be anecdotal. This problem can be understood using two terms we have found useful in thinking about evidence in the digital humanities: signal and concept. A signal is the data from the feature actually being measured computationally. A concept, however, is the phenomenon we take a signal to stand for. In the digital humanities, the interest and impact of our arguments are based on concepts, but computers can only measure signals, which are always smaller than concepts. For example, we may want to explore the waning of cultural memory, but have as our data only trends in the mentions of particular years (for instance, trends in how many times “1930” or “1945” is used) (Michel et al. 178). The essential problem of quantitative evidence, then, is in deciding how to bridge the perpetual distance between the signals we have and the concepts we want them...