Introduction: The Trials of the Digital Medievalist
Digital scholarship (the collection of digital data, such as facsimile images or statistics about digitized texts, together with the processing, sharing, and use of such data) is an area of innovation in the humanities generally, and its growth is a pressing issue for many humanities researchers, not least medievalists. Digital data and their use pose a particular challenge for medievalists deciding whether to work with manuscripts through their digital surrogates, or to process data from disparate sources using digital tools or methods. The essays in this volume gather intellectual outcomes from recent projects that have trialed innovative digital technologies, several of them using interoperable tools to examine the creation, circulation, and consumption of medieval books and texts. Three of our writers participated in the Making Medieval English Manuscripts project, collaborating with scholars at the universities of Toronto, Oxford, Sheffield, Cambridge, Drew, and Stanford. A fourth contributor has developed an innovative method to map phrase clusters in Old English. All four write about their discoveries but equally about the limits of their digital methods and tools, ultimately suggesting a range of possibilities for the kinds of textual, paleographical, and codicological research that digital methods make possible. Central to our investigation are questions about how the digitization of texts and manuscripts has facilitated research outcomes, and how scholars of medieval English may best make use of these digitized materials.
It is no accident that three of our four authors were involved with medievalist projects that used digitized manuscripts: while digital humanities is a relatively new and expanding field, the study of manuscripts has long been central to medieval English studies. As Derek Pearsall puts it: “the study of manuscripts is the most active area of current research in medieval studies: manuscripts are the basic primary material evidence for literary scholars, historians and art historians alike” (xi). The centrality of such material evidence in answering a broad spectrum of scholarly questions continues to encourage curators to increase the number of manuscripts and archives they make digitally available. This increase in turn has allowed scholars to examine larger sets and wider ranges of books in virtual environments as well as in physical archives. In this issue, Alexandra Bolintineanu asks why one might want to gather data digitally and shows how network graphing tools can help us synthesize that data. The next three articles, by Kathryn Lowe, Estelle Stubbs, and Alex Fleck, draw on shared online databases, using digital manuscript images as the basis of their investigations. Like Bolintineanu, Fleck and Lowe analyzed large quantities of digitized textual and/or image data. Each of these three authors used digital images as a starting point for their research, though they could perhaps have completed the same research, less conveniently, by examining the books in person.
However, in using digital images rather than the books themselves, the authors raise a number of fundamental questions about the value of the digital humanities, questions that we sketch here: how we define data and the digital humanities; whether the digital humanities is more than just the process of creating digital surrogates of manuscripts; what the digital humanist, especially the digital medievalist, can do with the increased data available; and how the digital humanist responds to potential skepticism towards digital methods. The contributors’ methods also raise vital questions about digital tools specifically: can digital tools interoperate, and in what ways can they be brought together to produce scholarly findings and outcomes? The authors of this introduction confronted these questions in the course of their own work on the Making Medieval English Manuscripts project. Interoperation allows two digital systems to work together, and so allows researchers not only to access digital images of manuscripts but also to do new things with those images: virtually cutting and pasting from manuscripts, creating virtual notebooks of these cuttings (in Excel, for example), and sharing those notes and cuttings with distant experts, thereby creating and fostering intellectual community.
For paleographers, the problem has not necessarily been a paucity of data, but rather the inability of a small number of scholars to grapple with the large numbers of uncategorized, unread, and unknown medieval documents in the field: the mass of surviving manuscripts that seem too many ever to read.1 The digitization of manuscripts suddenly enables scholars to read more, to read faster, and to read differently. Whereas the proliferation of critical editions over the past hundred years has provided scholars in humanities fields with plenty of primary material to read, medievalists, who from the first were paleographers or philologists of some kind, had to grapple with access to unique sources, very few of which existed in facsimile. Digitizing manuscripts makes possible not only remote access but also multiple simultaneous access. Medieval paleography and original transcription therefore need no longer be the preserve of a limited number of people located in the golden triangle of Cambridge, Oxford, and London. For instance, Early English Books Online, Parker Library on the Web, the Piers Plowman Electronic Archive, the Roman de la Rose Digital Library, the Machaut in the Book digitization project, and the public online databases of the Bibliothèque nationale de France, the British Library, the Bodleian, and the University of Cambridge (for a complete list, see the Catalogue of Digitized Medieval Manuscripts at cmrs.ucla.edu) allow simultaneous access to multiple manuscripts in multiple repositories. Furthermore, tools such as Digital Mappaemundi (http://schoenberginstitute.org/dm-tools-for-digital-annotation-and-linking/; see Foys and Bradshaw) and Transcription for Paleographical and Editorial Notation, T-PEN (t-pen.org/TPEN/; see Lowe), exemplify the push to externalize more of the expertise used to distinguish paleographic hands.
Just as digital philologists built searchable databanks of texts in the 1960s and 1970s, libraries and universities are now using increased processing power to build databanks of images, and so the impulse to digitize continues.
Not only is the number of digitized manuscripts increasing (although it still represents only a small proportion of the manuscripts that survive), but so is the amount of data derived from them. In fact, three of our contributions illustrate medieval English scholars’ engagement with what a recent series of blog posts has called an age of “Big Data” in medieval studies (Holsinger, “Medieval Studies”; Kaye; Treharne). Burdick, Drucker, Lunenfeld, et al. identify two main camps that position themselves differently in relation to big data in the digital humanities. The first camp, big data proponents, looks to social science methods and adopts statistical tables and graphs, while the second camp is more interested in how computing tools may be used to represent multiple temporalities and narratives. This second camp often criticizes statistical approaches as naive, trivial, self-evident, and flawed (107). They argue that the digital humanities’ approach to data must necessarily differ from that of the sciences and social sciences, for, as Alexandra Gillespie has recently pointed out: “Big data does not look like this: 12,000,000. It looks like this: 5×10²⁰”; data in the digital humanities is, in fact, “small data” (Gillespie). But even though the digital humanities does not use data on quite the same vast scale as other disciplines, it is still undergoing what Holsinger calls a “quantitative transformation” in archival material (Holsinger, “Medieval Studies”). Although, as Holsinger warns, only a fraction of manuscripts and archival material has been digitized so far, which often gives a skewed view of the material available, the nature of our investigations has shifted along with access. Even if the data is not as “big” as it is in other disciplines, it is still quite unmanageable without digital tools.
Three of our contributors (Kathryn Lowe, Alexandra Bolintineanu, and Alex Fleck) engage with otherwise unmanageable data when they take advantage of the processing power of software tools and databases, marshalling an impressive array of visual or textual evidence in support of their claims. And yet, as is typical among humanities scholars, they show a certain modesty, a reluctance to make macro claims. As Burdick et al. point out, the humanities scholar prefers ambiguity as a method: “we need to take seriously the conviction that the humanities have their own methods—not based in calculation, automation, or statistical probability, but in ambiguity, interpretation, and in embodied and situated models of knowledge and knowing” (92). Ambiguity as a method, and “small data,” unavoidably inflect medievalists’ uses of digital tools.
A common criticism from outside the digital humanities is that it is “all technique and lacks content” (Burdick et al. 92). While Burdick et al. propose that to some extent digital humanities technique is itself a sort of content, in this special issue we hope to persuade even skeptical readers that technique can usefully augment scholarly content. As Burdick et al. note, simply digitizing materials or typing into a computer is not digital humanities (102). As the quantity of such digital data continues to grow, it is important to take stock of whether, and to what extent, digital technologies have advanced intellectual goals in medieval English studies. The scholarly work of the Making Medieval English Manuscripts projects and other digital projects has clearly enabled some tangible discoveries about scribal identities and manuscript contexts. For example, our work on the Parker Scribes project at the University of Oxford and the University of Toronto used a mix of traditional paleographical techniques and image editing software (Digital Mappaemundi), which allowed us to mark up manuscripts virtually and categorize them by scribal features, creating scribal profiles as well as virtual hyperlinks to each of the manuscripts sharing those features. Alex Fleck used the same tool, Digital Mappaemundi, in his work on the Paleographical Cruxes in Old English Manuscripts project. The results of the Parker Scribes project, which identify a sixteenth-century network of scribes including John Parker, Matthew Parker, John Bale, Lyly, Robert Talbot, and Stephen Batman, will appear in the next release of Parker Library on the Web (http://parkerweb.stanford.edu/). The virtually marked-up images were sourced in collaboration with the library of Corpus Christi College, Cambridge, whose Parker Library on the Web database presents many of its medieval manuscripts for online browsing.
One limitation for digital medievalists is that not all digital repositories are open to such analysis and tinkering; digital repositories are as disparate as the libraries and archives in which the materials were originally available (Lowe 1005–06), and institutions vary greatly in the uses to which they put digitized material. Often the sole product available to the scholar is a static, finished database: digitized materials do not readily interact with other digitized materials in simple and fruitful ways. It seems incumbent on us to repurpose, visualize, and interpret texts that are now digitized. The next horizon for digital humanists interested in digitized manuscripts is thus interoperability: the practice of interpreting, analyzing, and aggregating digitized materials with software tools. Interoperability was an important goal of the Making Medieval English Manuscripts cluster, many of whose projects used the Digital Mappaemundi tool to group, categorize, and annotate other digitally available material. Such annotation was done using graphical user interfaces that resemble image editing software and that facilitate visualization, not just illustration. The Digital Mappaemundi tool was able to interoperate with several archives, including the Dictionary of Old English and the Parker Library on the Web database. Such interoperability experiments allowed medievalists to test how far digital tools could go in answering their scholarly questions.
Despite these advances, which show the potential of the digital humanities for enabling such scholarly outcomes, mainstream scholars in medieval English studies and philology more generally, and even conservatives outside these disciplines, remain skeptical about the necessity of digital analysis. In an article in the New Republic, Adam Kirsch avows that the field has distinctly “anti-humanistic manifestations” and “has no common essence,” by which he means that the digital humanities is a collection of disparate activities, a range of approaches rather than a defined field. It is true that a search for the term “digital humanities” in the MLA International Bibliography over the last ten years produces a spectrum of essays: on sharing scholarship and open access; on changing approaches to teaching; on the use of Twitter and blogging in research and teaching; on interdisciplinary approaches made possible by digitization; on the size of the datasets now available and the methodological shifts these data prompt; and on the role and future of the digital humanities in traditional scholarship. In general, these articles examine the potential of the digital humanities to change traditional research methodologies fundamentally. They also lean towards a broad justification of the digital humanities, especially of the validity of digital publication relative to traditional print, which brings with it the promise of quality. But the digital humanities is still very marginal in literary studies; there, digital humanist arguments do not enter much into mainstream scholarship (Rommel). Extrapolating from Rommel, it would seem that distant reading (“the crunching of large quantities of information across a corpus of textual data or its metadata”) does not have the prestige of close reading (Burdick et al. 18).2
To what does the digital humanities owe its marginality? As a keyword search indicates, the digital humanities encompasses many subfields. Two useful publications give a sense of the theoretical and methodological range of scholarship that considers itself digitally humanist: the Digital_Humanities handbook (by Burdick et al.) and the Blackwell Companion to Digital Humanities (by Schreibman et al.). The Companion’s essays reveal that digital humanities means something quite distinctive in each academic field. The advent of computing affected fields such as archaeology, history, and literary studies in quite different ways. In literary studies, word frequency studies are the most common example of digital humanities work, and they are somewhat marginal to the field. Alexandra Bolintineanu’s contribution to this issue is in essence a graphic visualization of a word frequency study, one which challenges assumptions about the uses to which we can put a databased corpus. Bolintineanu uses network-modeling software to graph data gleaned from searching the digitized Dictionary of Old English Web Corpus for distinctive expressions of uncertainty. She augments her graphs and numbers with close readings and an interpretive conclusion, however, and this illustrates a point made by Thomas Rommel, that in computer-assisted literary studies, “the aim of the investigation needs to be clarified; every ‘computation into criticism,’ to use [John] Burrows’s term, has to provide results that transcend the narrow confines of stylostatistical exercises” (Rommel). Bolintineanu’s article transcends these confines through close reading, and the value of her close reading lies in the extent to which it informs her “distant reading” of the data.
The continuing marginality of the digital humanities has prompted repeated discussion of the “state of the digital humanities,” for example in the special issue of Literature Compass entitled E-medieval: Teaching, Research, and the Net (December 2012), in which contributors reviewed the digital humanities and the methodologies they enable. In that issue Larry Swain asserts that technological change has not altered our methods but should do so (Swain 925), and that it offers an immense opportunity for us to become more collaborative (929). It is also an opportunity to investigate open access options and to move from publishing in pseudo-print media (PDFs) to searchable HTML or text archives. Part of the problem is that the outlets available for publishing digital research are limited to traditional formats: electronically available journals imitate those available in print, largely because of the publishing infrastructures in place (Swain 925–26). Like Swain, Matthew Fisher is pessimistic about the success to date of the digital humanities in bringing together the quantity of online work and the quality of print publication: “at the moment many of the most important aspects of the digital and the non-digital remain disconnected and difficult to bring into dialogue with each other” (Fisher 955). Fisher argues that digital resources are often used only as surrogates for print and are not meant to serve a variety of novel purposes (Fisher 959). Most were designed to produce a digital edition of a medieval text (The Electronic Beowulf, The Canterbury Tales Project, and The Piers Plowman Electronic Archive are the examples Fisher gives), online versions of reference works such as the Middle English Dictionary, or digital images of manuscript folios. Such editions have made more inroads into mainstream scholarship than have other kinds of humanities computing endeavors.
If these opinions seem gloomy, it is because they demand more of the digital humanities than it has accomplished so far. Those who self-identify as digital humanists see themselves as descended from humanities computing, but not identical with it. In this issue, Estelle Stubbs echoes Burdick et al.’s point that typing into a machine is not the digital humanities: it is not enough for digitization merely to produce a superior version of microfilm. Whereas humanities computing is seen as the use of computers to store and process humanities data, digital humanities also includes the means by which scholars deal with many pieces of data, the claims and theories they construct about that data, and the methods used to present research. The shift in terminology suggests an ideological shift in practice (Svensson). Digital data generated by computing should lead to more and better collaboration across the humanities: humanities computing should lead to the digital humanities. Like other hopeful digital humanists, we agree with Matthew Fisher, who expects nothing less than the “democratization of interoperable digital tools” and new frameworks for trust in shared data curation: new work in new ways (Fisher 955).
To some extent, digital tools already let us do some new work in new ways. One contributor to this issue, Kathryn Lowe, worked with research assistants locally (Glasgow) and remotely (Toronto) to transcribe manuscripts using the digital transcription tool T-PEN. In the Literature Compass issue, Lowe explains why a digital edition of Anglo-Saxon charters made using T-PEN might be desirable. Such an edition, she argues, can do things a print edition cannot: for example, Sawyer 980, a frequently copied Anglo-Saxon charter, has a complex textual history that cannot be fully represented in print, but that complexity can be usefully represented in a digital edition that allows readers to turn spelling variants on and off.3 In this issue, Lowe presents conclusions from experiments that identify Ælfrician and non-Ælfrician features, conclusions she reached after a substantial amount of transcription in T-PEN, despite some of the tool’s limitations. The interface slowed down advanced transcribers, making it impractical for accomplished paleographers, but this same slowness assisted neophyte transcribers. It was also impractical to store the transcriptions in T-PEN alone over the long term. However, Lowe suggests that online transcriptions might provide access to anonymous homilies that are not otherwise available. There is thus some scholarly and pedagogical potential in digitally available material and in the tools and technologies created to support work with it. The tool Lowe used was networked across continents, allowing collaboration among workers at several universities. The complete skepticism of writers such as Kirsch (sections of whose article are reproduced on govlab.org under the title “The False Promise of the Digital Humanities”) seems unfounded.
The constructive criticism in the Literature Compass issue leaves room to embrace at least some of the optimism that characterizes the introduction to Digital_Humanities (2012).
In this collection of articles we have profiled issues of skepticism, scholarly limits, big data, and the marginality of medieval English digital humanities because these issues challenged our authors, all of whom used digital tools. Alexandra Bolintineanu gathered data by searching the online Dictionary of Old English Web Corpus, creating a database in Excel of declarations of unknowing, and mapping the declarations’ frequency using the social network mapping software Gephi. Alex Fleck gathered data using the Digital Mappaemundi tool, which operates in and alongside Parker Library on the Web; this access to manuscripts at the Parker Library in Cambridge allows him to engage with Bately and Dumville about the hands in the Anglo-Saxon Chronicle, and the images the tool provides in turn let readers engage with Fleck’s argument. Fleck uses several image captures from Digital Mappaemundi to show that in the Parker Chronicle, CCCC MS 173, Hands 2a–c, 3, and 4 (posited by earlier scholars) are in fact one hand. Several tables of image clips accompany his article, in keeping with conventional print journal practice. In addition, we provide a digital PDF that allows readers to navigate easily between clipped letter forms and their positions on the folio, and to go directly from the article to the Parker Library on the Web image of each folio. Estelle Stubbs also gathered data using Parker Library on the Web, and trialed social network software to map relationships between scribes and manuscripts. In her article, however, she concentrates on the historical data that led her to identify a possible relationship between Richard Frampton and the Duchy of Lancaster; this relationship, she suggests, began earlier than has previously been proven.
As mentioned above, Kathryn Lowe trials T-PEN, a tool that allowed her to research and catalogue spelling variants remotely and at a faster pace than would be possible when studying original manuscripts. She points out that the data gleaned from the tool lets her present a fuller picture than a traditional print journal article of six to eight thousand words might have done. If this selection of research has something in common, it is that digital methods allow a greater amount of detail to support each writer’s conclusions. In a sense we are not close reading texts, nor are we analyzing short portions of texts for paleographic and linguistic variants. These researchers are practicing something akin to “distant reading” by looking at large amounts of manuscript and text data and using digital tools to help select, collate, and catalogue the examples or groups of examples on which they base their conclusions.
The articles in this volume also examine issues that arise from having large amounts of data, and many discuss the frustrations of working with new tools for humanities research that are often still in the testing phase. Kathryn Lowe writes: “early concerns about potential loss of data led to the decision not to enter text directly into the web interface but to upload it subsequently having completed the transcription using a standard word-processing package.” Likewise, Alexandra Bolintineanu describes her decision to check all of her examples manually in order to substantiate the digital findings. In our own experience as research assistants on the Parker Scribes project within the Making Medieval English Manuscripts cluster, our data was frequently lost or overwritten during digital transcription, and we currently face the problem of being unable to access the annotation tools or to export data from them onto other platforms. Such problems make the data produced seem unstable. The tools themselves need frequent auditing, which perhaps accounts for some of the skepticism we have outlined above. At this stage they are sometimes unreliable, and for academics who are already pressed for time, the failure of a tool is just another frustration. Unfortunately, these kinds of experiences are difficult to learn from unless there is scholarly discussion of the failures, and such discussion is rare. Digital tools are expensive. To seek new funds for new tools, researchers need to show some optimism in hindsight, highlighting prior successes rather than failures, and they would probably do well to exude even more optimism about future successes. By contrast, as Burdick et al. point out, in Silicon Valley “failure is not only tolerated, it is massively funded” (22). The humanities have no such funding for expensive failure while we learn to manage large datasets.
As we have suggested above, our contributors, along with many other digital humanists, highlight an important methodological issue: how to represent large datasets within the limited scope of a journal article. Scientists and social scientists, who really do deal with big data, have already encountered this challenge, and publicly available sharing tools now exist to facilitate open access to the data produced by researchers. Even though the datasets our contributors work with would be considered “small data” rather than big data, they are still too vast to incorporate into the argument of an article. Many of the contributors to this volume faced the problem of balancing the richness of the data available to them against the arguments they wished to present. This balance between representing datasets and making traditional philological arguments is difficult to negotiate. The opposition between presenting quantitative and qualitative findings (a thread running through this introduction) really comes to a head only when datasets must be converted into traditional scholarly outputs, particularly while medievalists are still working within the relatively small scope of the printed journal article. As Peter Stokes notes, paleographers tend to “express qualitative opinions rather than objective arguments” and have developed a method that “depends on the authority of the author and the faith of the reader” (Stokes 138). Stokes also observes that paleographers’ subjective impressions are uniquely difficult to communicate. The question then arises whether digital methods, which allow faster and easier collation as well as the incidental accumulation of both quantitative and qualitative data, might make these apparently subjective impressions easier to communicate, or even render some of these previously subjective ideas objective.
This issue trials two ways of contending with the problem of incorporating quantity into quality: changing the way articles handle digitally produced data, and changing the way we write about it. Our authors sometimes treat data as a kind of footnote, referring generally to the bulk of the information behind their scholarly conclusions. The articles in this volume refer to tables (Lowe and Fleck) and diagrams (Bolintineanu) that illustrate their large datasets while allowing these to stand apart from the argument of each piece. We have also used Figshare to post datasets and supporting documentation (see for example http://dx.doi.org/10.6084/m9.figshare.1284669). In this way we are attempting to expand the data available to the public by using online storage in conjunction with a traditional journal format. Readers visiting this Figshare project also have access to live hyperlinks, screencasts, screenshots, and videos, so that the arguments of the articles do not necessarily end when the articles do. Our hope for this volume, therefore, is to open up current and ongoing questions about the representation of data and about how interoperable data can be made accessible, widely usable, and sustainable.
1. In one sense the digitization of manuscripts is an extension of the impulse to digitize texts so as to be able to search them faster, an impulse that found expression in the pioneering work of Roberto Busa, a Jesuit priest and professor, who indexed and lemmatized the works of Thomas Aquinas from 1949 onwards. Among humanities scholars, medievalists were early adopters of computing methods; for example, the Dictionary of Old English was one of the first projects to digitize an entire corpus.
2. A 2011 discussion of Stanford’s Literary Lab in the New York Times suggested that Franco Moretti’s distant reading is more of a theology than a scientific or literary method, adding disdainfully, “Moretti isn’t interested in the unquantifiable, inscrutable actions of intelligent human beings trying to write stuff” (Schulz). See also The Stanford Literary Lab site at http://litlab.stanford.edu/.