Text: A Massively Addressable Object
- University of Minnesota Press
Michael Witmore

At the Working Group for Digital Inquiry at Wisconsin, we’ve just begun our first experiment with a new order of magnitude of texts. Jonathan Hope and I started working with thirty-six items about six years ago when we began to study Shakespeare’s First Folio plays (Witmore and Hope). Last year, we expanded to three hundred and twenty items with the help of Martin Mueller at Northwestern, exploring the field of early modern drama. Now that the University of Wisconsin has negotiated a license with the University of Michigan to begin working with the files from the Text Creation Partnership (TCP), which contains over twenty-seven thousand items from early modern print, we can up the number again. By January, we will have begun our first one-thousand-item experiment, spanning items printed in Britain and North America from 1530 through 1809. Robin Valenza and I, along with our colleagues in computer sciences and the library, will begin working up the data in the spring. Stay tuned for results.

New experiments provide opportunities for thought that precede the results. What does it mean to collect, tag, and store an array of texts at this level of generality? What does it mean to be an “item” or “computational object” within this collection? What is such a collection? In this post, I want to think further about the nature of the text objects and populations of texts we are working with.

What is the distinguishing feature of the digitized text, that ideal object of analysis considered in all its hypothetical relations with other ideal objects? The question itself goes against the grain of recent materialist criticism, which focuses on the physical existence of books and practices involved in making and circulating them. Unlike someone buying an early modern book in the bookstalls around St. Paul’s four hundred years ago, we encounter our TCP texts as computational objects.
That doesn’t mean that they are immaterial, however. Human labor has transformed them from microfilm facsimiles of real pages into diplomatic-quality digital transcripts, marked up in TEI so that different formatting features can be distinguished. That labor is as real as any other.

What distinguishes this text object from others? I would argue that a text is a text because it is massively addressable at different levels of scale. Addressable here means that one can query a position within the text at a certain level of abstraction. In an earlier post, for example, I argued that a text might be thought of as a vector through a metatable of all possible words (Witmore). Why is it possible to think of a text in this fashion? Because a text can be queried at the level of single words and then related to other texts at the same level of abstraction: the table of all possible words could be defined as the aggregate of points of address at a given level of abstraction (the word, as in Google’s new Ngram corpus).

Now, we are discussing ideal objects here; addressability implies different levels of abstraction (character, word, phrase, line, etc.), which are stipulative or nominal: such levels are not material properties of texts or Pythagorean ideals; they are, rather, conventions.

Here’s the twist. We have physical manifestations of ideal objects (the ideal 1 Henry VI, for example), but these manifestations are only provisional realizations of that ideal. (I am using the word manifestation in the sense advanced in the Online Computer Library Center’s Functional Requirements for Bibliographic Records [FRBR] hierarchy.1) The book or physical instance, then, is one of many levels of address. Backing out into a larger population, we might take a genre of works to be the relevant level of address. Or we could talk about individual lines of print, all the nouns in every line, every third character in every third line.
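
To make the idea of address concrete, here is a minimal Python sketch of one text queried at several conventional levels (line, word, every third character in every third line) and then related to a second text as a vector over a shared table of words. Everything in it is an assumption made for illustration: the two toy texts are invented placeholders rather than TCP transcriptions, the function names are hypothetical, and splitting on whitespace is only one stipulated convention for what counts as a word.

```python
from collections import Counter

# Invented placeholder texts; they stand in for items in a collection.
TEXT_A = "the king is dead\nlong live the king\nthe crown is heavy"
TEXT_B = "the queen is here\nlong live the queen"


def address_lines(text):
    """Address the text at the level of the line."""
    return text.splitlines()


def address_words(text):
    """Address the text at the level of the word (whitespace convention)."""
    return text.lower().split()


def every_third_char_of_every_third_line(text):
    """Address every third character in every third line.

    Counting starts at the third line and the third character,
    which is itself a stipulated convention.
    """
    return [line[2::3] for line in address_lines(text)[2::3]]


def shared_vocabulary(*texts):
    """Build the table of all words found in the given texts, sorted."""
    return sorted(set(word for text in texts for word in address_words(text)))


def word_vector(text, vocabulary):
    """Represent a text as counts over a shared table of words."""
    counts = Counter(address_words(text))
    return [counts[word] for word in vocabulary]


if __name__ == "__main__":
    vocabulary = shared_vocabulary(TEXT_A, TEXT_B)
    print(address_lines(TEXT_A)[1])
    print(address_words(TEXT_A)[4])
    print(every_third_char_of_every_third_line(TEXT_A))
    print(vocabulary)
    print(word_vector(TEXT_A, vocabulary))
    print(word_vector(TEXT_B, vocabulary))
```

Each function fixes a level of abstraction by convention, not by appeal to any material property of the text, and the shared vocabulary is what lets two texts be compared at the same level of address.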
All this variation implies massive flexibility in levels of address. And more provocatively, when we create a digitized population of texts, our modes of address become more and more abstract: all concrete nouns in all the items in the collection, for example, or every item identified as a “History” by Heminges and Condell in the First Folio. Every level is a provisional unity: stable for the purposes of address but also stable because it is the object of address. Books are such provisional unities. So are all...