  • Electronic Text Analysis and Nineteenth-Century Newspapers: TokenX and the Richmond Daily Dispatch
  • Elizabeth Lorang and Brian Pytlik Zillig

Early in Graphs, Maps, Trees: Abstract Models for a Literary History, Franco Moretti asks, "What would happen if literary historians, too, decided to 'shift their gaze' . . . 'from the extraordinary to the everyday, from exceptional events to the large mass of facts'? What literature would we find, in 'the large mass of facts'?" (3). Although Moretti considers these questions in relation to the European novel, they are intriguing ones to ask in regard to other textual forms and genres as well. For the literary historian, what is more everyday—literally, figuratively—than the daily newspaper, where, in the nineteenth century, poetry and fiction existed alongside news articles, advertisements, editorials, and an array of other texts, including death and marriage announcements, weather reports, and public notices? Poems could tell the news, and sometimes the news was fiction; genres blurred. When posed of newspapers and their literature, then, Moretti's question—"What literature would we find, in 'the large mass of facts'?"—takes on additional nuance.

By and large, literary historians have not turned their gaze to the newspaper, and the number of literary scholars treating the nineteenth-century newspaper in any of its varied incarnations (daily, weekly, local, national, illustrated, story, religious, political, ethnic, multilingual, and so on) is small. Several factors have kept attention focused elsewhere, among them long-prevalent models of literary scholarship, which have tended to privilege authors, forms, and genres not represented in nineteenth-century newspapers. Difficulties in gaining access to the materials, whether print originals or facsimiles, have also hindered research, and once the newspapers are located, the sheer abundance of their text is daunting. Recently, however, newspapers have appeared in a variety of contexts in American literary studies. Scholars have turned to newspapers in projects on women writers, reader response and reception, fiction in late-nineteenth-century papers, and the public role of poetry in American history and culture.1 At the same time, renewed attention to newspapers by libraries, governments, and universities in the last decade, along with an increase in programs for digitizing historical materials, minimizes some of the difficulties of access and creates exciting opportunities for newspaper research.2

This essay has emerged out of a larger project to include newspapers in the history and analysis of American literature and to examine the relationship between newspapers and poetry in the United States during the nineteenth century. With so much historical newspaper content available electronically—far more than is human-readable—text analysis and data-mining techniques may enable scholars to study the newspapers, and their poetry, in new ways. To effect such study, analytical tools simple enough for widespread use, but rigorous enough in design to meet scholarly standards, will be necessary. Most such tools are at the moment not web-based and, in many cases, require knowledge of scripting languages to maximize results. No less than analytical strategies or theories, then, designing a system capable of large-scale text analysis that can be used widely is a priority for us.

The first in an anticipated series of essays about electronic text analysis and digital newspaper corpora, this essay is largely exploratory and examines one approach for the electronic analysis of newspapers. For the current project, we began with a test set of digitized newspapers and an online text analysis tool, TokenX, with an original goal of studying and comparing word sequences that appear in the newspapers' poetic content and those that appear in the rest of the newspapers' text. Investigating word sequences in the newspapers may help identify trends in content and demonstrate intertextual links among a variety of kinds of newspaper content along with the possible reciprocal influences of articles, editorials, advertising, and poetry. Such an approach may also foreground pieces of news, literature, and points of view that were important to those who published and read the newspapers but that have been buried in the historical record or deemed insignificant in historical interpretation.
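As a rough illustration of what comparing word sequences across poetic and non-poetic newspaper text might involve, the following minimal Python sketch counts three-word sequences in two passages and reports those that occur in both. It is not TokenX and does not reproduce its method; the function names (`tokenize`, `ngrams`, `shared_sequences`), the simple lowercase tokenization, and the two sample strings are all assumptions invented for illustration, and the sample text is not drawn from the Richmond Daily Dispatch.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return its runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def ngrams(tokens, n):
    """Count every run of n consecutive tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def shared_sequences(poetry_text, other_text, n=3):
    """Map each n-word sequence found in both texts to its
    (count in poetry, count in other text)."""
    poetry_counts = ngrams(tokenize(poetry_text), n)
    other_counts = ngrams(tokenize(other_text), n)
    common = poetry_counts.keys() & other_counts.keys()
    return {gram: (poetry_counts[gram], other_counts[gram]) for gram in common}


# Invented placeholder passages, not quotations from any newspaper.
poem = "The flag of the free shall wave on high"
article = "Our soldiers rallied beneath the flag of the free yesterday"

for gram, counts in sorted(shared_sequences(poem, article).items()):
    print(" ".join(gram), counts)
```

A fuller treatment would need decisions this sketch leaves out, such as how to handle OCR errors, spelling variation, punctuation, and the boundaries between individual newspaper items.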

Locating a suitable test corpus of historical newspapers and fully processing even a relatively small set of newspaper files with TokenX, however, emerged as two major research...
