Reflections on 20,000 Victorian Newspapers: 'Distant Reading' The Times using The Times Digital Archive

D Liddle - Journal of Victorian Culture, 2012 - academic.oup.com
Victorianists and print culture historians go to newspaper databases such as 19th Century British Library Newspapers, The British Newspaper Archive, and The Times Digital Archive looking for information in the form of text, but these databases also give us information in the form of numbers. Users of Cengage’s Times Digital Archive, for example, are shown the count of articles that match our search parameters (‘hits’), the count of words in each article we read, and even the file sizes of the portable document format (pdf) page images we download. 1 Such numbers are obviously not part of the original text of historical newspapers, but descriptive data at one remove from textual content–‘metadata’–calculated by optical character recognition (OCR) software, search engine software, or computer operating systems. Most research praxis in the humanities ignores metadata, but in ‘Style, Inc.: Reflections on 7,000 Titles’, Franco Moretti has demonstrated that some data about data has potential to illuminate trends in literary history. 2 Counting and graphing the words in the titles of some 7,000 British novels written between the mid-eighteenth and mid-nineteenth centuries, Moretti has shown an important evolution in British publishing toward more information-dense titling conventions. He has suggested the term ‘distant reading’ for this method of investigating unreadably large amounts of historical text by finding numerical abstractions that can reveal qualities and patterns within those texts. 3
In this short article I consider whether any of the metadata generated by newspaper databases might have potential to help us ‘distant-read’ aspects of the history of British journalism. I will describe experiments only, barely above the back-of-the-envelope level, and will put forward few strong claims about newspapers themselves, instead offering preliminary observations and visualizations of what
Oxford University Press