Child Welfare in Victorian Newspapers: Corpus-Based Discourse Analysis


Corpus linguistics enables the analysis of patterns in large bodies of written material. The use of this approach to trace discourses about infant mortality in all of the text published by four newspapers in England and Wales between 1870 and 1900 detects systematic variations in views about infant welfare by locality. It also reveals some of the strengths and weaknesses inherent in interrogating digitized text with linguistic tools in historical research.

Corpus linguistics, the “study of language based on examples of real life language use,” has found a growing number of applications, such as language description and the interpretation of literary texts. This article discusses its use in historical research, using a case study to examine its strengths and weaknesses for this purpose. The case study examines what nineteenth-century British newspapers can tell us about child welfare and infant mortality.1

Corpus linguistics depends on the use of computers that permit large-scale data processing; it would not otherwise be feasible. Accordingly, the texts studied need to be in digital form. For some countries, such as the United Kingdom, digitized collections now afford historians rapid access to a large body of nineteenth-century journalism. Pre-eminent among these collections is British Library Newspapers (a Gale Cengage product developed with the British Library and JISC funding), which provided the digital text used herein. Although the digitization, based on Optical Character Recognition (OCR), is of a generally high standard (caveats discussed below), historians need to bear in mind Hitchcock’s reminders about the limitations inherent in how any digital product mediates the past. For example, although Gale’s academic panel made a careful effort to select representative titles for digitization, fewer than ten titles from mainly rural areas of England and Wales offered a long print run. If others had been available, they could have told a different story. No one suggests that the digitized collections could ever be a sufficient source on their own. Nevertheless, these collections offer an effective entrance into the primary sources.2

This article provides an opportunity to assess the potential of corpus linguistics for historians. Its techniques are becoming widely available via free-access, web-based, or downloadable tools, such as CQPweb and AntConc. Since the present research began, Gale Cengage has begun to offer its underlying data and metadata to institutions that subscribe to British Library Newspapers (although its website does not yet make this availability obvious). Historians interested in the use of corpus linguistics should now find it relatively easy to obtain the requisite information.3

What Corpus Linguistics and Discourse Analysis Do

Corpus linguistics analyzes corpora, large bodies of real-life language data. The corpora in this study were the complete published text of four English newspapers during the nineteenth century: every word published from 1801 to 1900 in two publications, and from the first issue in 1869 or 1870 to 1900 in the other two. Because corpora are generally large (the smallest of these four corpora contains 386 million words), they can reliably serve as a reference “universe” against which claims about the language in smaller parts of the corpus can be measured. Collocation, for instance, is a key tool that compares the frequency with which certain word forms appear around a particular target word with the frequency with which these forms appear within the whole corpus. The present study finds, for example, that in one of the titles, defendant and prisoner occur near a group of words for nursing a child much more frequently than they do within the entire corpus (the interpretation is discussed later).
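
The collocation comparison described above can be illustrated with a minimal sketch in Python. It assumes a simple ratio of a word's relative frequency near the target to its relative frequency in the whole corpus; the study itself may well have used a different statistic, and tools such as CQPweb and AntConc offer several. The function name `collocates`, the window size, and the toy sentence are all hypothetical and are not drawn from the newspaper corpora.

```python
from collections import Counter


def collocates(tokens, targets, window=4):
    """Score words that occur within `window` tokens of any target word.

    Returns {word: (count_near_target, ratio)}, where ratio is the word's
    relative frequency near the target divided by its relative frequency
    in the whole corpus. A ratio above 1 means the word is over-represented
    near the target. This is an illustrative measure, not the study's own.
    """
    targets = set(targets)
    corpus_counts = Counter(tokens)
    corpus_total = len(tokens)

    near_counts = Counter()
    near_total = 0
    for i, token in enumerate(tokens):
        if token not in targets:
            continue
        start = max(0, i - window)
        end = min(len(tokens), i + window + 1)
        for j in range(start, end):
            if j != i:  # skip the target word itself
                near_counts[tokens[j]] += 1
                near_total += 1

    scores = {}
    for word, count in near_counts.items():
        rate_near = count / near_total
        rate_corpus = corpus_counts[word] / corpus_total
        scores[word] = (count, rate_near / rate_corpus)
    return scores


# Toy, invented text standing in for a corpus (hypothetical data).
tokens = (
    "the prisoner was charged with nursing the child for gain while "
    "the market price of corn rose and the price of wheat fell"
).split()

scores = collocates(tokens, targets={"nursing"}, window=4)
for word, (count, ratio) in sorted(scores.items(), key=lambda kv: -kv[1][1]):
    print(f"{word:10s} near={count} ratio={ratio:.2f}")
```

In this toy text, prisoner scores above 1 because it appears only near nursing, whereas the scores below 1 because it is common throughout. With a real corpus of hundreds of millions of words, a significance-based measure would usually be preferable to a raw ratio, which is unstable for rare words.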

More generally, corpus linguistics uncovers objective patterns in a text through a quantitative analysis of language. This form of “distant reading” may well give a truer picture of what a text is saying as a whole than does the subjective impression that a human reader derives from examining the text. Confirmation bias, for instance, is avoided, as is undue emphasis on striking and unusual passages. Although unusual passages can have import for scholars, they also need to know what constitutes “normal” in a...