Quantitative historical linguistics: A corpus framework by Gard B. Jenset and Barbara McGillivray
The early twenty-first century has witnessed a major shift toward quantitative approaches in the methodology of linguistics. Specifically, whereas quantitative methods have long been a staple of sociolinguistic and psycholinguistic research, the past two decades have seen their expansion toward descriptive and theoretical grammar. In usage-based approaches to language in particular, such as cognitive and probabilistic linguistics, a ‘quantitative turn’ has occurred that applies the statistical testing of hypotheses to data derived from text corpora. The central inspiration for Gard B. Jenset and Barbara McGillivray’s book is the observation that this turn toward quantitative corpus studies has not yet penetrated historical linguistics to the same extent as some other subfields of linguistics. The book accordingly sets out to introduce ‘the framework for quantitative historical linguistics’. The seven chapters fall roughly into two parts. Chs. 1 to 3 develop a general argument in support of quantitative historical linguistics, whereas Chs. 4 to 7 deal with the implementation of the ensuing program. The discussion of ‘why’ thus leads naturally to a discussion of ‘how’.
Two threads run through the first part of the text: a specification of the kind of quantitative historical linguistics that the authors intend to propagate, and an argumentation in favor of the model in question. Important features of this argumentation are a description of the actual situation in historical linguistics and a conceptual defense of the approach against potential objections. Organizationally, Ch. 1 introduces both threads, Ch. 2 develops the first thread, and Ch. 3 the second.
With regard to the first thread, the first chapter introduces the authors’ notion of quantitative research in historical linguistics by means of a double contrast. On the one hand, quantitative research differs from the conventional use of evidence in historical linguistics that rests on example-based categorical judgments about the existence of specific linguistic phenomena but does not look into probabilistic, distributional data about trends of variation and change of the phenomenon in question. On the other hand, quantitative historical research needs to go beyond raw frequencies, in the sense that the multidimensional nature of language requires a multivariate statistical approach. In the second chapter, this conception is further developed in terms of the distinction between corpus-based and corpus-driven approaches. Whereas the former turn to corpora primarily for illustration and confirmation, the latter use corpus data at two stages of the empirical process: corresponding to the distinction between exploratory and confirmatory statistics, quantitative distributional evidence is initially used to generate hypotheses, and subsequently for testing them.
With regard to the second thread, the text provides quantitative data (appropriately, one could say) to the effect that such a method is less entrenched in historical linguistics than in other fields of linguistics. This argumentation rests on a comparison of the 2012 volume of Language with six journals with a (not necessarily unique) focus on language change, such as Diachronica, Folia Linguistica Historica, and Language Variation and Change. As an explanation for the observation that historical linguistics seems to be lagging behind, the book invokes early negative experiences with glottochronology, plus the influence of structuralist and generative theories (though this is of course a factor that is not specific to historical linguistics). At the same time, it is demonstrated how the rise of quantitative linguistics goes hand in hand with the growing availability of electronic corpus materials, a trend that obviously creates an opportunity for historical linguistics just as for the other branches of linguistics.
Next to the ‘the time is ripe, we shouldn’t lag behind’ argument, the plea for quantitative corpus research in historical linguistics includes a ‘nothing is wrong with it’ type of argumentation, in the form of a systematic rejection of potential objections. Section 3.7 skillfully refutes counterarguments from convenience, from redundancy, from scope limitations, from principle, and from pseudoscience. Crucially, it is argued that a quantitative approach is not incompatible with a categorial...