Abstract

Authorship attribution studies have traditionally been based on a wide reading knowledge of a text in its historical and generic contexts. With the advent of computers, it became possible to process large quantities of data quickly. However, the first computer-driven attribution methods could only deal with individual words, ignoring grammar, syntax, and all the individualizing features of authorial language. Researchers hoped that counting word frequencies and subjecting the counts to statistical analysis would solve authorship problems. Time has shown that the most this method can achieve is a measure of likeness, not identity. Second-generation research in authorship attribution has opened up a new path, drawing on recent advances in linguistics. Neurolinguists have shown that human utterances often take the form of "chunks" or ready-made groups of words. In parallel, linguists studying large corpora of actual language use have found that certain word groups tend to recur in close proximity. These collocations are partly phrases or idioms in general circulation, partly idiosyncratic formations which an individual speaker or writer uses regularly. By using modern plagiarism software, we can establish the distinctive "phraseognomy" of one or more authors within a restricted database, organized by genre and date. Collocation matching, an automated and replicable process, can provide a reliable authorship indicator when dealing with anonymous or coauthored texts. On the evidence given here, it seems certain that the Additions to the 1602 text of Kyd's The Spanish Tragedy were written by Shakespeare.
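
The idea of collocation matching can be illustrated with a minimal sketch. It is not the plagiarism software used in the study; it assumes, for simplicity, that a collocation is a contiguous sequence of n words, whereas real collocations may be discontinuous. The function names (word_ngrams, distinctive_matches) and the corpus layout (a dictionary mapping each author to a list of texts) are hypothetical choices made for illustration.

```python
import re
from collections import defaultdict


def word_ngrams(text, n=3):
    """Return the set of contiguous n-word sequences in text, lowercased."""
    words = re.findall(r"[a-z']+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def distinctive_matches(target, corpus_by_author, n=3):
    """For each author, collect the target's n-grams found in that author only.

    Assumption: n-grams shared by two or more authors are treated as
    phrases in general circulation and are discarded.
    """
    target_grams = word_ngrams(target, n)
    owners = defaultdict(set)  # n-gram -> authors whose texts contain it
    for author, texts in corpus_by_author.items():
        for text in texts:
            for gram in word_ngrams(text, n) & target_grams:
                owners[gram].add(author)

    result = {author: set() for author in corpus_by_author}
    for gram, authors in owners.items():
        if len(authors) == 1:
            result[next(iter(authors))].add(gram)
    return result
```

In such a sketch, the count of matches unique to each candidate would serve only as a rough indicator; restricting the comparison corpus by genre and date, as the abstract describes, is what would make the counts comparable across authors.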
