Deeper Delta across genres and languages: do we really need the most frequent words?

J Rybicki, M Eder - Literary and linguistic computing, 2011 - academic.oup.com
Abstract
This article examines the success of authorship attribution of Burrows’s Delta in several corpora representing a variety of languages and genres. Contrary to the approaches of our predecessors, who only investigated the attributive effectiveness of the very top of the list of the most frequent words, hundreds of possible combinations of word vectors were tested in this study, not solely starting with the most frequent word in each corpus. The results show that Delta works best for prose in English and German and less well for agglutinative languages such as Polish or Latin.
Oxford University Press
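
The abstract describes testing Delta on word vectors that do not necessarily begin with the most frequent word. As a rough illustration only, the sketch below computes a Burrows-style Delta over a slice of the frequency-ranked word list. The function names and the parameters `start` and `size` are hypothetical and not taken from the article; ranking words on the pooled reference texts and using the population standard deviation are simplifying assumptions, not the authors' procedure.

```python
from collections import Counter
import statistics


def relative_freqs(tokens):
    """Map each word to its share of all tokens in one text."""
    counts = Counter(tokens)
    total = len(tokens)
    return {word: count / total for word, count in counts.items()}


def delta_scores(known, unknown, start=0, size=100):
    """Burrows-style Delta between `unknown` and each text in `known`.

    known   -- dict mapping an author label to a list of tokens
    unknown -- list of tokens of the disputed text
    start   -- rank of the first word to use (0 = most frequent); assumption
    size    -- number of consecutive ranked words to use; assumption
    """
    # Rank words by frequency in the pooled reference texts (assumption).
    pooled = Counter()
    for tokens in known.values():
        pooled.update(tokens)
    ranked = [word for word, _ in pooled.most_common()]
    words = ranked[start:start + size]

    freqs = {author: relative_freqs(tokens) for author, tokens in known.items()}
    unknown_freqs = relative_freqs(unknown)

    scores = {author: 0.0 for author in known}
    used = 0
    for word in words:
        column = [freqs[author].get(word, 0.0) for author in known]
        mean = statistics.mean(column)
        sd = statistics.pstdev(column)
        if sd == 0:
            continue  # the word does not vary across reference texts
        used += 1
        z_unknown = (unknown_freqs.get(word, 0.0) - mean) / sd
        for author, freq in zip(known, column):
            scores[author] += abs((freq - mean) / sd - z_unknown)

    if used == 0:
        raise ValueError("no usable words in the selected slice")
    # Delta is the mean absolute difference of z-scores; lower is closer.
    return {author: total / used for author, total in scores.items()}
```

Calling `delta_scores(known, unknown, start=k, size=n)` for a range of `k` and `n` values gives one way to compare slices of the word list, in the spirit of the many combinations the abstract mentions.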