
6 Style

All art is collaboration, and there is little doubt that in the happy ages of literature, striking and beautiful phrases were as ready to the story-teller’s hand as the rich cloaks and dresses of his time.
J. M. Synge, preface to Playboy of the Western World

In statistical or quantitative authorship attribution, a researcher attempts to classify a work of unknown or disputed authorship in order to assign it to a known author based on a training set of works of known authorship. Unlike more general document classification, in authorship attribution we do not want to classify documents based on shared or similar document content. Instead, the researcher performs classification based upon an author’s unique signal, or “style.” The working assumption of all such investigation is that writers have distinct and detectable stylistic habits, or “tics.”

A consistent problem for authorship researchers, however, is the possibility that other external factors (for example, linguistic register, genre, nationality, gender, ethnicity, and so on) may influence or even overpower the latent authorial signal. Accounting for the influence of external factors on authorial style is an important task for authorship researchers, but the study of influence is also a concern to literary scholars who wish to understand the creative impulse and the degree to which authors are the products of their times and environments. After all, in the quarry of great literature, it is style, or “technique,” that ultimately separates the ore from the tailings.*

To greater and lesser extents, individual authors agonize over their craft. After a day spent working on two sentences of Ulysses, James Joyce is reported to have said in response to a question from Frank Budgen: “I have the words already.
What I am seeking is the perfect order of words in the sentence” (Budgen 1934, 20).† Some readers will prefer Stephenie Meyer to Anne Rice or Bram Stoker, but when the plots, themes, and genre are essentially similar, reader preference is, in the end, largely a matter of style.

* Even if the difference between the two finally boils down to a simple matter of opinion. One man’s ore is another man’s silt.
† How long Joyce spent picking the right words remains a mystery.

No computation is necessary for readers to distinguish between the writings of Jane Austen and Herman Melville; they write about different subjects, and they each have distinct styles of expression. An obvious point of comparison can be found in the two writers’ use of personal pronouns. As a simple example, consider how Austen, who writes more widely about women than Melville, is far more likely to use feminine pronouns. In Sense and Sensibility, for example, Austen uses the female pronoun she 136 times per 10,000 words. In Moby Dick, Melville uses she only 5 times per 10,000.* This is a huge difference but one that is not immediately obvious, or conscious, to readers of the two books. What is obvious is that there are not many women in Moby Dick. Readers are much more likely to notice the absence of women in the book than they are to notice the absence of feminine pronouns; even the most careful close reader is unlikely to pay much attention to the frequency of common pronouns.† It is exactly these subtle “features” (pronouns, articles, conjunctions, and the like), however, that authorship and stylometry researchers have discovered to be the most telling when it comes to revealing an author’s individual style.‡

There are, of course, other stylistic differences that are quite obvious, things that leap out to readers.
These are not primarily differences in subject matter but differences in the manner of expression, in the way authors tell their stories. One writer may use an inordinate number of sentence fragments; another may have a fondness for the dash. When these kinds of obvious difference are abundant, and when we simply wish to identify their presence in one author, the use of computation may be unnecessary. Joyce has the habit of introducing dialogue with a dash, whereas D. H. Lawrence does not. The differences between these two writers with regard to the dash are rather striking. Often, however, the differences are not so striking: attributing an unsigned manuscript to one or the other of the Brontë sisters would be a far more challenging problem; their linguistic signatures are quite similar to each other.

* These frequencies were derived from the plain...
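
The per-10,000-word rates quoted above for she can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration rather than the procedure used in the book: the function names `tokenize`, `rate_per_10k`, and `profile` are hypothetical, the tokenizer is a deliberately crude regular expression that handles only lowercase ASCII letters and straight apostrophes, and the sample sentence is invented. Run on the full plain text of a novel, a function of this kind would yield figures comparable in form to those cited, though the exact values depend on how the text is prepared and tokenized.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into word tokens.

    Assumption: a token is any run of ASCII letters or straight
    apostrophes, so "she's" is one token and is not counted as "she".
    """
    return re.findall(r"[a-z']+", text.lower())

def rate_per_10k(text, word):
    """Return the number of occurrences of `word` per 10,000 tokens."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    return 10_000 * tokens.count(word.lower()) / len(tokens)

def profile(text, words):
    """Return a dict mapping each word in `words` to its rate per 10,000 tokens."""
    return {w: rate_per_10k(text, w) for w in words}

# Invented sample sentence (12 tokens, three of them "she").
sample = "She said she would come, but he doubted that she meant it."
print(rate_per_10k(sample, "she"))           # 3 of 12 tokens -> 2500.0
print(profile(sample, ["she", "he", "the"]))
```

A profile of such rates over a fixed list of common words (pronouns, articles, conjunctions) is one simple way to represent the subtle features the passage describes; comparing profiles across texts of known authorship is left aside here.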
