Letters to Language
Language accepts letters from readers that briefly and succinctly respond to or comment upon either material published previously in the journal or issues deemed of importance to the field. The editor reserves the right to edit letters as needed. Brief replies from relevant parties are included as warranted.
Fractal dimensions of discourse
January 4, 2005
To the Editor:
Like any thriving discipline, linguistics often divides itself into opposing camps. Nevertheless, all linguists accept certain fundamental concepts, such as constituent structure. Thus, every linguistic theory includes (i) an inventory of significant elements and (ii) principles for combining these elements to form larger wholes. Some proposed categories are binary, like [± voice]; others are gradient, like the sonority scale; and combinations of linguistic elements are either coordinate or subordinate/superordinate. Tree diagrams embody both of these dimensions: node labels comprise inventories of significant elements, and tree branches show how these elements are combined, together with their hierarchical relationships.
Linguistic theories differ with regard to their inventories of significant elements and how to combine/segment them. Theories also differ as to the relative prominence assigned to constituents, sometimes depending on the analytic frame of reference. For example, vowels are more salient acoustically than consonants, so syllabic tree diagrams represent vocalic nuclei as superordinate heads. But consonants typically convey more information than vowels do, so written abbreviations omit vowels more often than consonants. In addition, syllabic writing systems predate alphabetic systems, whereas no purely vocalic writing system ever existed. So, from the point of view of sonority, vowels are more salient than consonants; but from the point of view of informativity, consonants are more salient than vowels. Such examples illustrate that linguistic representations are inherently multiple. But, having acknowledged this, we concentrate hereafter on just one linguistic variable, informativity, and its structural patterns.
Early descriptions of informativity treat it as a binary category: given (old, known, predictable, evoked) information vs. new (unknown, unpredictable) information. These binary categories can be translated into numerical indices: 1.0 for new information and 0.0 for given information. Nevertheless, large-scale statistical studies of information flow remain impractical because word-by-word judgments of informativity are laborious to produce and hard to replicate. Responding to this problem, Youmans 1991 (‘A new tool for discourse analysis: The vocabulary-management profile’, Language 67.763–89) proposes a simple algorithm: assign an index of 1.0 to words that appear for the first time in a discourse and 0.0 to repetitions. Next, plot a moving average of these indices, over, say, 35-, 55-, or 101-word intervals. The resulting graphs, called vocabulary-management profiles (VMPs), rise when the incidence of new vocabulary increases and fall when repetitions increase. New words correlate with new information, and repeated words correlate with given information; consequently, VMPs are approximate visual analogues for the ebb and flow of information in discourse.
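The procedure just described can be illustrated with a minimal Python sketch. It is offered only as an approximation under stated assumptions, not as the implementation used in Youmans 1991: the function names (`tokenize`, `novelty_indices`, `moving_average`, `vmp`) are hypothetical, the tokenizer simply lowercases the text and keeps runs of letters and apostrophes, and each average is taken over a full run of consecutive words, since the description above does not specify how word forms are normalized or how the interval is aligned.

```python
import re


def tokenize(text):
    """Split text into lowercase word tokens (an assumed, simplistic tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())


def novelty_indices(tokens):
    """Assign 1.0 to the first occurrence of each token and 0.0 to repetitions."""
    seen = set()
    indices = []
    for token in tokens:
        if token in seen:
            indices.append(0.0)
        else:
            seen.add(token)
            indices.append(1.0)
    return indices


def moving_average(values, window):
    """Return the mean of every full run of `window` consecutive values."""
    if window <= 0:
        raise ValueError("window must be positive")
    if len(values) < window:
        return []
    total = sum(values[:window])
    averages = [total / window]
    # Slide the window one word at a time, updating the running sum.
    for i in range(window, len(values)):
        total += values[i] - values[i - window]
        averages.append(total / window)
    return averages


def vmp(text, window=35):
    """Compute a vocabulary-management profile: moving average of novelty indices."""
    return moving_average(novelty_indices(tokenize(text)), window)
```

Calling `vmp(text, window=35)` on a discourse yields one value per full 35-word interval; plotting these values against word position gives a curve that rises where new vocabulary is introduced and falls where vocabulary is repeated. Intervals of 55 or 101 words can be tried by changing `window`.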
VMPs resemble stock market curves, with smaller peaks and valleys superimposed upon larger ones. Such curves are formally equivalent to tree diagrams without node labels. That is, VMPs embody constituent structure (recursive patterns of segmentation/combination and relative prominence), but they do not label constituent types, such as NP, VP, S, paragraph, episode, chapter, and the like. Nevertheless, VMP peaks and valleys do correlate with discourse constituents. For example, high-frequency function words tend to occur at the beginnings of phrases, whereas lower-frequency content words occur later. As a result, VMPs commonly dip to short-term valleys at the beginnings of phrases, then rise to short-term peaks. At the sentence level, given information, repeated vocabulary, and VMP valleys occur more often in subjects; whereas new information, new vocabulary, and VMP peaks occur more often in predicates. Paragraphs, sections, episodes, and the like typically introduce new topics and new vocabulary toward their beginnings, causing VMPs to rise to intermediate-term peaks. As speakers/authors elaborate on these topics, their vocabulary becomes more repetitive, causing VMPs to fall to intermediate-term valleys. Such cycles repeat themselves throughout a discourse.
Overall, the peaks and valleys of VMPs from this initial version of the tool (VMP1) correlate remarkably well with...