American Speech 75.4 (2000) 365-370
Prospects in Phonology
Studying the Rhythm of Spoken Discourse
Wladyslaw Cichocki, University of New Brunswick
The utterances that speakers produce and listeners hear have characteristics that are different from their written counterparts. Among these differences are features such as intonation patterns, rhythm, and variations in the pronunciation of vowels and consonants. As our knowledge of spoken discourse grows and as applications in areas such as language acquisition, automatic speech recognition, and speech synthesis become more numerous, the need to understand these differences becomes increasingly apparent. This essay focuses on one aspect of spoken discourse--its rhythm--and outlines the interplay of various factors that are relevant to its study.
The perception of beats or prominences in speech that we call rhythm is due to the patterning of stressed and unstressed syllables. The patterns of beats are fairly regular, and one of the challenges in the analysis of spoken discourse is to capture these regularities (as well as any irregularities). One way to describe rhythm is in terms of the physical durations of syllables in utterances; that is, we look for patterns of durationally long and durationally short syllables. Statistical modeling of these patterns provides the foundation for a formal description of rhythm and gives us a chance to test in a rigorous way specific hypotheses about which factors are at play in rhythm.
A simple model might predict syllable durations from the number of segments--vowels and consonants--in a syllable. For example, a syllable with a small number of segments, say one or two, is predicted to be shorter than a syllable with a larger number of segments, say three or four. This prediction can be tested experimentally: we can record a speaker reading a list of words and sentences whose syllables contain controlled numbers of segments. Using appropriate acoustic phonetic equipment, we can then measure the physical durations of the syllables to an accuracy of a few milliseconds. Finally, applying a statistical technique such as regression analysis allows us to verify how good our prediction is.
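As a concrete sketch of this kind of test (not taken from any actual study--the duration measurements below are invented for illustration), the following Python fragment fits a one-factor regression of syllable duration on segment count and computes R-squared as a measure of how good the prediction is:

```python
# A hypothetical one-factor model: predict syllable duration (ms)
# from the number of segments, fit by simple linear regression.

def fit_linear(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    return my - b * mx, b

# Invented measurements: (segments in syllable, duration in ms)
data = [(1, 110), (2, 160), (2, 150), (3, 210), (3, 200), (4, 260)]
xs, ys = [d[0] for d in data], [d[1] for d in data]
a, b = fit_linear(xs, ys)  # b estimates ms added per extra segment

# R-squared quantifies how much of the variation in duration
# the segment-count factor accounts for
my = sum(ys) / len(ys)
ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
ss_tot = sum((y - my) ** 2 for y in ys)
r2 = 1 - ss_res / ss_tot
```

With data like these, the fitted slope b is the model's estimate of the durational cost of each additional segment, and an R-squared near 1 would indicate that segment count alone predicts duration well.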
Empirical research has shown that this simple model does a fairly good job of predicting syllable durations. Indeed, over the past few decades researchers have added other factors to the study of the rhythm of speech. One such factor is the nature of the segments in the syllable. For example, certain vowels are longer than others. Try saying the following sequences of words out loud: bid-bed-bad and bad-bed-bid. In many dialects of English, the vowel in bad is longer than the one in bed, which is longer than the vowel in bid. Consonants also have different lengths: the initial consonant in mid is shorter than the initial consonant in Sid.
Other factors go beyond the syllable itself and take into account the intuition that spoken utterances have structure; that is, they are divided into constituents like words and phrases. Certain patterns of duration are associated with the boundary between words. For example, /tun/ is longer and /kwair/ is shorter in tune#acquire than in tuna#choir. Syllables that occur at the end of a phrase are also relatively longer than those in the middle of a phrase.
The interplay of these and other factors in predicting syllable duration has been described using fairly complex models (called multivariate models), which make much better predictions about speech rhythm than the simple one-factor model described above. Not insignificant is the fact that the model-building enterprise has been informed by work carried out by a wide variety of researchers: phonologists, speech engineers, experimental phoneticians, computer scientists, and statisticians. This combination of specializations has contributed to the current state of knowledge in a number of areas. In the case of applications to speech technology, the results have significantly improved the performance of those machines that can interpret what people say and of other machines that sound human-like (more or less) when they "speak."
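To illustrate what a multivariate model adds, here is a small sketch (the data points and the choice of factors are hypothetical, not drawn from the article): a two-factor least-squares fit that predicts syllable duration from segment count together with a phrase-final indicator, solved through the normal equations.

```python
# Hypothetical two-factor model: duration ~ segments + phrase-final.
# Fit by ordinary least squares via the normal equations X'X beta = X'y.

def solve(A, rhs):
    """Gaussian elimination with partial pivoting for a small system."""
    n = len(A)
    M = [row[:] + [rhs[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c]
                              for c in range(r + 1, n))) / M[r][r]
    return x

def ols(rows, ys):
    """Fit y = b0 + b1*x1 + b2*x2 + ... by least squares."""
    X = [[1.0] + list(r) for r in rows]
    k = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(k)]
           for i in range(k)]
    Xty = [sum(X[n][i] * ys[n] for n in range(len(X))) for i in range(k)]
    return solve(XtX, Xty)

# Invented data: (segments, phrase-final?) -> duration in ms
rows = [(1, 0), (2, 0), (3, 0), (1, 1), (2, 1), (3, 1)]
ys = [100, 150, 200, 140, 190, 240]
b0, b_seg, b_final = ols(rows, ys)
# b_seg: ms per extra segment; b_final: phrase-final lengthening in ms
```

Each fitted coefficient isolates one factor's contribution: b_seg plays the role of the one-factor model's slope, while b_final captures the extra duration of phrase-final syllables, which the one-factor model would have smeared across its single predictor.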
Yet far more work is required before we can claim that we really understand how...