  • Letters to Language
  • D. H. Whalen, Harriet S. Magen, Marianne Pouplier, A. Min Kang, Khalil Iskarous, and Ch.-J. N. Bailey

Language accepts letters from readers that briefly and succinctly respond to or comment upon either material published previously in the journal or issues deemed of importance to the field. The editor reserves the right to edit letters as needed. Brief replies from relevant parties are included as warranted.

Vowel targets without a hyperspace effect

March 17, 2004

To the Editor:

In a paper with broad implications published in this journal, Keith Johnson, Edward Flemming, and Richard Wright (hereafter, JFW; ‘The hyperspace effect: Phonetic targets are hyperarticulated’, Language 69.505–28, 1993) provided experimental evidence suggesting that listeners preferred vowels with more extreme values than those they produced. In their study, listeners were to pick a ‘best exemplar’ of a vowel from a systematic exploration of a vowel space created on a speech synthesizer. The chosen exemplars (averaged across male and female listeners) were more extreme in formant values than the average productions (by the males). JFW claimed this ‘hyperspace’ effect underlies all vowel production, even though the targets were more extreme than their speakers produced even when hyperarticulating. The analogy that they provide to motivate the hyperspace effect is that full vowels cannot be predicted from reduced vowels, so reduced vowels cannot be taken as the underlying form (p. 524). By assuming that careful speech, or hyperarticulation, is similarly unpredictable, JFW expected that vowel targets would be hyperarticulated.

In their perceptual study, this is what seemed to be happening: Preferred vowels were outside the articulatory space of the male talkers. We have elsewhere shown that the experimental data depend both on the acoustics of the synthetic stimuli used to measure the effect and on the dialect of the listeners (D. H. Whalen, Harriet S. Magen, Marianne Pouplier, A. Min Kang, and Khalil Iskarous, ‘Vowel production and perception: Hyperarticulation without a hyperspace effect’, Language and Speech, to appear, 2004). Here we summarize the results and further explore some of the theoretical reasons for finding that there is no need for a hyperspace effect.

First, for both studies, dialect affects production and thus shifts the vowel space. In JFW’s data, the one non-California speaker shows that this is true. The production values for their speaker from New York are given separately; if the perception results are compared to the New York speaker’s production space, only two of eleven vowels exhibit a hyperspace effect. The fronting of back vowels in the California dialect is responsible for much of the apparent hyperspace effect, as confirmed by the Rhode Island speakers in our study.

Second, in our study, as in JFW’s, even when speakers hyperarticulated, their productions were less extreme than their perceptions. This would imply that hyperarticulations should lie on a line between the full vowel productions and the perceptual (hyperspace) target. In both sets of production results, the directions in which the hyperarticulations went in formant space were quite sensible. For our data, the direction from full to hyperarticulated vowel was away from centralization for five vowels, decentralized for either F1 or F2 for five vowels, and aberrant for /u/ (where the full and hyperarticulated vowels were nearly the same). The perceptual targets, by contrast, were fairly randomly distributed relative to either full or hyperarticulated vowels. If hyperarticulation is supposed to be asymptotically closer to a ‘hyperspace’ target, then there is no evidence of that trend in either our data or that of JFW.
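
The geometric reasoning in this paragraph can be made concrete with a small sketch. It is an illustration only, not the analysis used in either study: the function names and all formant values are hypothetical, and raw hertz are used for both F1 and F2 purely for simplicity (an auditory scale might be preferred in practice). The first function asks whether a hyperarticulated vowel lies farther from the center of the vowel space than the full vowel; the second measures how closely the full-to-hyperarticulated shift points toward a perceptual target, with a cosine near 1 meaning the three points are nearly collinear and ordered as a hyperspace account would predict.

```python
import math

def decentralizes(full, hyper, center):
    """Return True if the hyperarticulated vowel is farther from the
    center of the vowel space than the full vowel is.
    Each argument is an (F1, F2) pair in Hz."""
    return math.dist(hyper, center) > math.dist(full, center)

def alignment(full, hyper, target):
    """Cosine of the angle between the full->hyper shift and the
    full->target shift in (F1, F2) space. Values near 1 mean the
    hyperarticulation moves toward the target; values near 0 or
    below mean it does not. Returns NaN if either shift is zero."""
    d_hyper = (hyper[0] - full[0], hyper[1] - full[1])
    d_target = (target[0] - full[0], target[1] - full[1])
    n_hyper = math.hypot(*d_hyper)
    n_target = math.hypot(*d_target)
    if n_hyper == 0 or n_target == 0:
        return float("nan")
    dot = d_hyper[0] * d_target[0] + d_hyper[1] * d_target[1]
    return dot / (n_hyper * n_target)

# Hypothetical values for a single high front vowel (not measured data).
center = (500.0, 1500.0)   # assumed center of the vowel space
full = (300.0, 2200.0)     # assumed full-vowel production
hyper = (280.0, 2300.0)    # assumed hyperarticulated production
target = (260.0, 2400.0)   # assumed perceptual 'best exemplar'

print(decentralizes(full, hyper, center))
print(alignment(full, hyper, target))
```

Under a hyperspace account, the alignment value would be expected to be high for most vowels; a set of values scattered between -1 and 1 would correspond to perceptual targets that are distributed more or less randomly relative to the production shifts.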

Third, the preferred perceptual targets of the female listeners were essentially identical to those of the male listeners, even though those targets were ‘reduced’ from the standpoint of the female vowel spaces (see Whalen et al.). What this indicates is that both sets of listeners were normalizing the vowel space of this synthetic ‘talker’. By collapsing the perceptual results but not the production results, JFW failed to address the issue.

Fourth, the method used by JFW to test the perception seems to lead to more extreme judgments than listeners actually prefer. This ‘method of adjustment’ allows the listener to pick vowels from a grid until the best match for an English vowel (as specified by the...
