When is Noise Speech? A Survey in Sonic Ambiguity
NoiseSpeech is a compositional device in which sound is digitally manipulated with the intention of evoking the sound qualities of unintelligible speech (Dean 2005). Speech is characterized by "rapidly changing broadband sounds" (Zatorre, Belin, and Penhune 2002), whereas music—particularly tonal music—changes more slowly and narrowly in frequency content. As Zatorre and colleagues argue, this distinction may be reflected in better temporal resolution in the left auditory cortex and better spectral resolution in the right, so that perception is adapted to both ranges and extremes of sonic stimuli. NoiseSpeech is constructed either by applying the formant structure (that is, spectral peak content) of speech to noise or other sounds, or by distorting speech sounds such that they no longer form identifiable phonemes or words. The resultant hybrid is an artistic device that, we argue, may owe its force to an encapsulation of the affective qualities of human speech, while intentionally stripping the sounds of any semantic content. In this article, we present an empirical investigation of listener perceptions of NoiseSpeech, demonstrating that non-specialist listeners hear such sounds as similar to each other and to unaltered speech.
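The first construction route described above, applying the spectral peak structure of speech to noise, can be illustrated with a minimal sketch. It is not the procedure used for the works discussed in this article. It assumes a synthetic two-formant "toy vowel" in place of recorded speech, white noise as the carrier, and a simple moving average across frequency bins as a crude stand-in for a proper envelope estimate (such as linear prediction). All function names and parameter values here are hypothetical and chosen only for illustration.

```python
import numpy as np

def toy_vowel(sr=16000, dur=1.0, f0=120.0, formants=(700.0, 1200.0), bw=120.0):
    """Assumed stand-in for speech: harmonics of f0 weighted by Gaussian formant peaks."""
    t = np.arange(int(sr * dur)) / sr
    sig = np.zeros_like(t)
    for k in range(1, int((sr / 2) // f0)):
        f = k * f0
        gain = sum(np.exp(-0.5 * ((f - fc) / bw) ** 2) for fc in formants)
        sig += gain * np.sin(2 * np.pi * f * t)
    return sig

def spectral_envelope(mag, width=9):
    """Smooth a magnitude spectrum across frequency with a moving average."""
    kernel = np.ones(width) / width
    return np.convolve(mag, kernel, mode="same")

def impose_envelope(source, carrier, frame=512, hop=256, width=9):
    """Shape `carrier` frame by frame with the smoothed spectrum of `source`."""
    n = min(len(source), len(carrier))
    window = np.hanning(frame)
    out = np.zeros(n)
    for start in range(0, n - frame + 1, hop):
        src = np.fft.rfft(source[start:start + frame] * window)
        car = np.fft.rfft(carrier[start:start + frame] * window)
        env = spectral_envelope(np.abs(src), width)
        # Keep the carrier's phase and fine structure; scale it by the envelope.
        shaped = np.fft.irfft(car * env, frame)
        out[start:start + frame] += shaped * window  # overlap-add
    peak = np.max(np.abs(out))
    return out / peak if peak > 0 else out

def band_energy(x, sr, lo, hi):
    """Total spectral energy of `x` between `lo` and `hi` Hz."""
    spec = np.abs(np.fft.rfft(x)) ** 2
    freqs = np.fft.rfftfreq(len(x), 1.0 / sr)
    return spec[(freqs >= lo) & (freqs < hi)].sum()

sr = 16000
rng = np.random.default_rng(0)
vowel = toy_vowel(sr)
noise = rng.standard_normal(len(vowel))
hybrid = impose_envelope(vowel, noise)

# The noise should now carry far more energy near the first formant
# than in a band well away from either formant.
print(band_energy(hybrid, sr, 600, 800) > band_energy(hybrid, sr, 3000, 3200))
```

In this sketch the carrier keeps its noisy fine structure while inheriting the broad spectral peaks of the source, which is the sense in which formant structure is "applied" to noise. The second route, distorting speech until phonemes are no longer identifiable, is not covered here.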
NoiseSpeech is ambiguous in evoking the identification of an everyday source of sound, human speech, within the musical context of sound art. Arguably, it blurs the distinction described by Gaver (1993a, 1993b) between two types of listening: the "everyday" and the "musical." When NoiseSpeech occurs in the context of a composition or performance, what form of listening does a listener employ? The context is one of musical listening, yet the identification of the sounds with the human generation of speech is an everyday listening concern. The propensity to identify the source of a sound is a question of interest to both cognitive and ecological approaches to perception. Handel (1989) posits that cognizing sound in terms of sound-causing events may override a more bottom-up sensory perception. Concordantly, Ballas (1993) explores how associations are formed between environmental sound and sound source, listing as contributing factors exposure to particular sounds in everyday life, the ability to visualize the sound-producing event, and the similarity of the sound to a mental stereotype.
Dean (2005) proposed that a hybrid of noise and speech may not only invent a new "language" but more importantly may present a new message. NoiseSpeech seems to escape "commodification," and in this respect fits Attali's (1985) concept of "composition," yet it is not devoid of connotation or expression. We argue that NoiseSpeech is likely to evoke affective responses from a listener through its association with the affective expression of human speech (Dean and Bailes 2006). Traditionally, affect has been conceptualized in terms of valence (positive and negative) and arousal (active and passive) dimensions (see, for example, Leman et al. 2005). Here, we distinguish affect from emotional connotations that are more concerned with top-down cognitive associations, such as "comfort" or "annoyance." According to this distinction, a certain familiarity with a sound is necessary for emotion to be evoked, but not for the perception of affect. Where a sound is identified as familiarly speech-like, higher-order cognitive processes may be involved, associated with the perception of an emotion. However, when sounds are either strongly distorted through processing or are of ambiguous origin, listeners may perceive this on an affective level as valence and arousal.
Vocal affect expression has been studied throughout history (Banse and Scherer 1996). Links have been made in speech between emotion and altered articulation, respiration, and phonation. In particular, these effects are believed to be quantifiable in terms of acoustic variables such as spectral energy distribution, fundamental frequency, and speech rate. For example, Banse and Scherer (1996) examined the portrayal of different emotions by professional actors, comparing human listener recognition rates with the results of digital acoustic analyses. Relevant to our concern with extra-semantic affect, the expressions the actors were required to speak were nonsense sentences, albeit composed of recognizable phonemes. Listeners recognized emotion based on acoustic features without semantic clues. We used a related approach in previous artistic works using artificially constructed languages, mainly comprising unrecognizable...