Modeling Expression with Perceptual Audio Features to Enhance User Interaction
"Natural interfaces" are one of the fastest-moving research topics in the design of modern electronic appliances for both domestic and professional use. These interfaces stress the idea that interaction should mimic everyday life in many respects, from the input devices used to the feedback received and the embodiment of the interaction. Nonverbal communication plays an important role in our everyday life, and the auditory modality conveys many different kinds of information, including shorter-time-span states such as moods and emotions.
In this work, a model for the expressive control of unstructured sounds is proposed. Starting from the investigation of simple musical gestures played on various instruments (repeated notes, scales, and short excerpts), a set of audio features relevant to the description of expression is selected by statistical analysis. The selected features are not related to musical scores or structures, thus yielding an ecological approach to the representation of expression communication. In particular, perceptual features such as roughness and spectral centroid provide additional descriptors related to texture and brightness, as opposed to timing- and intensity-based parameters, which lead to typical music-oriented characterizations. The control parameters of an expressive synthesis model are then tuned according to the results of the analysis, so as to add expressive content to simple synthetic sounds. Listening tests were conducted to validate the model, and the results confirm the impact that this model can have on affective communication in human–computer interaction (HCI).
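As an illustration of the kind of perceptual descriptor mentioned above, the following is a minimal sketch, not taken from the paper, of how a spectral centroid (a common correlate of perceived brightness) might be computed for a single analysis frame with numpy. The frame length and windowing choice are assumptions for the example.

```python
import numpy as np

def spectral_centroid(frame: np.ndarray, sample_rate: float) -> float:
    """Amplitude-weighted mean frequency of one analysis frame, a common
    correlate of perceived brightness. A Hann window limits spectral
    leakage so that the centroid of a pure tone stays near its pitch."""
    windowed = frame * np.hanning(len(frame))
    magnitudes = np.abs(np.fft.rfft(windowed))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sample_rate)
    total = magnitudes.sum()
    if total == 0:
        return 0.0  # silent frame: centroid undefined, report zero
    return float(np.sum(freqs * magnitudes) / total)

# A pure 440 Hz tone should yield a centroid near 440 Hz; adding
# high-frequency content (a "brighter" sound) raises it.
sr = 44100
t = np.arange(2048) / sr
tone = np.sin(2 * np.pi * 440.0 * t)
print(f"centroid of pure tone: {spectral_centroid(tone, sr):.0f} Hz")
```

A brighter sound, such as the same tone with added upper harmonics, yields a higher centroid, which is why the feature serves as a brightness descriptor.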
The 1913 Webster's Dictionary definition of "expression" suits our context well: expression is a "lively or vivid representation of meaning, sentiment or feeling, etc.; significant and impressive indication, whether by language, appearance, or gesture; that manner or style which gives life and suggestive force to ideas and sentiments" (Porter 1913). Expression is a key factor of human behavior, and it plays a central role in nonverbal communication, because it supplies information beyond the rational content of explicit messages represented by texts or scores.
Expression can be used to make communication more attractive and pleasant, as well as to aid the understanding of messages and to disambiguate human judgment. HCI researchers try to adapt new mediating technologies to the basic forms of human communication. In fact, many typical communication modes are non-linguistic, based on movement, action, gestures, and mimetic activities. Expression conveys emotional content, so addressing it is central to determining the affective state of the user, and it must be taken into account to emphasize communication through different sensory modalities. Specifically, communication through the auditory channel is one of the most investigated topics in the sound and music computing community. The NIME conferences (www.nime.org), for instance, are dedicated to research on new interfaces for musical expression, putting the emphasis on the need for specific research in HCI.
Beyond musical expression, sounds can inform the user about the environment, and they can stimulate different behavioral reactions. Recent research on sound interaction design (e.g., www.cost-sid.org) focuses on the expressive manipulation of artifacts through continuous audio feedback. In fact, sound possesses a number of characteristics that can be used for the purpose of communication (e.g., onset, decay, pitch, loudness, timbre, and rate of change or tempo). These features concern physical properties of sound, and they can be mapped to higher-level semantics referring to various levels of cognitive processes.
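The mapping from physical sound properties to higher-level semantics can be sketched in a few lines. The example below, assuming numpy, computes a frame-wise RMS envelope (a simple correlate of loudness) and maps its mean level onto a coarse semantic label; the frame sizes, thresholds, and labels are illustrative assumptions, not values from the paper.

```python
import numpy as np

def rms_envelope(signal: np.ndarray, frame: int = 1024, hop: int = 512) -> np.ndarray:
    """Frame-wise RMS of the signal, a simple physical correlate
    of perceived loudness."""
    return np.array([
        np.sqrt(np.mean(signal[i:i + frame] ** 2))
        for i in range(0, len(signal) - frame + 1, hop)
    ])

def describe_level(envelope: np.ndarray, soft: float = 0.05, loud: float = 0.3) -> str:
    """Map mean RMS onto a coarse semantic label. The thresholds are
    hypothetical, chosen only to illustrate feature-to-semantics mapping."""
    level = float(envelope.mean())
    if level < soft:
        return "soft"
    if level < loud:
        return "moderate"
    return "loud"

sr = 44100
t = np.arange(sr) / sr  # one second of audio
quiet_tone = 0.02 * np.sin(2 * np.pi * 220.0 * t)
print(describe_level(rms_envelope(quiet_tone)))  # prints "soft"
```

The same pattern scales to the other listed properties: each physical measurement (onset time, decay rate, pitch, spectral shape) feeds a higher-level description that an interface can act on.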
Adding expression to music performances is a widely explored field, and various synthesis models have been proposed (Canazza et al. 2000; Hiraga et al. 2000; Friberg, Bresin, and Sundberg 2006). The idea behind these models is to follow the behavior of human performers: Musicians enrich their performances with expression by acting on the available degrees of freedom (De Poli et al. 1998) and introducing deviations from a mechanical rendering of the score. Beyond the interpretation of a score, expressive synthesis models can also address tones and simple sounds without any musical structure. The control of such unstructured sounds has been less explored, but it...