A Multimodal System for Gesture Recognition in Interactive Music Performance
Music performance provides an elaborate research test bed of subtle and complex gestural interactions among members of a performance group. To develop a paradigm that allows performers to interact as naturally and subtly with automated digital systems as they do with other human performers, an interface design must allow performers to play their instruments untethered, using only natural cues and body language to control computer information.
This article presents a multimodal system for gesture recognition in untethered interactive flute performance. Using computer vision, audio analysis, and electric-field sensing, a performer's discrete cues are remotely identified, and continuous expressive gestures are captured in musical performance. Cues and gestures that are typical among performers are then used to allow the performer to naturally communicate with an interactive music system, much in the same way that they communicate with another performer. The system features custom-designed electronics and software that performs real-time spectral transformation of audio from the flute.
Our approach therefore makes use of non-contact sensors, specifically microphones, cameras, and electric-field sensors embedded in a music stand that we call the Multimodal Music Stand System, MMSS (Bell et al. 2007). The multimodal array of untethered sensors contained within the music stand provides data to an analysis system that identifies a set of predetermined gestures as discrete cues, while simultaneously capturing ancillary performance gestures as continuous control data for audio synthesis and transformation processes. This information is used to control various interactions, such as the entrance and exit of a virtual "digital performer," via the discrete cues. In this article, we describe our work in the context of an interactive musical work composed for flute by JoAnn Kuchera-Morin and performed by flautist Jill Felber.
Our primary goals are: (1) to enable the performer to cue the interactive music system using simple gestures that are natural to musicians; (2) to reinforce recognition of cueing gestures through the combination of multiple modalities; (3) to capture ancillary gestures of the performer and map them to real-time audio synthesis and transformation parameters; and (4) to accomplish this without altering the instrument and without requiring direct physical contact from the performer.
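As an illustration of goal (2), the following minimal Python sketch shows one simple way that detections from several modalities could reinforce one another: a cue is accepted only when at least two different modalities report it within a short time window. This is not the MMSS implementation. The `Detection` type, the `fuse_cues` function, the modality and cue labels, and the 0.5-second window are all assumptions introduced here for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    modality: str  # assumed labels, e.g. "vision", "audio", "efield"
    cue: str       # hypothetical cue name, e.g. "flute_raised"
    time_s: float  # detection time in seconds


def fuse_cues(detections, window_s=0.5, min_modalities=2):
    """Return (cue, time) pairs confirmed by several distinct modalities.

    A cue is confirmed at the time of its earliest detection when at least
    `min_modalities` different modalities report it within `window_s` seconds.
    """
    # Group detections by cue label, in time order.
    by_cue = {}
    for d in sorted(detections, key=lambda d: d.time_s):
        by_cue.setdefault(d.cue, []).append(d)

    confirmed = []
    for cue, items in by_cue.items():
        last_confirmed = None
        for i, first in enumerate(items):
            # Skip detections already covered by a confirmed cue event.
            if last_confirmed is not None and first.time_s - last_confirmed <= window_s:
                continue
            group = [d for d in items[i:] if d.time_s - first.time_s <= window_s]
            if len({d.modality for d in group}) >= min_modalities:
                confirmed.append((cue, first.time_s))
                last_confirmed = first.time_s
    return sorted(confirmed, key=lambda c: c[1])


if __name__ == "__main__":
    events = [
        Detection("vision", "flute_raised", 1.0),
        Detection("audio", "flute_raised", 1.2),
        Detection("vision", "head_nod", 3.0),  # seen by one modality only
    ]
    print(fuse_cues(events))
```

In this sketch, a cue seen by only one modality is ignored, which trades some sensitivity for fewer false triggers. A real system would more plausibly weight each modality by its reliability rather than apply a fixed agreement count.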
This section summarizes the current state of the field in gestural control of interactive musical performance, focusing on multimodal detection and gesture recognition.
Instruments for Expressive Control
Overholt (2007) summarizes three basic groups of gestural controllers for music: (1) instrument-simulating and instrument-inspired controllers, (2) augmented instruments capturing either traditional or extended techniques, and (3) alternative interfaces, with the subcategories of "touch," "non-contact," "wearable," and "borrowed."
Instrument-simulating and instrument-inspired controllers are gestural interfaces that simulate the look and feel of traditional instruments, but do not include the original functionality of these instruments. For example, a guitar controller that does not have strings but instead uses sensors along the fretboard would fall into the category of instrument-inspired controllers (because the technique used to play it is noticeably different from that of the instrument on which it was based). A keyboard-based synthesizer is also an example of this first classification, but it is an instrument-simulating interface, because its playing technique mirrors that of the piano. In the case of the flute, the Yamaha WX-7 or Akai EWI provide instrument-inspired options. (More specifically, certain fingering modes are instrument-simulating for a soprano saxophone, and other modes are inspired by the flute, yet the mouthpiece is quite different.)
Augmented instruments retain the full functionality of an original instrument by including mechanical workings of the traditional instrument; however, they have been modified to interact with a computer with the addition of sensors that are intended to capture either traditional or extended techniques. One example of an augmented instrument capturing traditional techniques is the Yamaha Disklavier. It includes all of the strings and mechanical workings of a traditional piano, and hence retains the functionality of the original instrument while gaining the ability to interact with a computer. There are several flutes that can be classified as augmented instruments capturing extended...