- Visualization in Audio-Based Music Information Retrieval
Music information retrieval (MIR) is an emerging research area that explores how digitally stored music can be effectively organized, searched, retrieved, and browsed. The explosive growth of online music distribution, the spread of portable music players, and the falling cost of recording suggest that in the near future most of the recorded music in human history will be available digitally. MIR is growing steadily as a research area, as evidenced by the International Conference on Music Information Retrieval (ISMIR) series (soon in its sixth year) and the increasing number of MIR-related publications in Computer Music Journal and other journals and conference proceedings.
Designing and developing visualization tools for interacting effectively with large music collections is the main topic of this overview article. Connecting visual information with music and sound has long fascinated composers, artists, and painters. Rapid advances in computer performance have enabled a variety of creative endeavors that connect image and sound, ranging from the simple, direct spectrogram renderings popular in software music players to elaborate real-time interactive systems with three-dimensional graphics. Most existing tools and interfaces that use visual representations of audio or music, such as audio editors, treat audio as a monolithic block of digital samples without any information about its content. The systems described in this overview are distinguished by their attempt to visually represent higher-level information about musical content. MIR is a new field, and visualization for MIR is still in its infancy; we believe that this article provides a comprehensive overview of the current state of the art in this area, and we hope it will inspire other researchers to contribute new ideas.
There has been considerable interest in making music visible. Many artists have attempted to realize the images elicited by sound (Walt Disney's Fantasia being an early, well-known example). Another approach is to render the time or frequency content of the audio signal quantitatively, using instruments such as the oscillograph and the sound spectrograph (Koenig, Dunn, and Lacy 1946; Potter, Kopp, and Green 1947). These are intended primarily for scientific or quantitative analysis, although artists such as Mary Ellen Bute have turned quantitative tools like the cathode-ray oscilloscope toward artistic ends (Moritz 1996). Other visualizations are derived from note-based or score representations of music, typically MIDI note events (Malinowski 1988; Smith and Williams 1997; Sapp 2001).
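The quantitative, frequency-domain rendering mentioned above can be illustrated with a short-time Fourier transform (STFT) magnitude spectrogram, the usual digital counterpart of the analog sound spectrograph. The following is a minimal, dependency-free sketch, not the implementation of any particular tool: it slices the signal into overlapping Hann-windowed frames and takes the magnitude of a naive DFT of each frame. The names `spectrogram`, `frame_size`, and `hop` are hypothetical, and the O(N²) DFT is chosen for readability; a practical system would use an FFT.

```python
import cmath
import math


def hann(n):
    """Symmetric Hann window of length n (tapers frame edges to reduce leakage)."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def dft_magnitude(frame):
    """Magnitudes of the non-negative-frequency bins of a naive DFT."""
    n = len(frame)
    return [
        abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
        for k in range(n // 2 + 1)
    ]


def spectrogram(signal, frame_size=64, hop=32):
    """Return frames[time][frequency_bin] of windowed DFT magnitudes."""
    window = hann(frame_size)
    frames = []
    for start in range(0, len(signal) - frame_size + 1, hop):
        frame = [signal[start + i] * window[i] for i in range(frame_size)]
        frames.append(dft_magnitude(frame))
    return frames


if __name__ == "__main__":
    sample_rate = 8000
    tone = [math.sin(2 * math.pi * 1000 * t / sample_rate) for t in range(512)]
    spec = spectrogram(tone, frame_size=64, hop=32)
    peaks = [max(range(len(row)), key=row.__getitem__) for row in spec]
    print(len(spec), peaks[0])  # 15 frames, peak in bin 8 (= 1000 Hz)
```

Each row of the result is one time frame and each column a frequency bin of width `sample_rate / frame_size` Hz; mapping the magnitudes (often converted to decibels) to brightness yields the familiar spectrogram image. For a 1000 Hz sine sampled at 8000 Hz with 64-sample frames, every frame peaks at bin 8.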
The idea of representing sound as a visual object in a two- or three-dimensional space whose properties are related to the audio content originated in psychoacoustics. By analyzing data collected from user studies, it is possible to construct perceptual spaces that visually show similarity relations between single notes of different musical instruments (Grey 1975). Wessel (1979) explored using such a timbre space as a musical control structure in computer music and performance. This idea was taken up in the Intuitive Sound Editing Environment (ISEE), in which nested two- and three-dimensional visual spaces are used to browse instrument sounds as experienced by musicians using MIDI synthesizers and samplers (Vertegaal and Bonis 1994). The Sonic Browser is a tool for accessing sounds or collections of sounds using sound spatialization and context-overview visualization techniques, in which each sound is represented as a visual object (Fernström and Brazil 2001). Another approach is to visualize the low-level perceptual processing of the human auditory system (Slaney 1997). An interesting visualization that combines the waveform representations of traditional audio editors with pitch-based placement of notes is used in the Melodyne software by Celemony (available online at www.celemony.com/cms/).
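Perceptual timbre spaces of the kind described above are typically derived by applying multidimensional scaling (MDS) to listeners' pairwise dissimilarity judgments. The sketch below uses classical (Torgerson) MDS, a simpler relative of the scaling procedures used in timbre studies and not a reproduction of Grey's analysis: it double-centers the squared dissimilarity matrix and uses its leading eigenvectors, found here by power iteration with deflation, as coordinates. The function name `classical_mds` and its parameters are hypothetical, and the example input is synthetic rather than data from any study.

```python
import math
import random


def classical_mds(d, dims=2, iters=500, seed=0):
    """Embed n items in `dims` dimensions from an n x n dissimilarity matrix d."""
    n = len(d)
    # Double-center the squared dissimilarities: B = -1/2 * J D^2 J, J = I - 11^T/n.
    sq = [[d[i][j] ** 2 for j in range(n)] for i in range(n)]
    row_mean = [sum(r) / n for r in sq]
    col_mean = [sum(sq[i][j] for i in range(n)) / n for j in range(n)]
    grand = sum(row_mean) / n
    b = [[-0.5 * (sq[i][j] - row_mean[i] - col_mean[j] + grand) for j in range(n)]
         for i in range(n)]

    rng = random.Random(seed)
    coords = [[0.0] * dims for _ in range(n)]
    for k in range(dims):
        # Power iteration converges to the eigenvector of largest |eigenvalue|.
        # Assumes the leading eigenvalues are positive, as for (near-)Euclidean data.
        v = [rng.random() - 0.5 for _ in range(n)]
        for _ in range(iters):
            w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break  # nothing left to extract
            v = [x / norm for x in w]
        # Rayleigh quotient gives the eigenvalue; coordinates are v * sqrt(lambda).
        lam = sum(v[i] * sum(b[i][j] * v[j] for j in range(n)) for i in range(n))
        scale = math.sqrt(max(lam, 0.0))
        for i in range(n):
            coords[i][k] = v[i] * scale
        # Deflate so the next pass finds the next-largest eigenvector.
        b = [[b[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return coords


if __name__ == "__main__":
    corners = [(0, 0), (3, 0), (0, 1), (3, 1)]
    dissim = [[math.dist(p, q) for q in corners] for p in corners]
    layout = classical_mds(dissim)
    print([[round(c, 3) for c in p] for p in layout])
```

Items judged similar end up close together, so plotting the returned coordinates gives a two-dimensional map in which each sound can be drawn as a point. For perfectly Euclidean input, such as the distances between the corners of a 3-by-1 rectangle, the embedding reproduces the original distances up to rotation and reflection.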
The main goal of this article is to provide an overview of visualization techniques developed in the context of music information retrieval for representing polyphonic audio signals. What chiefly distinguishes these techniques from most previous work is that they use sophisticated analysis algorithms to extract content information automatically from music stored in digital audio format; the extracted information is then rendered visually. Visualization techniques have been used in many scientific domains (e.g., Spence 2001; Fayyad, Grinstein, and Wierse 2002); they...