
Key, Chord, and Rhythm Tracking of Popular Music Recordings
Arun Shenoy and Ye Wang

In this article, we propose a framework to analyze a musical audio signal (sampled from a popular music CD), determine its key, provide usable chord transcriptions, and obtain a hierarchical representation of the rhythm structure comprising the quarter-note, half-note, and whole-note (or measure) levels. This framework addresses one specific aspect of the broader field of content-based analysis of music. Content-based analysis of musical audio has many potential applications, most of which are not yet fully realized. One is automatic music transcription: the transformation of musical audio into a symbolic representation, such as MIDI or a musical score, which could in principle be used to recreate the musical piece (e.g., Plumbley et al. 2002). Another lies in the field of music information retrieval, that is, simplifying interaction with large databases of musical multimedia by annotating audio data with information that is useful for search and retrieval (e.g., Martin et al. 1998).
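To make the framework's three kinds of output concrete, the following minimal sketch models them as plain data containers. The type and field names here are illustrative assumptions for exposition only, not the data structures of the actual system.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical containers for the three analysis outputs; all names
# and fields are illustrative, not the authors' actual structures.

@dataclass
class KeyEstimate:
    tonic: str  # e.g., "C" or "F#"
    mode: str   # "major" or "minor"

@dataclass
class ChordSegment:
    label: str        # e.g., "Am" or "G"
    start_sec: float  # segment onset within the recording
    end_sec: float

@dataclass
class RhythmStructure:
    quarter_notes: List[float]  # quarter-note (beat) times in seconds
    half_notes: List[float]     # half-note level: every other beat
    measures: List[float]       # whole-note (measure) boundaries

@dataclass
class AnalysisResult:
    key: KeyEstimate
    chords: List[ChordSegment]
    rhythm: RhythmStructure
```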

Two other applications are structured audio and emotion detection in music. In structured audio, we are interested in transmitting sound by describing it rather than compressing it (Martin et al. 1998). Here, content analysis could be used to partly automate the creation of such descriptions through the automatic extraction of various musical constructs from the audio. Regarding emotion detection, Hevner (1936) carried out experiments substantiating the hypothesis that music inherently carries emotional meaning. Huron (2000) has pointed out that, because the preeminent functions of music are social and psychological, emotion could serve as a very useful measure for the characterization of music in information-retrieval systems. The influence of musical chords on listeners' emotions has been demonstrated by Sollberger et al. (2003).

Whereas we would expect human listeners to be reasonably successful at general auditory scene analysis, such tasks remain a challenge for computers. Even simple human acts of cognition, such as tapping the foot to the beat or swaying in time with the music, are not easily reproduced by a computer program. A brief review of audio analysis as it relates to music, followed by case studies of recently developed systems that analyze specific aspects of music, is presented by Dixon (2004). The landscape of music-content processing technologies is discussed in Aigrain (1999). The current article does not present new audio signal-processing techniques for content analysis; instead, it builds a framework from existing techniques. It does, however, represent a unique attempt at integrating harmonic and metric information within a unified system in a mutually informing manner.

Although the detection of individual notes constitutes low-level music analysis, it is often difficult for the average listener to identify them in music. Rather, what listeners perceive is the overall quality conveyed by the combination of notes into chords. Chords are the harmonic description of music and, like melody and rhythm, can serve to capture the essence of a musical piece. Non-expert listeners tend to hear groups of simultaneous notes as chords, and it can be quite difficult to identify whether or not a particular pitch has been heard within a chord. Furthermore, although a complete and accurate polyphonic transcription of all notes would undoubtedly yield the best results, it is often possible to classify music by genre, identify musical instruments by timbre, or segment music into sectional divisions without this low-level analysis.
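One common way to formalize this observation is pitch-class template matching: a group of simultaneous notes is labeled with the triad whose pitch classes it best overlaps, without ever naming the individual pitches. The sketch below illustrates this generic textbook technique; it is not the chord-detection method of the framework described here, and the function names are hypothetical.

```python
# Illustrative pitch-class template matching: label a set of
# simultaneous notes with the best-fitting major or minor triad.

PITCH_CLASSES = ["C", "C#", "D", "D#", "E", "F",
                 "F#", "G", "G#", "A", "A#", "B"]

def triad_templates():
    """Return {label: set of pitch classes} for all 24 major/minor triads."""
    templates = {}
    for root in range(12):
        # Major triad: root, major third (+4 semitones), fifth (+7).
        templates[PITCH_CLASSES[root]] = {root, (root + 4) % 12, (root + 7) % 12}
        # Minor triad: root, minor third (+3 semitones), fifth (+7).
        templates[PITCH_CLASSES[root] + "m"] = {root, (root + 3) % 12, (root + 7) % 12}
    return templates

def label_chord(active_pitch_classes):
    """Pick the triad sharing the most pitch classes with the input set."""
    scored = ((len(t & active_pitch_classes), label)
              for label, t in triad_templates().items())
    best_score, best_label = max(scored)
    return best_label

# Example: C, E, and G (pitch classes 0, 4, 7) sounding together are
# heard as a C-major chord, even by listeners who cannot name the notes.
print(label_chord({0, 4, 7}))  # -> "C"
```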

Tonality is an important structural property of music; music theorists and psychologists have described it as a hierarchical ordering of the pitches of the chromatic scale such that these notes are perceived in relation to one central and stable pitch, the tonic (Smith and Schmuckler 2000). This hierarchical structure is manifest in listeners' perceptions of the stability of pitches in tonal contexts. The key of a piece of music is specified by its tonic and one of two modes: major or minor. A system to determine the key of acoustic musical signals has been demonstrated in Shenoy et al. (2004) and will be summarized later in this article.
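As a generic illustration of how a key (tonic plus mode) can be induced from audio-derived statistics, the sketch below applies the classic Krumhansl-Schmuckler correlation approach, matching a 12-bin pitch-class distribution against rotated major and minor key profiles (the Krumhansl-Kessler profile values). This is not the rule-based method of Shenoy et al. (2004) summarized later; it is included only to make the notion of a tonal hierarchy operational.

```python
# Krumhansl-Schmuckler key finding: correlate the piece's pitch-class
# distribution against all 24 rotated major/minor key profiles and
# report the best match. Illustrative only; not the article's method.

import statistics

MAJOR_PROFILE = [6.35, 2.23, 3.48, 2.33, 4.38, 4.09,
                 2.52, 5.19, 2.39, 3.66, 2.29, 2.88]
MINOR_PROFILE = [6.33, 2.68, 3.52, 5.38, 2.60, 3.53,
                 2.54, 4.75, 3.98, 2.69, 3.34, 3.17]
NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def correlation(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = statistics.mean(x), statistics.mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den

def estimate_key(pc_histogram):
    """pc_histogram: 12 pitch-class weights (index 0 = C); returns e.g. 'C major'."""
    best = None
    for tonic in range(12):
        for mode, profile in (("major", MAJOR_PROFILE), ("minor", MINOR_PROFILE)):
            # Rotate the profile so its tonic slot lines up with this tonic.
            rotated = profile[-tonic:] + profile[:-tonic]
            r = correlation(pc_histogram, rotated)
            if best is None or r > best[0]:
                best = (r, f"{NAMES[tonic]} {mode}")
    return best[1]

# Example: equal weight on the seven C-major scale tones yields "C major".
print(estimate_key([1, 0, 1, 0, 1, 1, 0, 1, 0, 1, 0, 1]))
```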

Rhythm is...
