A Review of Automatic Rhythm Description Systems
Rhythm belongs with harmony, melody, and timbre as one of the most fundamental aspects of music. Sound by its very nature is temporal, and in its most generic sense, the word rhythm is used to refer to all of the temporal aspects of a musical work, whether represented in a score, measured from a performance, or existing only in the perception of the listener. To build a computer system capable of intelligently processing music, it is essential to design representation formats and processing algorithms for the rhythmic content of music.
Computer systems reported in the literature offer different interpretations of the phrase "automatic rhythm description," as they address diverse applications such as tempo induction, beat tracking, quantization of performed rhythms, meter induction, and characterization of intentional timing deviations. Although some rhythmic concepts are consensual, no single representation of rhythm has been devised that would be suitable for all applications. In this article, we propose a unifying framework for automatic rhythm description systems and review existing systems with respect to the functional units of the proposed framework.
Representing Musical Rhythm
A naïve approach to describing the rhythm of musical data (whether audio or symbolic) is to specify an exhaustive and accurate list of onset times, perhaps together with other musical features characterizing those events (e.g., durations, pitches, and intensities in a MIDI representation). However, such a representation lacks abstraction: there is more to rhythm than the absolute timings of successive musical events. There seems to be agreement that one must also take into account the metrical structure, tempo, and timing (Honing 2001). However, there is no consensus regarding explicit representations of these three rhythmic concepts.
A primary reason is that different rhythmic features are relevant at each step in the musical communication chain, that is, wherever rhythmic content is produced, transmitted, or received. As we illustrate in the next sections, metrical structure, tempo, and timing take slightly different meanings for composers, performers, and listeners. Indeed, even if a goal in the field of music psychology is to seek representational elements, or processes, that would stand as "universal" or "innate" (i.e., functioning from birth, independent of environmental influence; see Drake and Bertrand 2001), a more widespread objective is to determine differences in perception according to a listener's culture, musical background, age, or sex (Gabrielsson 1973; Drake 1993; Drake, Penel, and Bigand 2000; Lapidaki 2000).
A second reason for lack of consensus is that the diverse media used for rhythm transmission suffer a trade-off between the level of abstraction and the comprehensiveness of the representation. Standard Western music notation provides an accepted method for communicating a composition to a performer, but it holds little value in representing the interpretation of a work as played in a concert. On the other hand, a MIDI file might be able to represent important aspects of a performance, but it does not provide the same level of abstraction as the score. At the extreme end, an acoustic signal implicitly contains all rhythmic aspects but provides no abstraction whatsoever. In an application context, the choice of a suitable representation is based on the levels of detail (or abstraction) of the various aspects of music that are provided by the representation.
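The trade-off between abstraction and comprehensiveness can be illustrated with a minimal sketch that holds the same four-note rhythm in two forms: a score-like list of beat positions and a MIDI-like list of performed onset times. All names here (ScoreNote, PerformedNote, nominal_onsets, timing_deviations) are hypothetical and do not come from any system reviewed in this article, and the sketch assumes a constant tempo and a one-to-one pairing of score and performance events, which real performances rarely satisfy.

```python
from dataclasses import dataclass
from fractions import Fraction


@dataclass(frozen=True)
class ScoreNote:
    """Score-like event (hypothetical type): positions counted in beats."""
    onset_beats: Fraction
    duration_beats: Fraction
    pitch: int  # MIDI note number


@dataclass(frozen=True)
class PerformedNote:
    """MIDI-like event (hypothetical type): absolute times in seconds."""
    onset_s: float
    duration_s: float
    pitch: int     # MIDI note number
    velocity: int  # MIDI velocity


def inter_onset_intervals(notes):
    """Differences between successive performed onsets, in seconds."""
    onsets = sorted(n.onset_s for n in notes)
    return [b - a for a, b in zip(onsets, onsets[1:])]


def nominal_onsets(score, tempo_bpm):
    """Onset times in seconds if the score were played at a constant tempo."""
    beat_s = 60.0 / tempo_bpm
    return [float(n.onset_beats) * beat_s for n in score]


def timing_deviations(performance, score, tempo_bpm):
    """Performed onset minus nominal onset, pairing events by list order
    (an assumption of this sketch; no score matching is attempted)."""
    nominal = nominal_onsets(score, tempo_bpm)
    return [p.onset_s - t for p, t in zip(performance, nominal)]


# Quarter note, two eighth notes, quarter note.
score = [
    ScoreNote(Fraction(0), Fraction(1), 60),
    ScoreNote(Fraction(1), Fraction(1, 2), 62),
    ScoreNote(Fraction(3, 2), Fraction(1, 2), 64),
    ScoreNote(Fraction(2), Fraction(1), 65),
]

# Illustrative performed values, invented for this example.
performance = [
    PerformedNote(0.00, 0.45, 60, 80),
    PerformedNote(0.52, 0.20, 62, 72),
    PerformedNote(0.74, 0.22, 64, 75),
    PerformedNote(1.03, 0.50, 65, 85),
]

print(inter_onset_intervals(performance))
print(timing_deviations(performance, score, tempo_bpm=120))
```

The performed list alone yields only raw inter-onset intervals. Separating them into a metrical position, a tempo, and a timing deviation requires the additional abstraction that the score-like list and the assumed tempo supply, which is the sense in which the event list lacks abstraction while the score lacks performance detail.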
Western music notation provides an objective regular temporal structure underlying musical event occurrences and organizing them into a hierarchical metrical structure. This is independent of the hierarchical phrase structure that may be explicit in the notation or implicit in the composer's, performer's, or listener's conceptualization of the music.
The Generative Theory of Tonal Music (GTTM; Lerdahl and Jackendoff 1983) formalizes this distinction by defining rules for a "musical grammar" that deals separately with grouping structure (phrasing) and metrical structure. Whereas the grouping structure deals with time spans (durations), the metrical structure deals with durationless points in time—beats—that obey the following rules. Beats must be equally spaced. A division according to a specific duration corresponds to a metrical level. Several levels coexist, from low levels (small time divisions) to high levels (longer time divisions). There...