
  • Complexity-Scalable Beat Detection with MP3 Audio Bitstreams
  • Jia Zhu and Ye Wang

With the growing popularity of the MPEG-1 Audio Layer 3 (MP3) format, handheld devices such as personal digital assistants (PDAs) and mobile phones have become important entertainment platforms. Unlike conventional audio equipment, mobile devices are characterized by limited processing power, battery life, and memory, as well as other constraints. Therefore, music-processing tasks like beat detection must be implemented using low-complexity algorithms to cope with the constraints of these mobile devices.

This article presents a scheme for complexity-scalable beat detection in popular ("pop") music recordings that can run on different platforms, particularly battery-powered handheld devices. The scheme is user-friendly and platform-adaptive: the detector complexity can be adjusted to match the constraints of the device and the requirements of the user. The proposed algorithm makes both theoretical and practical contributions, because it uses the number of Huffman bits, read from the compressed bitstream without any decoding, as the sole feature for onset detection. Furthermore, we provide an efficient and robust graph-based beat-induction algorithm. By applying the beat detector to compressed rather than uncompressed audio, the system execution time can be reduced by almost three orders of magnitude.
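To make the idea of a bit-count feature concrete, the following is a minimal sketch and not the authors' detector. It assumes the per-granule Huffman bit counts have already been read from the bitstream into a list, and it flags a granule as an onset candidate when its bit count is a local peak that exceeds a multiple of the mean over the preceding granules. The function names, the `window` and `ratio` parameters, and the peak-picking rule are all illustrative assumptions. The time conversion relies on the fact that an MPEG-1 Layer 3 granule covers 576 samples per channel.

```python
from statistics import mean

SAMPLES_PER_GRANULE = 576  # MPEG-1 Layer 3 granule length in samples


def detect_onsets(bits_per_granule, window=8, ratio=1.5):
    """Return indices of granules flagged as onset candidates.

    Hypothetical rule (an assumption, not the published method): a granule
    is a candidate if its bit count is a local peak and exceeds `ratio`
    times the mean bit count of up to `window` preceding granules.
    """
    onsets = []
    for i in range(1, len(bits_per_granule) - 1):
        start = max(0, i - window)
        baseline = mean(bits_per_granule[start:i])
        current = bits_per_granule[i]
        is_peak = (current >= bits_per_granule[i - 1]
                   and current > bits_per_granule[i + 1])
        if is_peak and current > ratio * baseline:
            onsets.append(i)
    return onsets


def granule_to_seconds(index, sample_rate=44100):
    """Convert a granule index to its start time in seconds."""
    return index * SAMPLES_PER_GRANULE / sample_rate


if __name__ == "__main__":
    # Synthetic bit counts with two bursts; real values would come from
    # the compressed bitstream.
    bits = [100, 100, 100, 300, 100, 100, 100, 100, 320, 100, 100]
    for idx in detect_onsets(bits):
        print(idx, round(granule_to_seconds(idx), 4))
```

Because the sketch touches only one integer per granule, its cost grows linearly with the number of granules, which is in keeping with the low-complexity goal described above.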

We have implemented and tested the algorithm on a PDA platform. Experimental results show that our beat detector offers significant advantages over other existing methods in execution time while maintaining satisfactory detection accuracy.

Motivation

After a decade of explosive growth, mobile devices today have become important entertainment platforms alongside desktop computers and servers. Many applications such as games have been moved to handheld devices, where soundtrack tempo plays a key role in controlling relevant game parameters, such as the speed of the game (Holm et al. 2005). For content-based audio/video synchronization (Denman et al. 2005), the musical beat is the primary information source used as the anchor for timing. The beat of a piece of pop music is defined as the sequence of almost equally spaced phenomenal impulses. The beat is the simplest yet fundamental semantic information we perceive when listening to pop music. Groupings and strong/weak relationships form the rhythm and meter of the music (Scheirer 1998).

The beat-tracking process typically organizes musical audio signals into a hierarchical beat structure of three levels: quarter note, half note, and measure (Goto 2001), as shown in Figure 1. Beats at the quarter-note level correspond to periodic "beats" or "pulses" at a simple level, and those at the half-note level and the measure level correspond to the overall "rhythm," which is associated with grouping, hierarchy, and a strong/weak dichotomy. Pop-music beat detection is a subset of the beat-detection problem, which has been solved with detection accuracy as the primary if not the sole objective. In this article, we focus on beat detection in recorded audio rather than real-time beat tracking.
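The three-level structure can be illustrated with a short sketch. It assumes a 4/4 time signature, a list of quarter-note beat times in seconds, and a known index of the first downbeat; the function name `beat_hierarchy` and its parameters are hypothetical and serve only to show how the levels nest.

```python
def beat_hierarchy(quarter_beats, downbeat_index=0):
    """Group quarter-note beat times into half-note and measure levels.

    Assumes 4/4 time: every fourth quarter-note beat starts a measure and
    every second one falls on the half-note level. `downbeat_index` is the
    position of the first measure start within `quarter_beats`.
    """
    quarter = list(quarter_beats)
    half = quarter[downbeat_index % 2::2]
    measure = quarter[downbeat_index % 4::4]
    return {"quarter": quarter, "half": half, "measure": measure}


if __name__ == "__main__":
    # Eight quarter-note beats at 120 beats per minute (0.5 s apart).
    beats = [i * 0.5 for i in range(8)]
    levels = beat_hierarchy(beats)
    print(levels["half"])
    print(levels["measure"])
```

Each coarser level is a subset of the finer one, so once the quarter-note beats and the downbeat position are known, the upper levels follow directly.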

Currently, most beat-detection methods are implemented on a personal computer or server. Based on our experiments, we find that it is difficult to scale down the complexity of existing methods to run on portable platforms such as PDAs and mobile phones, where processing power, memory, and battery life become critical constraints. Although some recent results show that beat tracking can be implemented in a mobile phone after major optimizations (Seppanen et al. 2006), running such a complex algorithm taxes battery life, which is not desirable. Because software applications running on battery-powered portable platforms are gaining popularity, algorithms for content processing such as beat detection must be designed to match both the constraints of the device resources and the users' expectations.


Figure 1. Hierarchical beat structure. (The 4/4 time signature prevalent in popular music is assumed.)

To identify users' requirements, we conducted surveys of students from schools and universities; these students constitute an important segment of the mobile-entertainment market. Our initial survey results indicate that system-execution time, detection accuracy, and battery life are critical performance criteria for mobile-device users. This implies that existing methods, which generally focus on detection accuracy at the cost of computational complexity, are apparently unable...
