Reviewed by:
  • An Introduction to Audio Content Analysis: Applications in Signal Processing and Music Informatics by Alexander Lerch
  • Bob L. Sturm
Alexander Lerch: An Introduction to Audio Content Analysis: Applications in Signal Processing and Music Informatics. Hardcover, 2012, ISBN 978-1-118-26682-3, 272 pages, introduction, index, appendices, references, $125.00; Wiley-IEEE Press, 10475 Crosspoint Boulevard, Indianapolis, Indiana 46256, USA; telephone (877) 762-2974; fax (800) 597-3299; http://support.wiley.com/. MATLAB code available from: http://www.audiocontentanalysis.org/.

This book aims to provide a unified and approachable introductory text for a graduate course in analyzing music audio signals. Indeed, it is a result of the author's experience in teaching such a course at the Technical University of Berlin, where the only prerequisite is a basic knowledge of digital signal processing. Along with four appendices, it has ten chapters, including an introduction and a review of fundamental concepts in digital signal processing (e.g., sampling), analysis (e.g., Fourier transform), probability theory (e.g., density), and perception (e.g., auditory filterbanks). The main portion of the book, chapters 3-6, presents dozens of different features that have been used in analyzing music audio signals: zero-crossing rate, spectral shape, Mel-frequency cepstral coefficients, and linear prediction coefficients (chapter 3); envelopes and energy features (chapter 4); frequency, pitch, and chroma features (chapter 5); and onsets, tempo, and beat histogram features (chapter 6). The remaining four chapters cover particular applications of audio analysis: dynamic time warping for signal comparison (chapter 7); music similarity and instrument recognition (chapter 8); fingerprinting (chapter 9); and the analysis of musical performance (chapter 10). The four appendices cover properties of convolution, the Fourier transform and windowing, principal component analysis, and a review of software for audio analysis.

I am extremely interested in this text because my department will soon offer such a course. Hence, I attempted to read each chapter with two frames of mind: as a lecturer and as a student. My search for a suitable text, however, will continue. For the lecturer, this text is confusing, and has so many errors and oversimplifications that too much class time would be needed to address them. For the student, the text is very expensive, confusing, and filled with forward references that will frustrate them. Although including MATLAB code with a textbook is an excellent idea, and encourages reproducible research, the code is not immediately understandable, and none of it readily reproduces any figure in the text. In the following, I provide a few examples of these criticisms.

Chapter 2, "Fundamentals," presents quantization in Section 2.1.3.2, gives the calculation of the expected quantization error of uniform quantization, and shows its dependence on the probability distribution function (PDF) of the quantized signal. A student, having no experience with probability theory, will be utterly baffled here until arriving at Section 2.1.4.1 where the PDF is described. However, here the lecturer reads, "The abscissa of a PDF plot represents all possible amplitude values of the signal x and their probability is plotted on the ordinate." Figure 2.5 shows that for a sine wave (which I assume means the sine of a random variable uniformly distributed in [0, 2π]) the PDF exceeds 1.

Given the book's description of the y-value being a probability, this violates an axiom of probability theory: Probability is a scalar in [0, 1]. As another example, Section 2.2.1.2 discusses zero-phase filtering, but without providing context because there is no discussion of phase delay or group delay with respect to filtering. Section 2.2.3 introduces the discrete Fourier transform, and its short-term implementation. Although I like the discussion about the practicalities of implementation, the student will find no discussion of what negative frequencies are, the effect of window shapes and sizes, the use of zero padding, frequency response, the significance of magnitude and phase, invertibility, and so on.
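To make the point about Figure 2.5 concrete, the following is a minimal Python sketch (it is not the book's MATLAB code) under the same assumption I make above: that the "sine wave" is the sine of a phase uniformly distributed in [0, 2π]. In that case the density of the amplitude x is 1/(π√(1 − x²)) for |x| < 1, which exceeds 1 near the extremes even though every probability obtained by integrating it stays in [0, 1]. The function names and the choice of 50 bins are mine and purely illustrative.

```python
import math
import random


def sine_pdf(x):
    """Density of X = sin(theta) with theta uniform on [0, 2*pi), for |x| < 1."""
    return 1.0 / (math.pi * math.sqrt(1.0 - x * x))


def sine_cdf(x):
    """Cumulative distribution of X; a true probability, always in [0, 1]."""
    return 0.5 + math.asin(x) / math.pi


def histogram_density(samples, n_bins=50, lo=-1.0, hi=1.0):
    """Normalized histogram: bin heights are densities, so heights * width sum to 1."""
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for s in samples:
        # Clamp so that a sample exactly equal to `hi` falls in the last bin.
        idx = min(int((s - lo) / width), n_bins - 1)
        counts[idx] += 1
    n = len(samples)
    return [c / (n * width) for c in counts], width


random.seed(0)  # fixed seed so the sketch is repeatable
samples = [math.sin(random.uniform(0.0, 2.0 * math.pi)) for _ in range(200_000)]
density, width = histogram_density(samples)

peak_density = max(density)                         # height of the tallest bin
total_area = sum(d * width for d in density)        # area under the histogram
edge_probability = sine_cdf(1.0) - sine_cdf(0.96)   # P(0.96 <= X <= 1)

print("analytic density at x = 0.99:", sine_pdf(0.99))
print("tallest histogram bin (a density):", peak_density)
print("area under the histogram:", total_area)
print("probability of the last bin:", edge_probability)
```

The ordinate of such a plot is therefore a density; a probability is obtained only by integrating it over an interval of amplitudes, which is why a bin height above 1 contradicts nothing once the axis is labeled correctly.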

The main contribution of the book is its collection in one source of the many features available for signal analysis. Most of these features, however, are presented without any reference to music audio content. They are...
