
Reviewed by:
  • Signal Processing Methods for Music Transcription
  • George Tzanetakis
Anssi Klapuri, Manuel Davy, Eds: Signal Processing Methods for Music Transcription Hardcover, 2006, ISBN-13 978-0-387-30667-4, 440 pages, US$ 139, illustrated, references, index; available from Springer, 233 Spring Street, New York, New York 10013, USA; telephone (+1) 212-460-1500 or (+1) 800-SPRINGER; fax (+1) 212-460-1575; electronic mail service-ny@springer.com; Web www.springer.com/engineering/signals/book/978-0-387-30667-4.

Automatic music transcription (AMT) from audio signals remains one of the biggest challenges in computer music analysis and information retrieval. During the last ten years, assisted by enormous advances in computer processing speed, interest in AMT has increased rapidly. This book is a timely addition to the literature on the topic and contains descriptions of state-of-the-art algorithms and systems in that area. AMT is a challenging, multifaceted, interdisciplinary problem with many subtasks, which are covered in the book. Publications related to AMT appear in a variety of conferences and journals, making it difficult to track progress in the field. This challenge has to a large degree been addressed by the publication of this book, with its comprehensive bibliography of almost 700 entries and its extensive index. Hopefully such a great resource will stimulate more research in this exciting area.

The editors have done a good job of assembling chapters from leading experts in each subtask and organizing the book into a coherent whole. As is frequently the case with edited collections of chapters, the book is not as well integrated as single-author textbooks usually are. Therefore, the text may be more appropriate for researchers or graduate students familiar with the field than for newcomers.

There are four parts, each consisting of three chapters. Chapters 1 through 3 (Part I) define terminology and lay the foundations for understanding AMT algorithms and systems. Chapters 4 through 6 (Part II) deal with rhythm and timbre analysis, and Chapters 7 through 9 (Part III) with multiple fundamental frequency analysis. Parts II and III cover the majority of approaches, algorithms, and concepts needed to build AMT systems. The last part (IV) of the book describes three existing examples of such systems.

The first chapter provides a comprehensive, well-written introduction to the problem of music transcription and different approaches to solving it. The chapter also provides a compact summary of all the topics covered in the book and could serve as a quick but thorough introduction to the field for someone who doesn’t have the time to read the entire book.

The goal of Chapter 2 is to provide an introduction to the signal processing, statistics, and machine learning techniques that have been applied to music transcription. This is followed by Chapter 3, which describes sparse adaptive representations of audio signals. Together, these chapters manage to provide a comprehensive overview of the majority of the techniques used in automatic music transcription systems. There is some imbalance, with certain topics described in more or less detail than necessary, but this is not a serious problem. Another criticism is that the descriptions are relatively dry and technical. This makes them more suitable for researchers familiar with the topics who need a quick overview than for readers who are encountering them for the first time. In addition, it would be nice if the authors had made more explicit connections to how these techniques are used in the subsequent chapters. More generally, I would have liked to see more links and connections established both ways between Part I and the other three parts of the book. I would also have liked a different structure with one chapter devoted to audio representations (merging Section 2.1 of Chapter 2 with Chapter 3) followed by a chapter on statistics, estimation, and machine learning. Finally, I feel that a chapter providing basic information about perceptually informed approaches, covering topics such as auditory filter banks, masking, gestalt grouping cues, and computational auditory scene analysis, would make a valuable addition to Part I. Although these topics are covered in subsequent chapters, I feel that distilling their common...
