International Conference on Music Information Retrieval 2003
Johns Hopkins University/Library of Congress, Baltimore, Maryland, USA, 26–30 October, 2003

Reviewed by Alexandra L. Uitdenbogerd

Introduction

ISMIR 2003, the fourth International Conference on Music Information Retrieval, was held 26–30 October, 2003, in Baltimore, Maryland, USA. Since its inception in 2000, the conference has been a popular venue for technical papers on analyzing, storing, and retrieving music, whether it be in the form of MIDI files, audio recordings, or sheet music. The main applications of interest are content-based retrieval systems, music recommenders, classifiers, and transcribers. Other participants are interested in the use of metadata and the implementation of systems that allow on-line access to collections.

This year's conference was sponsored by the Sheridan Libraries of Johns Hopkins University as well as the Library of Congress (LOC). Participants were thus able to take an interesting tour of the LOC and to hear works from the Lester S. Levy collection of rare 19th-century popular sheet music held by the Sheridan Libraries. In addition, a specially prepared concert at the Peabody Institute demonstrated a range of relatively accessible computer music, including the audiovisually entertaining 7 Cartoons by Maurice Wright and the beautiful Narcissus for solo flute by Thea Musgrave, sensitively performed by Peabody graduate Chia-Jui Lee. Other pieces stretched the capabilities of conventional instruments such as the piano, double bass, and trombone.

There were 23 peer-reviewed papers and 25 posters presented at this conference, with additional invited sessions and panels. Invited speaker Avery Wang amazed attendees with demonstrations of the Shazam audio search engine. Shazam's index of local temporal features successfully identified recordings from very noisy environments. For the second year running, ISMIR offered a successful tutorial program, including a return of the popular session on audio retrieval techniques by George Tzanetakis.
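
The published description of the algorithm pairs spectrogram peaks into combinatorial hashes that are robust to noise. As a rough illustration only (not Shazam's actual code: peak extraction is assumed already done, and the fan-out and gap parameters are invented for the example), a Python sketch of indexing and matching such landmark hashes:

from collections import defaultdict

# A "landmark" pairs a spectral peak with a nearby later peak.
# Hash key = (freq1, freq2, time_delta); value = (track_id, anchor_time).
# Peaks are assumed precomputed as (time_frame, freq_bin) tuples.

FAN_OUT = 5   # pair each peak with up to 5 later peaks (illustrative value)
MAX_DT = 64   # maximum frame gap between paired peaks (illustrative value)

def hashes(peaks):
    """Yield (hash, anchor_time) pairs from a list of (time, freq) peaks."""
    peaks = sorted(peaks)
    for i, (t1, f1) in enumerate(peaks):
        for t2, f2 in peaks[i + 1:i + 1 + FAN_OUT]:
            if t2 - t1 <= MAX_DT:
                yield (f1, f2, t2 - t1), t1

def build_index(tracks):
    """tracks: {track_id: [(time, freq), ...]} -> inverted hash index."""
    index = defaultdict(list)
    for track_id, peaks in tracks.items():
        for h, t in hashes(peaks):
            index[h].append((track_id, t))
    return index

def match(index, query_peaks):
    """Vote for (track, time offset) pairs; return the best-supported one."""
    votes = defaultdict(int)
    for h, t_query in hashes(query_peaks):
        for track_id, t_track in index.get(h, []):
            votes[(track_id, t_track - t_query)] += 1
    return max(votes.items(), key=lambda kv: kv[1], default=None)

A genuine match shows up as many hash hits agreeing on a single time offset between the query and the stored recording, which is what the voting step detects; noise destroys individual peaks but rarely all of the agreeing pairs.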

In this review I discuss the results presented, grouped by type of application: first usability, then symbolic and audio-based retrieval work, followed by digital library issues and progress in the evaluation of Music Information Retrieval (MIR) systems.

User Issues

This year saw further research into user issues of content-based music retrieval, particularly the ability of users to construct queries, whether by singing or via a text-based representation. Through Roger B. Dannenberg et al.'s work we learned that only half of users' sung queries resembled the target piece of music, many being jumbled fragments of the original. Steffen Pauws showed that absolute pitch is unlikely to be usable in sung queries, as only recently heard songs repeated by trained singers were likely to be at the original pitch. Tempo and contour, however, remain the most reliable aspects of user singing performance. Eliciting a user's query in the form of a string of symbols was less successful (Alexandra L. Uitdenbogerd and Yaw-Wah Yap), with non-musicians having no success at all in constructing contour or numeric representations of a simple melody.
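
To make the contour representation concrete: a melody's pitch sequence is commonly reduced to a string of direction symbols before matching. A minimal sketch, assuming pitches are given as MIDI note numbers (the function name and symbol alphabet are illustrative, not the exact representation from any particular paper):

def contour(pitches):
    """Reduce a pitch sequence (MIDI note numbers) to a contour string:
    'U' = up, 'D' = down, 'S' = same as the previous note."""
    symbols = []
    for prev, curr in zip(pitches, pitches[1:]):
        symbols.append('U' if curr > prev else 'D' if curr < prev else 'S')
    return ''.join(symbols)

# Opening of "Three Blind Mice" (E D C, E D C) -> 'DDUDD'
print(contour([64, 62, 60, 64, 62, 60]))

Even this coarse encoding is difficult for non-musicians to produce by hand, which is what the Uitdenbogerd and Yap result highlights.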

In other work, researchers of the Musical Audio Mining (MAMI) project (Micheline Lesaffre et al.) showed that there were differences in user queries based on gender, musical experience, and age.

Applications

The main interest of many researchers at ISMIR is the technology required for building successful content-based music retrieval systems. The problem is approached in several ways, using different types of data. In the symbolic realm, researchers work with MIDI files or notation-based data formats and develop matching algorithms, indexes, or front-ends for query-by-humming systems. In the audio domain there is now a broadening of goals: retrieval and classification are based on features representing genre, mood, or exact recording identity, or on whatever evidence can be extracted about notes or tonality. Progress in the difficult task of transcription would bring the symbolic and audio techniques together, but this appears to be a hard enough problem that years of further work are required before that goal can be reached.
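
As an illustration of what a symbolic matching algorithm can look like, the sketch below scores an approximate occurrence of a query's pitch-interval sequence within a piece using dynamic-programming local alignment. This is a generic technique chosen for the example, with invented scoring parameters, not a reconstruction of any specific system presented at the conference:

def local_align(query, piece, match=1, mismatch=-1, gap=-1):
    """Smith-Waterman-style local alignment score between two sequences,
    e.g. pitch-interval lists; a higher score means a closer match."""
    best = 0
    prev = [0] * (len(piece) + 1)
    for q in query:
        curr = [0]
        for j, p in enumerate(piece, 1):
            score = max(0,
                        prev[j - 1] + (match if q == p else mismatch),
                        prev[j] + gap,
                        curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best

# Query intervals +2 +2 -4 occur, with one change, inside the piece
print(local_align([2, 2, -4], [5, 2, 2, -3, 7]))  # -> 2

Working on intervals rather than absolute pitches makes the match transposition-invariant, which matters because sung queries are rarely at the original pitch.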

Another issue of concern is the ability to compare the techniques developed by different researchers so that it is possible to determine what works best. I discuss these applications and issues in...
