Exploring Music Collections by Browsing Different Views
Elias Pampalk, Simon Dixon, and Gerhard Widmer

Technological advances in Internet bandwidth and storage media have made large music collections commonplace. Exploring such collections is usually either limited to listings returned by, for example, artist-based queries, or it requires additional information not readily available to the public, such as customer profiles held by online music distributors. In particular, content-based browsing of music according to overall sound similarity has remained an unsolved problem, although recent work is promising (e.g., Tzanetakis and Cook 2001; Aucouturier and Pachet 2002b; Cano et al. 2002; Pampalk et al. 2002a). The main difficulty lies in estimating perceived similarity from the audio signal alone.

Music similarity as such might appear to be a rather simple concept. For example, it is easy to distinguish classical music from heavy metal. However, similarity has several aspects. Some are very fine-grained, such as the difference between Vladimir Horowitz's and Daniel Barenboim's interpretations of a Mozart piano sonata. Others are far more obvious, such as the noise level. It is questionable whether it will ever be possible to analyze all aspects of similarity automatically and directly from audio. Within limits, however, it is possible to analyze similarity in terms of, for example, rhythm (Foote et al. 2002; Paulus and Klapuri 2002; Dixon et al. 2003) or timbre (Logan and Salomon 2001; Aucouturier and Pachet 2002b).

In this article, we present a new approach to combining information extracted from audio with meta-information such as artist or genre. In particular, we extract spectrum and periodicity histograms to roughly describe timbre and rhythm, respectively. For each of these aspects of similarity, the collection is organized using a self-organizing map (SOM; Kohonen 1982, 2001). The SOM arranges the pieces of music on a map such that similar pieces are located near each other. We use smoothed data histograms to visualize the cluster structure and to create an "islands of music" metaphor where groups of similar pieces are visualized as islands (Pampalk et al. 2002a).
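The SOM training and smoothed-data-histogram visualization described above can be sketched in a few lines of NumPy. This is a minimal sketch, not the authors' implementation. The online update rule, the linearly decaying learning rate, the Gaussian neighborhood, and the function names `train_som` and `smoothed_data_histogram` are our own illustrative choices. The voting scheme, in which each piece gives s, s-1, ..., 1 votes to its s best-matching units, is our assumption about how smoothed data histograms are formed.

```python
import numpy as np

def train_som(data, rows, cols, iters=2000, lr0=0.5, sigma0=None, seed=0):
    """Online Kohonen SOM (illustrative). Returns codebook (rows*cols, dim) and unit grid positions."""
    rng = np.random.default_rng(seed)
    n, dim = data.shape
    # 2-D grid coordinates of every map unit
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    if sigma0 is None:
        sigma0 = max(rows, cols) / 2.0
    # Initialize the codebook with randomly chosen data vectors
    codebook = data[rng.integers(0, n, rows * cols)].astype(float)
    for t in range(iters):
        frac = t / iters
        lr = lr0 * (1.0 - frac)               # learning rate decays linearly toward 0
        sigma = sigma0 * (1.0 - frac) + 0.5   # neighborhood radius shrinks toward 0.5
        x = data[rng.integers(0, n)]          # one randomly drawn training vector
        bmu = np.argmin(((codebook - x) ** 2).sum(axis=1))  # best-matching unit
        d2 = ((grid - grid[bmu]) ** 2).sum(axis=1)          # squared grid distance to the BMU
        h = np.exp(-d2 / (2.0 * sigma ** 2))                # Gaussian neighborhood weights
        codebook += lr * h[:, None] * (x - codebook)        # pull units toward x
    return codebook, grid

def smoothed_data_histogram(data, codebook, s=3):
    """Assumed SDH scheme: each item votes s, s-1, ..., 1 for its s closest units; result sums to 1."""
    votes = np.zeros(len(codebook))
    for x in data:
        closest = np.argsort(((codebook - x) ** 2).sum(axis=1))[:s]
        votes[closest] += np.arange(s, 0, -1)
    return votes / votes.sum()

# Usage: two well-separated synthetic "genres" in a 2-D feature space
rng = np.random.default_rng(1)
data = np.vstack([rng.normal(0.0, 0.3, (50, 2)), rng.normal(5.0, 0.3, (50, 2))])
codebook, grid = train_som(data, rows=4, cols=4)
sdh = smoothed_data_histogram(data, codebook, s=3)
print(sdh.reshape(4, 4).round(2))  # high values mark the "islands"
```

Larger values of s spread each piece's votes over more units and give smoother, more island-like contours; s = 1 reduces to a plain hit histogram.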

Furthermore, we integrate a third type of organization that is not derived from audio analysis. It could be based on meta-data such as artist or genre information, or it could be any arbitrary user-defined organization. We align these three views and interpolate between them using Aligned SOMs (Pampalk et al. 2003b). The user can browse the collection and interactively explore different aspects by gradually shifting focus from one view to another. This is similar to the idea presented by Aucouturier and Pachet (2002b), who use an "Aha-Slider" to control how meta-information is combined with information derived from audio analysis. We demonstrate our approach on a small music collection.
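To make the idea of gradually shifting focus between views concrete, the sketch below blends one distance matrix per view, with weights set by a slider position. It is deliberately simpler than the Aligned SOMs used in the article, which train a stack of mutually constrained maps. The mean-distance normalization, the linear slider, and the function names `normalize`, `blend_views`, and `focus_weights` are hypothetical. The sketch follows the spirit of the Aha-Slider rather than reproducing it.

```python
import numpy as np

def normalize(d):
    """Scale a distance matrix so that its mean off-diagonal distance is 1."""
    n = d.shape[0]
    off_diag = d[~np.eye(n, dtype=bool)]
    return d / off_diag.mean()

def blend_views(dists, weights):
    """Weighted sum of normalized per-view distance matrices (weights rescaled to sum to 1)."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    return sum(wi * normalize(d) for wi, d in zip(w, dists))

def focus_weights(position, n_views):
    """Hypothetical slider: position in [0, n_views - 1] interpolates linearly between adjacent views."""
    w = np.zeros(n_views)
    lo = int(np.floor(position))
    hi = min(lo + 1, n_views - 1)
    frac = position - lo
    w[lo] += 1.0 - frac
    w[hi] += frac
    return w

# Usage: three pieces, a timbre view and a rhythm view that disagree
d_timbre = np.array([[0, 1, 4], [1, 0, 4], [4, 4, 0]], dtype=float)
d_rhythm = np.array([[0, 4, 1], [4, 0, 4], [1, 4, 0]], dtype=float)

def nearest_to(d, i):
    row = d[i].copy()
    row[i] = np.inf  # ignore the piece itself
    return int(np.argmin(row))

for pos in (0.0, 1.0):
    d = blend_views([d_timbre, d_rhythm], focus_weights(pos, 2))
    print(pos, nearest_to(d, 0))
```

Moving the slider from the timbre view (0.0) to the rhythm view (1.0) changes which piece is closest to piece 0, the kind of shift a user would notice while browsing.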

The remainder of this article is organized as follows. We first present the spectrum and periodicity histograms used to calculate similarities from the respective viewpoints. This is followed by a review of the SOM and Aligned SOMs. Finally, we demonstrate our approach and discuss its shortcomings and more recent work.

Similarity Measures

In general, it is hard to predict when a human listener will consider pieces to be similar. Pieces might be deemed similar because of their lyrics, instrumentation, melody, rhythm, or artists, or, more vaguely, because of the emotions they evoke. However, even relatively simple similarity measures can help in handling large music collections more efficiently. For example, Logan (2002) uses a spectrum-based similarity measure to create playlists of similar pieces automatically. Aucouturier and Pachet (2002b) use a similar spectrum-based measure to find unexpected similarities, e.g., between pieces from different genres. A rather different approach, based on the psychoacoustic model of fluctuation strength, was presented by Pampalk et al. (2002a) to organize and visualize music collections.

Unlike previous approaches, we do not try to model the overall perceived similarity, but rather we focus on different aspects and allow the user to interactively decide which combination of these aspects is the most interesting. Specifically, we define two similarity measures, one based on rhythmic aspects (periodicity histograms), the other on timbre (spectrum histograms). To explain these, we first review the psychoacoustic preprocessing we apply.
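The article's exact definitions of the spectrum and periodicity histograms come after the psychoacoustic preprocessing, which this excerpt omits. As a hedged illustration of the general idea of a timbre histogram only, the sketch below assumes a precomputed loudness matrix (frequency bands by time frames). For each band, it counts the fraction of frames exceeding each of a set of loudness thresholds. Two pieces are then compared by the Euclidean distance between their histograms. The threshold grid, the normalization by frame count, the distance, and the function names `spectrum_histogram` and `histogram_distance` are all our own assumptions.

```python
import numpy as np

def spectrum_histogram(loudness, thresholds):
    """loudness: (bands, frames) array. Returns a (bands, levels) array giving, per band,
    the fraction of frames whose loudness exceeds each threshold (assumed scheme)."""
    loudness = np.asarray(loudness, dtype=float)
    t = np.asarray(thresholds, dtype=float)
    # Broadcast every value against every threshold: shape (bands, frames, levels)
    exceeds = loudness[:, :, None] > t[None, None, :]
    return exceeds.sum(axis=1) / loudness.shape[1]

def histogram_distance(h1, h2):
    """Euclidean distance between flattened histograms (smaller = more similar timbre)."""
    return float(np.linalg.norm(h1.ravel() - h2.ravel()))

# Usage with synthetic loudness matrices (20 bands, 200 frames)
rng = np.random.default_rng(0)
thresholds = np.linspace(0, 50, 11)
quiet_a = spectrum_histogram(rng.uniform(0, 10, (20, 200)), thresholds)
quiet_b = spectrum_histogram(rng.uniform(0, 10, (20, 200)), thresholds)
loud = spectrum_histogram(rng.uniform(30, 50, (20, 200)), thresholds)
print(histogram_distance(quiet_a, quiet_b), histogram_distance(quiet_a, loud))
```

Because the histogram discards the order of frames, it captures how loud each band tends to be rather than when events occur; describing rhythm would need a separate, time-sensitive descriptor such as the periodicity histograms mentioned above.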

Psychoacoustic Preprocessing...
