
Algorithmic Clustering of Music Based on String Compression
Rudi Cilibrasi, Paul Vitányi, and Ronald de Wolf

All musical pieces are similar, but some are more similar than others. Apart from serving as an infinite source of discussion ("Haydn is just like Mozart—No, he's not!"), such similarities are also crucial for the design of efficient music information retrieval systems. The amount of digitized music available on the Internet has grown dramatically in recent years, both in the public domain and on commercial sites; Napster and its clones are prime examples. Web sites offering musical content in formats such as MP3 or MIDI need a way to organize their wealth of material: they need to classify their files according to musical genres and subgenres, putting similar pieces together. The purpose of such organization is to enable users to navigate to pieces of music they already know and like, but also to give them advice and recommendations ("If you like this, you might also like . . ."). Currently, such organization is mostly done manually by humans or based on patterns in the purchasing behavior of customers. However, recent research has begun to examine the possibility of automating music classification.

A human expert, comparing different pieces of music with the goal of clustering similar works together, will generally look for certain specific similarities. Previous attempts to automate this process do the same. Generally speaking, they take a file containing a piece of music and extract from it various specific numerical features related to pitch, rhythm, harmony, and so on. One can extract such features using, for instance, Fourier transforms (Tzanetakis and Cook 2002) or wavelet transforms (Grimaldi, Kokaram, and Cunningham 2002). The feature vectors corresponding to the various files are then classified or clustered using existing classification software based on standard statistical pattern-recognition classifiers (Tzanetakis and Cook 2002), Bayesian classifiers (Dannenberg, Thom, and Watson 1997), hidden Markov models (Chai and Vercoe 2001), ensembles of nearest-neighbor classifiers (Grimaldi, Kokaram, and Cunningham 2002), or neural networks (Dannenberg, Thom, and Watson 1997; see also a paper by Paul Scott available online at www.stanford.edu/class/ee373a/musicclassification.pdf).
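To make the pipeline concrete, here is a minimal sketch of this feature-vector approach: a handful of spectral statistics computed with a Fourier transform, followed by k-means clustering. The particular features (spectral centroid, rolloff, and spread) and the choice of k-means are illustrative assumptions, not the exact methods of the systems cited above.

```python
import numpy as np
from sklearn.cluster import KMeans

def spectral_features(signal, sample_rate):
    """Summarize a mono signal by spectral centroid, rolloff, and spread."""
    spectrum = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    power = spectrum / (spectrum.sum() + 1e-12)      # normalize to a distribution
    centroid = float((freqs * power).sum())          # "center of mass" of the spectrum
    rolloff = float(freqs[np.searchsorted(np.cumsum(power), 0.85)])  # 85%-energy frequency
    spread = float(np.sqrt((((freqs - centroid) ** 2) * power).sum()))
    return np.array([centroid, rolloff, spread])

def cluster_pieces(pieces, n_clusters=4):
    """Cluster (signal, sample_rate) pairs into n_clusters presumed genres."""
    features = np.vstack([spectral_features(s, sr) for s, sr in pieces])
    return KMeans(n_clusters=n_clusters, n_init=10).fit_predict(features)
```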

For example, one feature to look for is tempo, in the sense of beats per minute. One can make a histogram in which each bin corresponds to a particular tempo in beats per minute, and the associated peak shows how frequent and strong that particular periodicity was over the entire piece. In Tzanetakis and Cook (2002), moving from hip-hop via rock and jazz to classical music, we see a gradual change from a few high peaks to many low, spread-out ones. One can use this type of similarity to try to cluster pieces into these categories. However, such a method requires specific and detailed knowledge of the problem area, because one needs to know which features to look for.
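As a sketch of how such a tempo histogram might be computed, the following assumes a simple energy-based onset envelope whose autocorrelation peaks at the lags of strong periodicities; the actual beat-detection method of Tzanetakis and Cook (2002) is wavelet-based, so this is only an illustration of the idea.

```python
import numpy as np

def bpm_histogram(signal, sample_rate, frame=1024, bpm_range=(40, 200)):
    """Map each candidate tempo (in beats per minute) to its periodicity strength."""
    # Onset-strength envelope: positive frame-to-frame changes in energy.
    energy = np.array([np.sum(signal[i:i + frame] ** 2)
                       for i in range(0, len(signal) - frame, frame)])
    onset = np.maximum(np.diff(energy), 0.0)
    # Autocorrelation of the envelope peaks at lags of strong periodicities.
    ac = np.correlate(onset, onset, mode="full")[len(onset) - 1:]
    frames_per_second = sample_rate / frame
    hist = {}
    for lag in range(1, len(ac)):
        bpm = 60.0 * frames_per_second / lag
        if bpm_range[0] <= bpm <= bpm_range[1]:
            bin_ = int(round(bpm))
            hist[bin_] = hist.get(bin_, 0.0) + float(ac[lag])
    return hist  # few sharp peaks: strongly periodic music; many flat ones: e.g., classical
```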

Our aim is much more general. We do not look for similarity in specific features known to be relevant for classifying music; instead, we apply a general mathematical theory of similarity. The aim is to capture, in a single similarity metric, every effective metric: effective versions of Hamming distance, Euclidean distance, edit distances (Orpen and Huron 1992), Lempel-Ziv distance (Cormode et al. 2000), and so on. Such a metric would simultaneously detect all similarities between pieces that these other effective metrics can detect. Rather surprisingly, such a "universal" metric indeed exists. It was developed in Li et al. (2001), Li and Vitányi (2002), and Li et al. (2003), based on the "information distance" of Li and Vitányi (1997) and Bennett et al. (1998). Roughly speaking, two objects are deemed close if we can significantly "compress" one given the information in the other: if two pieces are more similar, then we can describe one more succinctly given the other. Here, compression is based on the ideal mathematical notion of Kolmogorov complexity: the length of the shortest compressed code from which the original object can be losslessly reproduced by an effective decompressor.

Unfortunately, this limit on what current and future compressors can achieve comes at the price of not being computable, but...
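In practice, a compression-based distance in this spirit can be approximated with a real compressor. The following minimal sketch uses the normalized compression distance (NCD), NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C(s) is the compressed length of s; the choice of bzip2 as the stand-in compressor is an assumption for illustration.

```python
import bz2
import os

def compressed_len(data: bytes) -> int:
    """C(s): length of s after compression by a real-world compressor."""
    return len(bz2.compress(data))

def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance: small when x helps compress y, near 1 otherwise."""
    cx, cy, cxy = compressed_len(x), compressed_len(y), compressed_len(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)

# Similar strings compress well together (small NCD); unrelated data does not.
a = b"do re mi fa sol la ti do " * 40
b = b"do re mi fa sol la ti DO " * 40
c = os.urandom(len(a))
print(ncd(a, b))  # close to 0
print(ncd(a, c))  # close to 1
```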
