  • Effects of Album and Artist Filters in Audio Similarity Computed for Very Large Music Databases
  • Arthur Flexer and Dominik Schnitzer

In music information retrieval, one of the central goals is to automatically recommend music to users based on a query song or query artist. This can be done using expert knowledge (e.g., www.pandora.com), social meta-data (e.g., www.last.fm), collaborative filtering (e.g., www.amazon.com/mp3), or by extracting information directly from the audio (e.g., www.muffin.com). In audio-based music recommendation, a well-known effect is the dominance of songs from the same artist as the query song in recommendation lists.

This effect has been studied mainly in the context of genre-classification experiments. Because no ground truth with respect to music similarity usually exists, genre classification is widely used for evaluation of music similarity. Each song is labelled as belonging to a music genre using, e.g., advice of a music expert. High genre classification results indicate good similarity measures. If, in genre classification experiments, songs from the same artist are allowed in both training and test sets, this can lead to over-optimistic results since usually all songs from an artist have the same genre label. It can be argued that in such a scenario one is doing artist classification rather than genre classification. One could even speculate that the specific sound of an album (mastering and production effects) is being classified. In Pampalk, Flexer, and Widmer (2005) the use of a so-called “artist filter” that ensures that a given artist’s songs are either all in the training set, or all in the test set, is proposed. Those authors found that the use of such an artist filter can lower the classification results quite considerably (as much as from 71 percent down to 27 percent, for one of their music collections). These over-optimistic accuracy results due to not using an artist filter have been confirmed in other studies (Flexer 2006; Pampalk 2006). Other results suggest that the use of an artist filter not only lowers genre classification accuracy but may also erode the differences in accuracies between different techniques (Flexer 2007).
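The artist filter described above can be illustrated with a minimal sketch. It assumes songs are given as (song identifier, artist name) pairs and splits at the artist level, so that no artist contributes songs to both sets. The function name `artist_filtered_split`, the `test_fraction` and `seed` parameters, and the toy data are hypothetical and are not taken from the cited studies.

```python
import random
from collections import defaultdict


def artist_filtered_split(songs, test_fraction=0.5, seed=0):
    """Split (song_id, artist) pairs so that no artist appears in both sets.

    Hypothetical helper for illustration only; the cited studies do not
    prescribe this particular implementation.
    """
    # Group song identifiers by artist.
    by_artist = defaultdict(list)
    for song_id, artist in songs:
        by_artist[artist].append(song_id)

    # Shuffle artists (not songs), then assign whole artists to the test set.
    artists = sorted(by_artist)
    rng = random.Random(seed)
    rng.shuffle(artists)
    n_test = max(1, round(len(artists) * test_fraction))
    test_artists = set(artists[:n_test])

    train = [s for a in artists if a not in test_artists for s in by_artist[a]]
    test = [s for a in artists if a in test_artists for s in by_artist[a]]
    return train, test, test_artists


# Toy example: four artists, six songs.
songs = [("s1", "A"), ("s2", "A"), ("s3", "B"),
         ("s4", "C"), ("s5", "C"), ("s6", "D")]
train, test, test_artists = artist_filtered_split(songs, test_fraction=0.5, seed=1)
```

Because the split is made over artists rather than over songs, the resulting training and test sets may differ in size from the requested fraction when artists contribute unequal numbers of songs.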

All these results were achieved on rather small databases (from 700 to 15,000 songs). Often whole albums from an artist were part of the database, perhaps even more than one. These specifics of the databases are often unclear and not properly documented. The present article extends these results by analyzing a very large data set (over 250,000 songs) containing multiple albums from individual artists. We try to answer the following questions:

  1. Is there an album and artist effect even in very large databases?

  2. Is the album effect larger than the artist effect?

  3. What is the influence of database size on music recommendation and classification?

As will be seen, we find that the artist effect does exist in very large databases, and the album effect is bigger than the artist effect.

Data

For our experiments we used a data set D(ALL) of S0 = 254,398 song excerpts (30 seconds each) from a popular Web store selling music. The freely available preview song excerpts were obtained with an automated Web crawl. All meta-information (artist name, album title, song title, genres) was parsed automatically from the HTML code. The excerpts are from U = 18,386 albums from A = 1,700 artists. Of the 280 different hierarchical genres, only the G = 22 general ones at the top of the hierarchy were kept for further analysis (e.g., “Pop/General” is kept but not “Pop/Vocal Pop”). The names of the genres plus percentages of songs belonging to each of the genres are given in Table 1. (Each song is allowed to belong to more than one genre, hence the percentages in Table 1 add up to more than 100 percent.) The genre information is identical for all songs on an album. The numbers of genre labels per album are given in Figure 1. Our database was set up so that every artist contributes between 6 and 29 albums (see Figure 2).
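The following minimal sketch shows why per-genre percentages can add up to more than 100 percent when songs carry several labels. It assumes each song is represented as a set of genre labels. The function name `genre_percentages` and the toy labels are hypothetical; they are not the values reported in Table 1.

```python
from collections import Counter


def genre_percentages(song_genres):
    """Return the percentage of songs carrying each genre label.

    song_genres: list of sets of genre labels, one set per song.
    Hypothetical helper for illustration only.
    """
    # Count each genre once per song that carries it.
    counts = Counter(g for genres in song_genres for g in genres)
    n_songs = len(song_genres)
    return {g: 100.0 * c / n_songs for g, c in counts.items()}


# Toy data (not from the article): the first song has two genre labels.
toy_songs = [{"Pop", "Rock"}, {"Pop"}, {"Jazz"}, {"Rock"}]
percentages = genre_percentages(toy_songs)
total = sum(percentages.values())  # exceeds 100 because of the multi-label song
```

Each percentage is computed relative to the number of songs, not the number of labels, so every song with more than one label pushes the total above 100.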

