Zipf's Law, Music Classification, and Aesthetics
The connection between aesthetics and numbers dates back to pre-Socratic times. Pythagoras, Plato, and Aristotle worked on quantitative expressions of proportion and beauty such as the golden ratio. Pythagoreans, for instance, quantified "harmonious" musical intervals in terms of proportions (ratios) of the first few whole numbers: a unison is 1:1, octave is 2:1, perfect fifth is 3:2, perfect fourth is 4:3, and so on (Miranda 2001, p. 6). The Pythagorean scale was refined over centuries to produce well-tempered and equal-tempered scales (Livio 2002, pp. 29, 186).
Galen, summarizing Polyclitus, wrote, "Beauty does not consist in the elements, but in the harmonious proportion of the parts." Vitruvius stated, "Proportion consists in taking a fixed module, in each case, both for the parts of a building and for the whole." He then defined proportion as "the appropriate harmony arising out of the details of the work itself; the correspondence of each given detail among the separate details to the form of the design as a whole." This school of thought crystallized into a universal theory of aesthetics based on "unity in variety" (Eco 1986, p. 29).
Some musicologists dissect the aesthetic experience in terms of separable, discrete sounds. Others attempt to group stimuli into patterns and study their hierarchical organization and proportions (May 1996; Nettheim 1997). Leonard Meyer states that emotional states in music (sad, angry, happy, etc.) are delineated by statistical parameters such as dynamic level, register, speed, and continuity (2001, p. 342).
Building on earlier work by Vilfredo Pareto, Alfred Lotka, and Frank Benford (among others), George Kingsley Zipf refined a statistical technique known as Zipf's Law for capturing the scaling properties of human and natural phenomena (Zipf 1949; Mandelbrot 1977, pp. 344-345).
We present results from a study applying Zipf's Law to music. We have created a large set of metrics based on Zipf's Law that measure the proportion or distribution of various parameters in music, such as pitch, duration, melodic intervals, and harmonic consonance. We applied these metrics to a large corpus of MIDI-encoded pieces. We used the generated data to perform statistical analyses and train artificial neural networks (ANNs) to perform various classification tasks. These tasks include author attribution, style identification, and "pleasantness" prediction. Results from the author attribution and style identification ANN experiments have appeared in Machado et al. (2003, 2004) and Manaris et al. (2003), and these results are summarized in this article. Results from the "pleasantness" prediction ANN experiment are new and therefore discussed in detail. Collectively, these results suggest that metrics based on Zipf's Law may capture essential aspects of proportion in music as it relates to music aesthetics.
Zipf's Law reflects the scaling properties of many phenomena in human ecology, including natural language and music (Zipf 1949; Voss and Clarke 1975). Informally, it describes phenomena where small events are quite frequent and large events are rare. Once a phenomenon has been selected for study, we can examine the contribution of each event to the whole and rank it according to its "importance" or "prevalence" (see linkage.rockefeller.edu/wli/zipf). For example, we may rank unique words in a book by their frequency of occurrence, visits to a Web site by how many of them originated from the same Internet address, and so on.
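The ranking step described above can be illustrated with a minimal Python sketch. It is not the procedure used in the study; the function name `rank_frequency` and the toy word list are assumptions made purely for illustration. It counts how often each event occurs and assigns rank 1 to the most frequent event, rank 2 to the next, and so on.

```python
from collections import Counter


def rank_frequency(events):
    """Return (rank, event, count) tuples, most frequent event first.

    `events` is any iterable of hashable items, e.g. words in a text
    or pitch numbers in a melody (hypothetical inputs).
    """
    counts = Counter(events)
    # most_common() orders by descending count; events with equal
    # counts keep the order in which they were first seen.
    return [
        (rank, event, count)
        for rank, (event, count) in enumerate(counts.most_common(), start=1)
    ]


# Toy input, invented for illustration only.
words = "the cat and the dog and the bird".split()
table = rank_frequency(words)

for rank, word, count in table:
    print(rank, word, count)
```

The same function could in principle be applied to any sequence of discrete events, such as the pitches or durations of a MIDI-encoded piece, once those events have been extracted into a list.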
In its most succinct form, Zipf's Law is expressed in terms of the frequency of occurrence (i.e., count or quantity) of events, as follows:

$$F \sim r^{-a}$$
where F is the frequency of occurrence of an event within a phenomenon, r is its statistical rank (position in an ordered list), and a is close to 1. In the book example above, the most frequent word would be rank 1, the second most frequent word would be rank 2, and so on. This means that the frequency of occurrence of a word is inversely proportional to its rank. For example, if the first ranked word appears 6,000 times, the second ranked word would appear approximately 3...