A Multi-Layered, Time-Based Music Description Approach Based on XML
There are six aspects into which music information can be divided: general, structural, music logic, notation, performance, and audio. We call these aspects "layers," because each represents a different level of abstraction of the music information. Taken together, these layers can be viewed as a single symbolic music information (SMI) entity. The purpose of SMI is to relate all existing representations in the notation, performance, and audio layers using the music logic and structural layers.
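The idea of the six layers as branches of one SMI entity can be sketched as a single XML tree. The element names below are illustrative only (we write "logic" for the music logic layer to keep tag names valid); they are not the official element names of the format under development.

```python
# Sketch: the six layers of symbolic music information gathered under
# one XML root, so a single document can hold every level of abstraction.
# Element names here are our own illustration, not the official format.
import xml.etree.ElementTree as ET

LAYERS = ["general", "structural", "logic", "notation", "performance", "audio"]

root = ET.Element("smi")
for name in LAYERS:
    ET.SubElement(root, name)

# Every layer is a sibling branch of the same tree; the structural and
# logic layers can then relate the notation, performance, and audio layers.
xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)
```

Because all layers live in one tree, an application can load only the branches it needs (e.g., notation only, or audio plus performance) while the shared root keeps them in a single entity.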
In this article, we present an Extensible Markup Language (XML) instance intended to integrate the different aspects of music representation, bringing together ideas and concepts developed in the past. Because this format is still under development by the IEEE-SA Working Group on Music Application of XML (IEEE-SA MAX WG; see www.lim.dico.unimi.it/IEEE/XML.html), and because an article cannot give a detailed description of a whole format, we present only its main concepts, which are likely to undergo only minor changes as the format evolves. These concepts are Layered Symbolic Music Information (Layered SMI) and the Spine structure. Thanks to these two concepts, we consider the contribution of our format to be a more complete integration of previous concepts and formats in a framework usable by diverse music applications, especially those based on several concurrent music layers (for example, the automatic synchronization of audio, MIDI, and score).
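The role of a spine-like structure in synchronizing concurrent layers can be illustrated with a minimal sketch. The field names and helper functions below are our own assumptions for illustration, not the actual Spine syntax of the format: a shared list of events that each layer references, so that a position in the score, a MIDI tick, and an audio timestamp can be cross-mapped.

```python
# Illustrative sketch (field names are hypothetical, not the official
# Spine syntax): a shared event list that every layer refers to, enabling
# cross-mapping between score position, MIDI time, and audio time.
spine = [
    {"id": "e1", "measure": 1, "audio_s": 0.00, "midi_tick": 0},
    {"id": "e2", "measure": 1, "audio_s": 0.48, "midi_tick": 480},
    {"id": "e3", "measure": 2, "audio_s": 0.97, "midi_tick": 960},
]

def audio_time(event_id):
    """Map a spine event to its position in the audio layer (seconds)."""
    return next(e["audio_s"] for e in spine if e["id"] == event_id)

def midi_tick(event_id):
    """Map the same spine event to its position in the MIDI layer."""
    return next(e["midi_tick"] for e in spine if e["id"] == event_id)

# The same event identifier resolves to a time in each layer, which is
# what makes automatic score/MIDI/audio synchronization possible.
print(audio_time("e2"), midi_tick("e2"))
```

Because every layer points into the same event list rather than into the other layers directly, adding a new rendering (say, a second audio recording) only requires anchoring it to the spine, not re-aligning it against every other layer.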
Layered SMI is important owing to the manifold nature of music representation. Downie (2003) explains this concept in a very meaningful way. According to Downie, music is composed of seven facets: pitch, temporal, harmonic, timbral, editorial, textual, and bibliographic. Moreover, each facet can interact with the others, increasing the complexity of the representational challenge. In addition, music representation also poses multi-representational, multi-cultural, multi-experiential, and multi-disciplinary challenges.
Among the many existing formats for music representation, only a few can be considered de facto standards. Comparing these few formats, we observe that each is designed to represent mainly one particular aspect, or only a limited number of aspects, of music information. We can subdivide these formats into four broad clusters: audio, sub-symbolic, notational, and compositional, similar to the domains of the Standard Music Description Language (SMDL; Sloan 1993; see also ftp.ornl.gov/pub/sgml/WG8/SMDL/10743.ps).
Audio formats encode signal information, that is, only the purely aural aspect (the "Gestural" domain of SMDL). Sub-symbolic formats like MIDI (Musical Instrument Digital Interface; MIDI Manufacturers Association 2001) or Csound (Boulanger 1999) encode information about how to produce or reproduce music electronically (the "Logical" domain of SMDL). Many music notation file formats have been developed by different producers of music-editing software. Some of them, like NIFF (Notation Interchange File Format; see www.musique.umontreal.ca/personnel/Belkin/NIFF.doc.html) or Enigma, are rich enough to generate a MIDI rendering of notational content (the "Visual" domain of SMDL). However, new software composition tools (e.g., Haus and Sametti 1991; Assayag et al. 1999) need to formalize and exchange information and structures that are not represented in these formats (the "Analytical" domain of SMDL). Moreover, the emerging possibility of wide dissemination of music via the Internet increases the urgency of cataloging these items and protecting their intellectual rights.
In the past, many researchers have addressed the problem of representing different aspects of music. We think the most conceptually meaningful of these attempts was SMDL (Sloan 1993; see also ftp.ornl.gov/pub/sgml/WG8/SMDL/10743.ps). Although it is not explicit in many existing representations, there is an intrinsic space-time relationship in music that can be seen as a bi-directional mapping function between the space and time domains (e.g., the disposition of notes on the staff versus the timing of notes in an audio file). This relationship is an underlying structure that holds the layers together like glue. We call it a...