- Assessing Optical Music Recognition Tools
As digitization and information technologies advance, document analysis and optical-character-recognition technologies have become more widely used. Optical Music Recognition (OMR), also commonly known as OCR (Optical Character Recognition) for Music, was first attempted in the 1960s (Pruslin 1966). Standard OCR techniques cannot be used in music-score recognition, because music notation has a two-dimensional structure. In a staff, the horizontal position denotes different durations of notes, and the vertical position defines the height of the note (Roth 1994). Models for nonmusical OCR assessment have been proposed and largely used (Kanai et al. 1995; Ventzislav 2003).
An ideal system that could reliably read and "understand" music notation could be used in music production for educational and entertainment applications. OMR is typically used today to accelerate the conversion from image music sheets into a symbolic music representation that can be manipulated, thus creating new and revised music editions. Other applications use OMR systems for educational purposes (e.g., IMUTUS; see www.exodus.gr/imutus), generating customized versions of music exercises. A different use involves the extraction of symbolic music representations to be used as incipits or as descriptors in music databases and related retrieval systems (Byrd 2001).
OMR systems can be classified on the basis of the granularity chosen to recognize the music score's symbols. The architecture of an OMR system is tightly related to the methods used for symbol extraction, segmentation, and recognition. Generally, the music-notation recognition process can be divided into four main phases: (1) the segmentation of the score image to detect and extract symbols; (2) the recognition of symbols; (3) the reconstruction of music information; and (4) the construction of the symbolic music notation model to represent the information (Bellini, Bruno, and Nesi 2004). Music notation may present very complex constructs and several styles. This problem has been recently addressed by the MUSICNETWORK and Motion Picture Experts Group (MPEG) in their work on Symbolic Music Representation (www.interactivemusicnetwork.org/mpeg-ahg). Many music-notation symbols exist, and they can be combined in different ways to realize several complex configurations, often without using well-defined formatting rules (Ross 1970; Heussenstamm 1987). Despite various research systems for OMR (e.g., Prerau 1970; Tojo and Aoyama 1982; Rumelhart, Hinton, and McClelland 1986; Fujinaga 1988, 1996; Carter 1989, 1994; Kato and Inokuchi 1990; Kobayakawa 1993; Selfridge-Field 1993; Ng and Boyle 1994, 1996; Coüasnon and Camillerapp 1995; Bainbridge and Bell 1996, 2003; Modayur 1996; Cooper, Ng, and Boyle 1997; Bellini and Nesi 2001; McPherson 2002; Bruno 2003; Byrd 2006) as well as commercially available products, optical music recognition—and more generally speaking, music recognition—is a research field affected by many open problems. The meaning of "music recognition" changes depending on the kind of applications and goals (Blostein and Carter 1992): audio generation from a musical score, music indexing and searching in a library database, music analysis, automatic transcription of a music score into parts, transcoding a score into interchange data formats, etc. For such applications, we must employ common tools to provide answers to questions such as "What does a particular percentage-recognition rate that is claimed by this particular algorithm really mean?" and "May I invoke a common methodology to compare different OMR tools on the basis of my music?" As mentioned in Blostein and Carter (1992) and Miyao and Haralick (2000), there is no standard for expressing the results of the OMR process. [End Page 68]
Another difficulty has also to do with the lack of a standard language for symbolic music notation representation. Many music languages are available (Selfridge-Field 1997; Bellini, Bruno, and Nesi 2001; Good 2001), but none of them is satisfactory for a music-recognition evaluation. The same symbols can be modeled in different ways in different languages, and sometimes the same symbols may have multiple representations in the same language. This makes the comparison among OMR results much more difficult at the level of music-notation symbolic coding, and thus also in terms of produced representation. At present, there is neither a standard database for music-score recognition nor a standard terminology. If a new recognition...