A Fuzzy-Logic Mapper for Audiovisual Media
Rodrigo F. Cádiz

Recent technological developments have enabled us to synthesize images and sounds concurrently within a single computer, even in real time, giving birth to novel and genuinely integrated audiovisual art forms (Hunt et al. 1998). But how should we organize and compose such works? Given a certain soundscape, what sequence of images would be appropriate to it? Given a certain sequence of images, what soundscape would be appropriate? If the image sequence and the soundscape are being created concurrently, how should we compose them?

Authors have proposed different approaches to these questions (Whitney 1980; Hunt et al. 1998; Lokki et al. 1998; Rudi 1998; Kim and Lipscomb 2003; Gerhard and Hepting 2004; Yeo et al. 2004). These approaches differ significantly, and they are based on diverse principles, such as correspondence of aural to visual harmony, audiovisual modeling of mathematical principles, audiovisual rendering, data sonification, algorithmic control, and parameter space exploration. It is important to note that there is no easy or single correct solution, because the problem lies in combining two entirely different media in time (Hunt et al. 1998).

A fuzzy-logic approach to the challenge of composing both sound and moving image within a coherent framework is proposed here as an alternative solution. This approach is based on a fuzzy-logic model that enables a flexible mapping of either aural or visual information onto the other, and it is able to generate complex audiovisual relationships by very simple means. This mapping strategy is inspired by two fundamental ideas: isomorphism and synaesthesia. Isomorphism applies when two complex structures can be mapped onto each other based on the fact that changes in one modality consistently cause changes in another modality (Hofstadter 1999). The word "synaesthesia" comes directly from the Greek syn ("together") and aísthesis ("perceive"; Van Campen 1999), thus meaning "a union of the senses." Synaesthesia occurs when stimulation in one sensory modality automatically triggers a perception in a second modality in the absence of any direct stimulation to this second modality (Harrison and Baron-Cohen 1997).
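
As a concrete illustration of this mapping idea, the sketch below shows how a single sound parameter could be fuzzified and mapped onto a single visual parameter through a small rule base. It is only a minimal example of the general fuzzy-logic technique, not the mapper described later in this article; the parameter names ("loudness," "brightness"), the triangular membership functions, and the three rules are illustrative assumptions.

```python
# Minimal fuzzy sound-to-image mapping sketch (illustrative, not the
# article's mapper): fuzzify a normalized loudness value, apply three
# IF-THEN rules, and defuzzify with a weighted average.

def triangular(x, a, b, c):
    """Degree of membership of x in a triangular fuzzy set (a, b, c)."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)

def loudness_sets(x):
    """Fuzzify a loudness value in [0, 1] into three overlapping sets."""
    return {
        "quiet":    triangular(x, -0.5, 0.0, 0.5),
        "moderate": triangular(x,  0.0, 0.5, 1.0),
        "loud":     triangular(x,  0.5, 1.0, 1.5),
    }

# Representative output values for the visual parameter (image brightness).
BRIGHTNESS_CENTERS = {"dark": 0.1, "mid": 0.5, "bright": 0.9}

# Rule base: IF loudness IS <input set> THEN brightness IS <output set>.
RULES = {"quiet": "dark", "moderate": "mid", "loud": "bright"}

def map_loudness_to_brightness(loudness):
    """Weighted-average (height) defuzzification over the fired rules."""
    memberships = loudness_sets(loudness)
    num = den = 0.0
    for in_set, out_set in RULES.items():
        w = memberships[in_set]
        num += w * BRIGHTNESS_CENTERS[out_set]
        den += w
    return num / den if den > 0 else 0.0

if __name__ == "__main__":
    for x in (0.1, 0.4, 0.75, 1.0):
        print(f"loudness={x:.2f} -> brightness={map_loudness_to_brightness(x):.2f}")
```

Because the input sets overlap, intermediate loudness values fire several rules at once and produce smoothly interpolated brightness values, which is one simple way such a model can yield continuous audiovisual relationships from a handful of rules.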

This article is structured as follows. First, the motivations for this work are presented and discussed, including discussions of audiovisual domains, synaesthesia, and isomorphisms. Second, fuzzy logic is introduced, including its main features. Third, details of the proposed fuzzy-logic mapper are presented. Fourth, ID-FUSIONES (2001) and TIME EXPOSURE (2005), computer music–video works that use the proposed model, are discussed as actual implementations of the approach described in this article. Finally, conclusions and directions for future work are addressed.

Motivation

To understand how the composition of an audiovisual work should be addressed, it is important to consider how presenting music with visuals affects the listener differently from presenting music or visuals alone.

Experiencing the Audiovisual

There is substantial empirical evidence to support the common subjective experience that music and moving images interact in powerful and effective ways (Bullerjahn and Güldenring 1994; Iwamiya 1994; Lipscomb and Kendall 1994; Rosar 1994; Sirius and Clarke 1994). However, as Finnäs (2001) suggests, it is often difficult to predict the exact influence of the visual stimuli that relate to audio stimuli. The visual elements and their relationship to the music can vary tremendously. Hunt et al. (1998) suggest that combining music and visuals produces combinatorial relationships of such complexity that it forces composers to develop extensive algorithmic control to maintain stylistic consistency. Sirius and Clarke (1994) composed music and used computer-generated moving images to investigate the interaction of different visual and musical parameters. Their findings show that the effects of music on the rating of visual images are usually additive and that there are no interactions between specific musical styles and particular visual images. In other words, no specific audiovisual combinations acquire particular semantic characteristics.

Finnäs (2001) addressed the question of whether presenting music live, audiovisually, or only aurally makes any difference for listeners' experiences. Audiovisual presentations were categorized into three submodes: simple documentary, which is just a video recording of the live performance; TV-type, in which the live performance is alternated with images from various perspectives, close-ups, and images of details; and non...
