A Tutorial on Spectral Sound Processing Using Max/MSP and Jitter

Jean-François Charles

For computer musicians, sound processing in the frequency domain is an important and widely used technique. Two particular frequency-domain tools of great importance for the composer are the phase vocoder and the sonogram. The phase vocoder, an analysis-resynthesis tool based on a sequence of overlapping short-time Fourier transforms, helps perform a variety of sound modifications, from time stretching to arbitrary control of energy distribution through frequency space. The sonogram, a graphical representation of a sound’s spectrum, offers composers more readable frequency information than a time-domain waveform.
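The phase vocoder's time stretching can be sketched in a few dozen lines. This is a minimal Python sketch, not the article's Max/MSP implementation. It analyzes overlapping Hann-windowed frames with an analysis hop `hop_a` and estimates each bin's true phase advance from the wrapped deviation against the bin-center frequency. It then rescales that advance to a synthesis hop of roughly `factor * hop_a` and overlap-adds the resynthesized frames. The function names, the naive DFT (used only to stay dependency-free), and the sizes n = 64 and hop_a = 16 are illustrative assumptions.

```python
import cmath
import math


def hann(n):
    """Periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]


def dft(x):
    """Naive O(n^2) DFT: slow, but keeps the sketch dependency-free."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spec):
    """Inverse of dft(), keeping only the real part (the signal is assumed real)."""
    n = len(spec)
    return [(sum(spec[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n).real
            for t in range(n)]


def time_stretch(x, factor, n=64, hop_a=16):
    """Hypothetical helper: stretch x by `factor` with a basic phase vocoder
    (no transient handling, no phase locking)."""
    hop_s = int(round(hop_a * factor))              # synthesis hop
    w = hann(n)
    frames = [dft([x[s + i] * w[i] for i in range(n)])
              for s in range(0, len(x) - n + 1, hop_a)]
    out_len = hop_s * (len(frames) - 1) + n
    out = [0.0] * out_len
    norm = [0.0] * out_len                          # summed squared windows
    prev_phase = [cmath.phase(c) for c in frames[0]]
    acc_phase = list(prev_phase)                    # running synthesis phase per bin
    for m, spec in enumerate(frames):
        if m > 0:
            for k in range(n):
                phase = cmath.phase(spec[k])
                expected = 2 * math.pi * k * hop_a / n          # bin-center advance
                dev = phase - prev_phase[k] - expected
                dev -= 2 * math.pi * round(dev / (2 * math.pi))  # wrap to [-pi, pi]
                # true advance per analysis hop, rescaled to the synthesis hop
                acc_phase[k] += (expected + dev) * hop_s / hop_a
                prev_phase[k] = phase
        frame = idft([abs(spec[k]) * cmath.exp(1j * acc_phase[k]) for k in range(n)])
        start = m * hop_s
        for i in range(n):
            out[start + i] += frame[i] * w[i]       # synthesis window, then overlap-add
            norm[start + i] += w[i] * w[i]
    # divide by the window overlap so the amplitude stays flat for any hop_s
    return [o / d if d > 1e-8 else 0.0 for o, d in zip(out, norm)]
```

With `factor=1.0` the phases are unchanged modulo 2π, so the input is reconstructed. With `factor=2.0` a 256-sample input becomes 448 samples. Normalizing by the summed squared window is a design choice made here because the overlap changes with the synthesis hop.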

Such tools make graphical sound synthesis convenient. A history of graphical sound synthesis is beyond the scope of this article, but a few important figures include Evgeny Murzin, Percy Grainger, and Iannis Xenakis. In 1938, Evgeny Murzin invented a system to generate sound from a visible image; the design, based on the photo-optic sound technique used in cinematography, was implemented as the ANS synthesizer in 1958 (Kreichi 1995). Percy Grainger was also a pioneer with the “Free Music Machine” that he designed and built with Burnett Cross in 1952 (Lewis 1991); the device was able to generate sound from a drawing of an evolving pitch and amplitude. In 1977, Iannis Xenakis and associates built on these ideas when they created the famous UPIC (Unité Polyagogique Informatique du CEMAMu; Marino, Serra, and Raczinski 1993).

In this article, I explore the domain of graphical spectral analysis and synthesis in real-time situations. The technology has evolved so that now, not only can the phase vocoder perform analysis and synthesis in real time, but composers have access to a new conceptual approach: spectrum modifications considered as graphical processing. Nevertheless, the underlying matrix representation is still intimidating to many musicians. Consequently, the musical potential of this technique is as yet unfulfilled.
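The matrix representation can be made less intimidating with a small example outside Jitter. Each analysis frame becomes a row and each frequency bin a column, so a sonogram is simply a 2-D array of magnitudes. A graphical operation such as a blur along the time axis is then an ordinary matrix computation. The Python sketch below assumes a Hann window, n = 64, and hop = 16. `sonogram` and `blur_time` are hypothetical helper names, and the row/column orientation is a convention chosen here; Jitter patches may lay the data out differently.

```python
import cmath
import math


def dft(x):
    """Naive O(n^2) DFT, used only to keep the sketch dependency-free."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def sonogram(x, n=64, hop=16):
    """Magnitude matrix: one row per analysis frame, one column per bin (0..n/2)."""
    w = [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]  # Hann
    rows = []
    for s in range(0, len(x) - n + 1, hop):
        spec = dft([x[s + i] * w[i] for i in range(n)])
        rows.append([abs(c) for c in spec[: n // 2 + 1]])
    return rows


def blur_time(matrix):
    """3-tap moving average down each column (time axis), like a vertical image blur."""
    out = []
    for r in range(len(matrix)):
        block = matrix[max(r - 1, 0): r + 2]      # neighboring rows, clipped at the edges
        out.append([sum(col) / len(block) for col in zip(*block)])
    return out
```

For a steady sinusoid every row is the same, so the blur leaves the matrix unchanged; on a changing sound it smears spectral events over neighboring frames.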

This article is intended both as a presentation of the potential of manipulating spectral sound data as matrices and as a tutorial for musicians who want to implement such effects in the Max/MSP/Jitter environment. Throughout the article, I consider spectral analysis and synthesis as realized by the Fast Fourier Transform (FFT) and inverse FFT algorithms. I assume familiarity with the FFT (Roads 1995) and the phase vocoder (Dolson 1986). To make the most of the examples, familiarity with the Max/MSP environment is necessary, and a basic knowledge of the Jitter extension may be helpful.
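As a reminder of what the FFT and inverse FFT compute, here is the textbook recursive radix-2 Cooley-Tukey algorithm in Python. The inverse is obtained through the conjugation identity ifft(X) = conj(fft(conj(X))) / n. This is a teaching sketch that accepts only power-of-two lengths; it is not how Max/MSP implements its FFT, and the function names are chosen for the example.

```python
import cmath
import math


def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])   # DFT of the even-indexed samples
    odd = fft(x[1::2])    # DFT of the odd-indexed samples
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * math.pi * k / n) * odd[k]   # twiddle factor times odd part
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def ifft(spec):
    """Inverse FFT via conjugation: conj(fft(conj(X))) / n."""
    n = len(spec)
    return [c.conjugate() / n for c in fft([complex(s).conjugate() for s in spec])]
```

Splitting each transform into even and odd halves is what reduces the cost from O(n²) to O(n log n).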

I begin with a survey of the software currently available for working in this domain. I then present some improvements to the traditional phase vocoder, usable both in real time and in performance time. (Whereas real-time treatments are applied to a live sound stream, performance-time treatments are transformations of sound files that are generated during a performance.) Finally, I present extensions to the popular real-time spectral processing method known as the freeze, to demonstrate that matrix processing can be useful in the context of real-time effects.
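One common way to implement a spectral freeze is assumed here; the article's own Max patches may differ. It captures two consecutive analysis frames, keeps the magnitudes of the second, and measures each bin's phase advance between them. It then resynthesizes that frame repeatedly while accumulating the measured advance. The Python sketch below follows that recipe with a Hann window, n = 64, and equal analysis and synthesis hops of 16 samples. `freeze` and its parameters are hypothetical.

```python
import cmath
import math


def hann(n):
    """Periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]


def dft(x):
    """Naive O(n^2) DFT, used only to keep the sketch dependency-free."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spec):
    """Inverse of dft(), keeping only the real part."""
    n = len(spec)
    return [(sum(spec[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n).real
            for t in range(n)]


def freeze(x, start, num_frames, n=64, hop=16):
    """Sustain the spectrum found at `start` for `num_frames` output frames.
    Requires len(x) >= start + hop + n."""
    w = hann(n)
    a = dft([x[start + i] * w[i] for i in range(n)])          # captured frame
    b = dft([x[start + hop + i] * w[i] for i in range(n)])    # one hop later
    mags = [abs(c) for c in b]
    dphi = []                                                 # per-bin phase advance per hop
    for k in range(n):
        d = cmath.phase(b[k]) - cmath.phase(a[k])
        dphi.append(d - 2 * math.pi * round(d / (2 * math.pi)))  # wrap to [-pi, pi]
    phase = [cmath.phase(c) for c in b]
    out_len = hop * (num_frames - 1) + n
    out = [0.0] * out_len
    norm = [0.0] * out_len
    for m in range(num_frames):
        frame = idft([mags[k] * cmath.exp(1j * phase[k]) for k in range(n)])
        for i in range(n):
            out[m * hop + i] += frame[i] * w[i]
            norm[m * hop + i] += w[i] * w[i]
        phase = [p + d for p, d in zip(phase, dphi)]          # accumulate the advance
    return [o / d if d > 1e-8 else 0.0 for o, d in zip(out, norm)]
```

For a steady sinusoid, the frozen output continues the waveform well past the end of the captured material. Re-applying the measured phase advance keeps overlapping frames phase-coherent, rather than repeating the same windowed grain at every hop.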

Spectral Sound Processing with Graphical Interaction

Several dedicated software products enable graphic rendering and/or editing of sounds through their sonogram. They generally do not work in real time because, until a few years ago, real-time processing of complete spectral data was not possible on computers accessible to individual musicians. This computational limitation led to the development of objects like IRCAM’s Max/MSP external iana∼, which reduces spectral data to a set of useful descriptors (Todoroff, Daubresse, and Fineberg 1995). After a quick survey of the current limitations of non-real-time software, I review the environments that allow FFT processing and visualization in real time.
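To illustrate the kind of data reduction described here, the sketch below turns one analysis frame into a short list of (bin, magnitude) pairs by keeping local spectral maxima above a relative threshold. This is a generic peak picker written in Python for illustration, not the algorithm implemented by iana∼. `spectral_peaks` and its `threshold` parameter are hypothetical.

```python
import cmath
import math


def dft(x):
    """Naive O(n^2) DFT, used only to keep the sketch dependency-free."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def spectral_peaks(frame, threshold=0.1):
    """Return (bin, magnitude) pairs for local maxima of the Hann-windowed
    spectrum whose magnitude is at least `threshold` times the largest one."""
    n = len(frame)
    w = [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]
    mags = [abs(c) for c in dft([frame[i] * w[i] for i in range(n)])[: n // 2 + 1]]
    floor = threshold * max(mags)
    return [(k, mags[k]) for k in range(1, len(mags) - 1)
            if mags[k] > mags[k - 1] and mags[k] >= mags[k + 1] and mags[k] >= floor]
```

A 64-point frame thus shrinks from 33 magnitudes to a handful of pairs, which is the kind of saving that made descriptor-based processing attractive when full spectral processing was too costly.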

Non-Real-Time Tools

AudioSculpt, a program developed by IRCAM, is characterized by the high precision it offers as well as the ability to customize advanced parameters for the FFT analysis (Bogaards, Röbel, and Rodet 2004). For instance, the user can set the analysis window size independently of the FFT size. Three automatic segmentation methods are provided, enabling high-quality time stretching with transient preservation. Other important functions are frequency-bin-independent dynamics processing (to be used for noise removal, for instance) and application of...
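Two of the features mentioned above can be illustrated generically. The first is an analysis window shorter than the FFT size, obtained by zero-padding the windowed frame. The second is a per-bin gate, in which each bin's magnitude is compared with a threshold independently of the others, a crude form of noise removal. This Python sketch is not AudioSculpt's implementation. The function names, the Hann window, and the hard threshold are assumptions made for the example.

```python
import cmath
import math


def dft(x):
    """Naive O(n^2) DFT, used only to keep the sketch dependency-free."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def padded_spectrum(frame, fft_size):
    """Hann-window `frame` (window size = len(frame)) and zero-pad it to fft_size."""
    win = len(frame)
    if win > fft_size:
        raise ValueError("window must not exceed the FFT size")
    w = [0.5 - 0.5 * math.cos(2 * math.pi * i / win) for i in range(win)]
    return dft([frame[i] * w[i] for i in range(win)] + [0.0] * (fft_size - win))


def spectral_gate(spec, threshold):
    """Per-bin gate: each bin is kept or muted independently of its neighbors."""
    return [c if abs(c) >= threshold else 0j for c in spec]
```

Zero-padding does not add frequency resolution; it samples the spectrum of the shorter window more densely. A 48-sample window analyzed with a 64-point FFT therefore yields 64 bins.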

Computer Music Journal