Letters
Leonardo Gabrielli

Once Again

Once again in the history of music technology, groundbreaking innovations have come as the result of research and development carried out by big names in information and communications technologies (ICT). In the past, however, such innovations often came as a side effect: Early digital computers were not developed in order to play the "Colonel Bogey March." [Editor's note: The writer is alluding to a piece played by the CSIRAC computer in 1951.] Now, however, the big names may start treating music creation as one of their primary goals. This may be a sign of a changing society in which engineering no longer tends primarily to "hard" goals (vehicles, energy, communication, weapons, etc.) but also to more volatile matters such as music and the arts. This argument alone would be enough to open a social, historical, and ethical debate, which is beyond the scope of my letter. I would rather discuss some aspects of the innovations I am referring to.

In 2016, machine-learning research had already obtained exciting results in the field of visual arts, probably culminating, at least in terms of popularity, in the style-transfer work of Gatys, Ecker, and Bethge (2016). In June 2016 the Google Brain team started a project called Magenta, aiming at "machine intelligence for music and art generation." Later, in September, DeepMind researchers presented a white paper (van den Oord et al. 2016) describing WaveNet, a groundbreaking probabilistic method for raw audio generation based on machine-learning techniques. Their first results were very exciting to me. The model can learn and reproduce speech from text in the time domain at audio rate, without any traditional digital signal-processing techniques such as vocoders. It learns the timbre and prosody of different speakers if its output probability distribution is conditioned on the speaker. Even more intriguing, however, is that it can run freely (i.e., generate without any conditioning input). This method was tested with speech (producing amusingly nonsensical syllables) and, most importantly to us, with music. After being trained on a large database of piano music, the algorithm could generate new, random pieces of piano music of plausible quality. By "plausible," I mean a kind of slightly atonal, piano-like babbling with no compositional cue; even so, this may be the start of a new era for computer music. A strong assertion, perhaps, but I will support it here and report my impressions, hoping more discussion will soon start within the columns of CMJ or in other virtual spaces.
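To make this concrete: as described in the WaveNet white paper, the model factorizes the probability of a waveform x = (x_1, ..., x_T) into a product of per-sample conditionals, each predicted from all preceding samples, and an extra conditioning input h (such as speaker identity or linguistic features) can be added to every term:

    p(x) = ∏_{t=1}^{T} p(x_t | x_1, ..., x_{t-1})

    p(x | h) = ∏_{t=1}^{T} p(x_t | x_1, ..., x_{t-1}, h)

Running "freely" simply means sampling from the unconditioned form, one audio sample at a time, feeding each generated sample back as input for the next.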

Machine-learning techniques have already been used in computer music for decades. Artists and scientists have proposed inspiring, biologically motivated works, and computationally oriented research has achieved great results in treating melody, harmony, and composition through score notations and the like. Nonetheless, a gap remains that cannot yet be bridged between the low-level (raw audio) and high-level (melody, harmony, style, etc.) aspects of musical signals.

All machine-learning efforts in the past, and most of the present ones, work at the score level, missing all the useful information on timbre and expression that cannot be contained in a MIDI stream. To some extent, this approach is too simplistic. In music information retrieval, for instance, successful approaches to song identification use features at several levels of abstraction. The same reasoning could be extended to composition; after all, can we assign the same part to two instruments with different timbre and expressivity and expect the same outcome? A few days before I wrote this letter, the Magenta project, which had previously focused only on melody, announced results from a new branch of research that studies, among other topics, the entanglement between pitch and timbre, using an autoencoding architecture inspired by WaveNet to represent music in a latent space. This seems to me a good pathway to undertake, for both research and creative exploration.
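To give a rough sense of the autoencoding idea, here is a minimal sketch in Python with PyTorch. It is purely illustrative and is not the Magenta architecture: that model is far deeper and decodes autoregressively in the WaveNet style, whereas this toy simply compresses raw audio into a coarser latent sequence with strided convolutions and reconstructs the waveform with transposed convolutions. All names and dimensions below are my own assumptions.

    import torch
    import torch.nn as nn

    class ToyAudioAutoencoder(nn.Module):
        """Toy 1-D convolutional autoencoder over raw audio (illustrative only)."""
        def __init__(self, latent_channels=16):
            super().__init__()
            # Encoder: two strided convolutions downsample the waveform 4x
            # into a sequence of latent vectors.
            self.encoder = nn.Sequential(
                nn.Conv1d(1, 32, kernel_size=9, stride=2, padding=4),
                nn.ReLU(),
                nn.Conv1d(32, latent_channels, kernel_size=9, stride=2, padding=4),
            )
            # Decoder: transposed convolutions upsample the latent sequence
            # back to audio rate.
            self.decoder = nn.Sequential(
                nn.ConvTranspose1d(latent_channels, 32, kernel_size=8, stride=2, padding=3),
                nn.ReLU(),
                nn.ConvTranspose1d(32, 1, kernel_size=8, stride=2, padding=3),
                nn.Tanh(),
            )

        def forward(self, x):
            z = self.encoder(x)        # latent representation of the audio
            return self.decoder(z), z

    model = ToyAudioAutoencoder()
    audio = torch.randn(1, 1, 16000)   # one second of mono audio at 16 kHz
    reconstruction, latent = model(audio)
    print(reconstruction.shape, latent.shape)
    # torch.Size([1, 1, 16000]) torch.Size([1, 16, 4000])

The point of such a latent space is that pitch, timbre, and other attributes become quantities one can analyze or manipulate directly, which is what makes the entanglement question tractable at the raw-audio level.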

Following the first results of the projects mentioned here, we can start thinking of computer-generated music that learns and embeds features at different scales and abstraction levels. Such a system may be capable of understanding and reproducing nuances that only humans are currently able to detect...
