Audio/Visual Synthesis For Linux: The New Art, Part 1

The Linux Journal recently published an article I wrote on Jean-Pierre Lemoine's AVSynthesis, a program designed for artists working with the computer as a medium for the synthesis of image and sound. I'm fascinated by that program, so I decided to research the existence of similar software. This article presents the current findings from that research.

Some History

When we think of art forms that blend image and sound we usually think of movies, television, and videos. Indeed, it can be argued that the technology of film-making reached its modern stage only with the adoption of frame-synchronized sound. However, conventional cinema and broadcasting make rather limited use of sound. For the most part, sound there is confined to musical commentary and sound effects, and while those uses may be imaginative and interesting in themselves, they act primarily as handmaidens to the visual drama.

Figure 1: An image created by AVSynthesis

The software I've researched so far takes some very different approaches to blending sight and sound. Terms such as "abstract" and "non-representational" come to mind, and some of these applications do indeed produce abstract and non-representational images and sound. However, each artist will find his or her own use for these programs, and the software itself imposes no particular limitation on the use of its output.

The computer makes possible art forms that have been difficult if not impossible to realize without the machine. Relationships between image and sound can be defined in any way imaginable, and the sound need not play the role of handmaiden to the image. Wholly arbitrary relationships can be defined, relationships that may be most useful to the composer. These methods may be simple or complex, but their true value is their utility to the artist. He may follow a process untouched through its entire run, or he may jettison a method as soon as it has served its purpose. Of course that purpose is decided upon by the artist, and it is subject to change at any time during the creative process.

So who benefits from these machine-assisted possibilities? Conventional movie-makers, multimedia artists, and vee-jays are obvious targets for audio/visual synthesis, but certainly anyone with a Web cam or a video file can play around with the software I'll present here. However, neophytes should be aware that the combined domains of sound synthesis and image processing include a great deal of jargon and other technical language, and while there's much fun to be had just playing with the programs, better results are more likely as your technical knowledge increases.

This combined art is heavily dependent on the processing power of modern CPUs and GPUs, but it is not without antecedents. Surrealist and Dadaist film makers such as Man Ray and Salvador Dali experimented with non-representational combinations of sights and sounds in the early part of the 20th century. Pioneering work with computers was done by John Whitney (who also studied 12-tone music composition) and Yoichiro Kawaguchi, whose fantastic images and animations continue to inspire computer-based graphic artists. [1]

Some Categories

As I studied this art and its software I realized that I had launched myself onto a sea of novel possibilities. To help distinguish between techniques I've made the following categorizations for the most typical blends of son et lumiere:

Sound supports images - Here we find the conventional use of music as background, intensifier, or pedal to the action on the screen. In this application the sound typically has no effect upon the images and is merely accompaniment. It takes the form of the music and songs used to underscore and emphasize the action, and of the sound effects and Foley effects heard throughout the movie. While the accompanying music and sounds may themselves be aesthetically imaginative, they typically play no part in either the generation or transformation of the images. For a more complete presentation of this category I refer my readers to Aaron Copland's remarks on writing music for films. [2]

Image produced and/or altered by sound - Images are rendered in realtime while synchronized to an audio stream. This category includes the sonic visualisers packaged with most modern media players, along with audio data visualizers such as eXtace, Baudline, and Sonic Visualiser. However, these applications perform no sound synthesis or other audio processing. Sound comes in from an external source, and its analyzed data is represented in a variety of visual displays, some of which can be quite striking. See the Scopes page for a list of data visualization software for Linux.
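
To make the idea of "analyzed data represented in a visual display" concrete, here is a minimal sketch in Python. It is not how eXtace, Baudline, or Sonic Visualiser are implemented; it simply assumes a short block of audio samples, measures how much energy sits at each frequency with a naive discrete Fourier transform, and draws the result as text bars. The function names and the test signal are invented for illustration.

```python
import cmath
import math


def magnitude_spectrum(block):
    """Naive DFT magnitudes for bins 0..N/2.

    Fine for tiny blocks; real visualizers use an FFT for speed.
    """
    n = len(block)
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(block[t] * cmath.exp(-2j * math.pi * k * t / n)
                  for t in range(n))
        mags.append(abs(acc) / n)
    return mags


def bars(mags, width=40):
    """Render each frequency bin as a row of '#' scaled to the loudest bin."""
    peak = max(mags) or 1.0  # avoid dividing by zero on pure silence
    return ["#" * round(width * m / peak) for m in mags]


# Hypothetical input: a 64-sample block holding a sine that completes 8 cycles.
block = [math.sin(2 * math.pi * 8 * t / 64) for t in range(64)]
spectrum = magnitude_spectrum(block)
for k, row in enumerate(bars(spectrum)):
    print(f"{k:2d} {row}")
```

A realtime visualizer would repeat this analysis on each incoming block of audio and redraw the display every frame, most likely with OpenGL rather than text.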

Image to sound conversion - Also known as sonification. An image is subjected to a variety of rules governing the transformation of its shapes and colors into the elements of sound. Typical associations include pixel to grain, gradient to amplitude, height and width to intensity and duration, et cetera. Programs in this category include Kurt Rosenfeld's Sound Mural, NoTAM's famed Ceres spectral editor, and the Csound5 audio synthesis system (which can create an image as well as convert one). Peter Meijer's venerable vOICe Sonification Applet is an excellent Java-based demonstration of a well-defined conversion program. [3]
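
The following Python sketch shows one possible rule set for such a conversion. It is an illustration under stated assumptions, not the algorithm used by Sound Mural, Ceres, Csound5, or the vOICe: columns are scanned left to right as time, a pixel's height selects a sine wave frequency, and its brightness sets that sine's amplitude. The function names, frequency range, and sample rate are all arbitrary choices.

```python
import math
import struct
import wave

SAMPLE_RATE = 8000  # samples per second; a low rate keeps the sketch fast


def sonify(image, column_seconds=0.05, low_hz=200.0, high_hz=2000.0):
    """Turn a grayscale image (rows of 0..1 brightness values) into samples.

    Each column lasts column_seconds. The top row maps to high_hz, the
    bottom row to low_hz, and brightness scales each row's sine wave.
    """
    rows = len(image)
    cols = len(image[0])
    if rows > 1:
        freqs = [high_hz - (high_hz - low_hz) * r / (rows - 1)
                 for r in range(rows)]
    else:
        freqs = [low_hz]
    per_column = int(SAMPLE_RATE * column_seconds)
    samples = []
    for c in range(cols):
        for n in range(per_column):
            t = (c * per_column + n) / SAMPLE_RATE
            value = sum(image[r][c] * math.sin(2 * math.pi * freqs[r] * t)
                        for r in range(rows))
            samples.append(value / rows)  # normalise so |sample| <= 1
    return samples


def write_wav(path, samples):
    """Write the samples as a mono 16-bit PCM WAV file."""
    with wave.open(path, "wb") as out:
        out.setnchannels(1)
        out.setsampwidth(2)
        out.setframerate(SAMPLE_RATE)
        out.writeframes(b"".join(
            struct.pack("<h", int(max(-1.0, min(1.0, s)) * 32767))
            for s in samples))
```

Calling write_wav("diagonal.wav", sonify([[1.0, 0.0], [0.0, 1.0]])) would render a two-pixel diagonal as a high tone followed by a low one. Swapping in a different rule, such as pixel to grain, means changing only the inner loop of sonify.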

Simultaneous synthesis of sound and graphics - In this software the audio stream and associated images are created and processed simultaneously. Exemplary programs in this category include AVSynthesis (Figure 2), Dave Griffiths' Fluxus (Figure 3), and the Pd/GEM powerhouse. Simultaneous synthesis in the audio and visual domains can place heavy demands on this software: Pd includes its own high-quality sound synthesis/processing engine, AVSynthesis leverages the great audio power of Csound5, and both programs depend on OpenGL for their image creation and processing routines. This category is the main focus for the software presented in this article.
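
As a rough illustration of what "simultaneous" means here, the Python sketch below drives both an oscillator and a colour from a single shared control signal, so that sound and image are computed together for every video frame. It is a toy under assumed parameters (sample rate, frame rate, frequency range), not a description of how AVSynthesis, Fluxus, or Pd/GEM work internally.

```python
import colorsys
import math

SAMPLE_RATE = 8000   # audio samples per second (assumed)
FRAME_RATE = 25      # video frames per second (assumed)
SAMPLES_PER_FRAME = SAMPLE_RATE // FRAME_RATE


def control(frame_index, period_frames=50):
    """Shared slow oscillator in the range 0..1, one value per video frame."""
    return 0.5 + 0.5 * math.sin(2 * math.pi * frame_index / period_frames)


def render(frames):
    """Return (video, audio): one RGB colour per frame plus its audio samples."""
    phase = 0.0
    video, audio = [], []
    for i in range(frames):
        c = control(i)
        # The same control value sets the pitch and the hue.
        freq = 220.0 + 660.0 * c
        rgb = colorsys.hsv_to_rgb(c, 1.0, 1.0)
        video.append(tuple(round(255 * ch) for ch in rgb))
        for _ in range(SAMPLES_PER_FRAME):
            audio.append(math.sin(phase))
            # Accumulating phase avoids clicks when the frequency changes.
            phase += 2 * math.pi * freq / SAMPLE_RATE
    return video, audio
```

The mapping from control value to pitch and hue is wholly arbitrary, which is the point: any relationship the artist finds useful can be substituted. The real programs do the same kind of work with far richer synthesis engines and OpenGL rendering, which is where the heavy hardware demands come from.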

Figure 2: AVSynthesis at work

Figure 3: Fluxus

Some of these programs process only still images, while others work with existing video files and/or realtime live action video input. If you plan on using live feeds, make sure your kernel is compiled for Video4Linux support.

Hardware Requirements

Audio/visual synthesis makes heavy demands on hardware resources. The machines I used to test the software include 2.0 GHz and 2.4 GHz CPUs, with 3 GB RAM, large fast SATA disks, and fanless nVidia 7600GS video cards with 512 MB on-board RAM. For either audio or video, these machines are workhorses, but combined sound and image synthesis wants more. It's easy to max out CPU performance with a high-quality reverberation effect applied to a looping soundfile while updating a densely texture-mapped animation with constantly morphing imagery, all calculated in realtime. Regarding hardware, bigger and faster is better and better.


As mentioned above, my focus has been on software intended for creative artists, with an emphasis on realtime or near-realtime processing capabilities. I have purposefully ignored Cinelerra, Blender, and other similar applications that are better understood as video sequence editors and compositors, though they share some of the tools and techniques of the software I'll profile in the next part of this article. Until then, hasta la vista (y la sonida)!


[1] See also Profile of Yoichiro Kawaguchi and Yoichiro Kawaguchi: Works.

[2] Copland, Aaron, Richard Kostelanetz, and Steven Silverstein (2004) Aaron Copland: A Reader: Selected Writings 1923-1972, Routledge.

[3] Thomas Baudel's HighC is another Java-based converter. HighC is modeled after the great UPIC composition system designed by composer Iannis Xenakis. Alas, HighC's audio output is currently broken in Linux and the author is unlikely to fix it in the near future. HighC is a proprietary project, so interested readers may contact Thomas about fixing the Linux version.
