Audio/Visual Synthesis: The New Arts, Part 2

In this second part of my survey I focus on the tools that achieve this new synthesis of arts. Alas, due to space constraints I am unable to include all the software I would like to have reviewed, but perhaps a future article will deal with those programs. Meanwhile, I present to my readers these brief profiles of Pd, Fluxus, and AVSynthesis. Each of these programs takes a different approach to the practical concerns of blending images (moving or still) with sound (realtime or recorded).

What I'm Into

The following three brief reviews focus on software with which I'm already familiar (Pd and AVSynthesis) or am currently learning (Fluxus). The selections represent my personal choices, but they should not be construed as a "Best Of The Best" collection. Each user will find his or her own uses for these programs, and they may or may not be the right choices for your work. Try them yourself, and if they don't work as you prefer, check out the items listed in the section on what I'd like to get into.


In the course of my career as a writer on Linux audio development I've had many occasions to use and praise Pd. Truly, there seems to be no audio or MIDI service that Pd can't handle, and it is often the only solution to some problems. When combined with the GEM library Pd acquires graphics and video capabilities equal to its vast sonic potential, making it a prime candidate for the integrated synthesis and transformation of image and sound.

In my article Pd And GEM: A User's Report I covered the basic procedures involved in running Pd with the GEM library, so I refer readers to that earlier article for the background to this review. In summary, the GEM library provides Pd with access to OpenGL controls for use with still images and video files and/or streams. These controls perform a wide variety of transformations to the input signal (or signals: Pd supports multiple inputs), and the controls themselves can be arranged to affect those transformations apart from or in tandem with Pd's sound production components. This simultaneous capability can create extraordinary effects, thanks especially to the flexibility of Pd's GUI tools.

Figure 1: The Pd/GEM connection

In its most basic form Pd follows the model of a patching synthesizer. Visual units are connected with virtual wiring to create synthesis and processing networks of any desired complexity, and abstraction is supported for the reuse of previously designed elements and their incorporation into new patch arrangements. The visual units may be audio or MIDI data generators and processors, or they may be GEM graphics routines for video and image processing. Simultaneous transformations can be achieved by the simple expedient of assigning a single controller to both the audio and video parts of the overall process. Figure 1 shows off just such a process, albeit a very simple one. In that figure a single slider controls the amplitude of the audio signal and the volume of the graphic image. The fun begins when the designer realizes that the slider's value limits can be freely redefined, along with its direction (e.g. from 0 to 1 or from 1 to 0). Alternatively, the slider's output can be further altered by any number of filters before the data reaches its intended object(s). Obviously Pd is capable of far-reaching possibilities in the domain of combined audio/visual synthesis, possibilities that cannot be fully described here. Fortunately Pd is extensively documented, and it so happens that James Tittle II and IOhannes zmölnig have already published a concise introduction to the topic of Pd And Synaesthesia, so I cheerfully refer my readers to that text for greater understanding of the topic.
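The slider behaviour described above (one control, freely redefined limits, optional reversed direction) can be sketched outside Pd in a few lines of Python. This is only an illustration of the idea and not Pd or GEM code; the function name rescale and the example ranges are my own assumptions.

```python
def rescale(value, out_lo, out_hi, in_lo=0.0, in_hi=1.0):
    """Map value from [in_lo, in_hi] onto [out_lo, out_hi].

    Passing out_lo > out_hi reverses the direction, like a slider
    that runs from 1 to 0 instead of from 0 to 1.
    """
    if in_hi == in_lo:
        raise ValueError("input range must not be empty")
    position = (value - in_lo) / (in_hi - in_lo)
    return out_lo + position * (out_hi - out_lo)


# One hypothetical slider position drives two destinations at once.
slider = 0.25
audio_gain = rescale(slider, 0.0, 1.0)    # same direction as the slider
image_level = rescale(slider, 1.0, 0.0)   # reversed direction
print(audio_gain, image_level)
```

Any further "filter" on the slider output is then just another function applied to the rescaled value before it reaches the audio or image side.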

Recently I purchased a Logitech QuickCam to use with some of the software I've discovered in my research for this article. Pd is high on my list of programs to use with my new camera, and thanks to the advice and instruction from Pd gurus such as Frank Barknecht and Michael Seta I hope to explore the pdp, PiDiP, and GridFlow packages, all of which are designed for use with realtime video streams.

Extra, extra: As I wrapped up this article I received a message from Pd maven Chris McCormick regarding his Ergates, an audio/visual synthesiser with a 3D interface and a USB gamepad as its main controller. The program is based on the Pd/GEM powerhouse, but no prior knowledge of either system is required. Alas, I had no time to test Ergates, but I do believe I see a gamepad in my future.


The Fluxus system creates audio-modulated visual forms and animations by combining a sound stream with graphics control commands issued by the user in realtime. The command language is a variant of Scheme, the graphics are rendered by the OpenGL libraries, and the audio input comes from any other JACK client.

Pre-built Fluxus packages are available for Fedora and Ubuntu; everyone else has to build it from source. Fluxus has many dependencies, but most of its components should be available in your distribution's software repositories. You will probably need to build the PLT Scheme package (version 372, not 4.0), which will only be a problem if you're building it on OpenSUSE 10.2. If you encounter an error regarding XftNameUnparse you need to upgrade the xorg-x11-libs package to the latest release. With that problem resolved the rest of the Fluxus build process should be smooth and uncomplicated.

Fluxus starts in its own window complete with prompt and blinking cursor. This window is the REPL, the read-evaluate-print loop so beloved of Lisp adherents everywhere. Don't worry about its name: all you really need to know about the REPL is that it's the Scheme command line. Lines and blocks of Scheme code can be entered at the REPL prompt, but Fluxus has a better way to manage code entry and evaluation. Nine workspaces are provided, each with its own code editor. Code can be entered and evaluated in each workspace, or you can load existing Fluxus files for evaluation.

In Lisp-speak evaluation is the step where your program code is verified and run by the Lisp interpreter. Once the code has been evaluated you can make further additions and modifications to it while the previous evaluation is running. Hit F5 (or Ctrl-e for you Emacs partisans) and the code is instantly re-evaluated. Fortunately it is not necessary to re-evaluate an entire code block: Simply highlight the part with the required change, hit F5, and your alteration will take effect immediately. This interactive environment favors what is known as livecoding, a performance art involving realtime multimedia programming. Like a few other applications reviewed here Fluxus creates and transforms the displayed graphics parameters by modulating values of the OpenGL shading language (GLSL). However, Fluxus performs its transformations in realtime through Scheme commands entered at the display console or from an IDE. Indeed, Fluxus is a natural for livecoders.

Documentation is copious and well-written. Text documentation is available on-line and in the Fluxus source package. Simply entering (help) at the REPL prompt will open the program's integrated help. Further information can be found in the FAQ, Wiki, mailing list, and tutorial videos.


The May 2008 issue of the Linux Journal contains my first article on AVSynthesis, Jean-Pierre Lemoine's software for the simultaneous synthesis of audio and visual streams. Alas, that article is now almost wholly out of date, thanks to Jean-Pierre's many improvements since it was published, though it still serves as a decent introduction to what AVSynthesis is and what it does. In summary, the program combines two or three still images to create a blended image that is then treated to a variety of dynamic transformations to create a sequence of images to be rendered into a movie. At the same time the program creates an audio stream in realtime that can be alloyed to the image sequence in interesting ways.

AVSynthesis combines the image processing power of the OpenGL shading language and the audio synthesis and processing capabilities of Csound5. The program provides graphic editors for these major lobes (Figures 2 and 3), along with similar screens for its event sequencer and a basic mixer. Parameter values can be set directly (fixed value), modulated dynamically by user-defined ranges, or controlled by user-assignable MIDI controllers. The GLSL parameters also respond to the amplitudes from Csound's audio output. This link is the essential conjunction for the program's cosynthesis of image and sound.
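The two dynamic control paths mentioned above, a MIDI controller and the amplitude of the audio output, both come down to mapping an input onto a parameter range. The Python sketch below illustrates that idea only; it is not AVSynthesis code, and the function names, the use of RMS as the amplitude measure, and the sample values are all my own assumptions.

```python
import math


def rms(block):
    """Root-mean-square amplitude of a block of samples in [-1.0, 1.0]."""
    if not block:
        return 0.0
    return math.sqrt(sum(s * s for s in block) / len(block))


def cc_to_range(cc_value, lo, hi):
    """Map a 7-bit MIDI controller value (0-127) onto [lo, hi]."""
    cc_value = max(0, min(127, cc_value))  # clamp out-of-range input
    return lo + (cc_value / 127.0) * (hi - lo)


def amplitude_to_param(block, lo, hi):
    """Map the amplitude of an audio block onto a visual parameter range."""
    level = min(1.0, rms(block))
    return lo + level * (hi - lo)


# A hypothetical block of audio samples and a hypothetical controller value.
samples = [0.5, -0.5, 0.5, -0.5]
print(amplitude_to_param(samples, 0.0, 10.0))
print(cc_to_range(64, 0.0, 1.0))
```

A louder block pushes the visual parameter toward the top of its range, which is the kind of coupling between sound and image that the paragraph describes.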

Figure 2: The AVSynthesis GLSL controls

Figure 3: An AVSynthesis audio generator

The program is not open-ended. The author has opted to supply only a selection of the possible GLSL and Csound processors, but that selection is not meager. Twenty-one shaders are currently included, each of which adds its unique parameters to a set common to all shader types. On the audio side we find seven signal generators (including a soundfile player) and fourteen processors. Up to three generators and three processors can be active at the same time, though the CPU strain will tell on insufficiently powered machines.

I've placed some AVSynthesis demonstration videos on-line in the Csound group at Vimeo. Unfortunately the resolution is fatally low, though hopefully I'll get better at preparing my videos for the Vimeo site. As some recompense, the audio resolution remains high.

AVSynthesis does not produce a sequence animation directly. The program creates a series of transformed images (in TGA format) that can be concatenated by mencoder or a similar utility to create an animated sequence. The image transformations are carried out according to parameter settings in the visual side of the program, i.e. the GLSL component.
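As a rough sketch of the concatenation step, the Python snippet below assembles a mencoder command line for a directory of TGA frames. The frame pattern, frame rate, codec choice, and output name are my own assumptions rather than anything AVSynthesis prescribes, so adjust them to match your own files.

```python
import shlex


def mencoder_command(frame_glob="*.tga", fps=25, output="sequence.avi"):
    """Build an argument list that asks mencoder to join TGA frames."""
    return [
        "mencoder",
        f"mf://{frame_glob}",           # read a series of still images
        "-mf", f"fps={fps}:type=tga",   # frame rate and image type
        "-ovc", "lavc",                 # encode video with libavcodec
        "-lavcopts", "vcodec=mpeg4",
        "-oac", "copy",
        "-o", output,
    ]


cmd = mencoder_command()
print(shlex.join(cmd))
```

The snippet only prints the command. To run it, pass the list to subprocess.run(cmd, check=True) from the directory that holds the frames, with mencoder installed.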

Recent extensions and improvements include the use of the JOGL software for its OpenGL implementation, expanded parameters for some visual transforms, more sound generators, A/V parameter change by MIDI controllers, and greater flexibility in the program configuration. The user has finer control of the image production details in the data/config.xml file, and it is now possible to decouple the image and sound processing in a layer.

What I Want To Get Into

I'm new to the video world, and I'm learning as I go. As I researched this article I came across some other applications that I'd love to put time into, including the programs presented in this section. Alas, I can't give anything more than the merest hint of their capabilities, but you can always check them out further yourself. If you do try them, be sure to let us know how you fare.

Processing is "... an open source programming language and environment for people who want to program images, animation, and interactions." That description fails to mention the fact that sound is a component of the compleat Processing environment, thanks to Krister Olsson's Ess audio library. Processing also appears to have a special appeal for Csound composers: Media artist Josh Knowles blends Processing-generated graphics with a Csound-generated score in his Algoriculturally, and composer Jacob Joaquin has recently combined his Csound-based Slipmat with Processing.

Side Effects Software's Houdini is a famous 3D animation package used in the production of major movie releases, including the Harry Potter series, all three Spider-Man films, and the Lord Of The Rings trilogy. It is definitely professional software, with professional pricing, but the company releases public betas and offers a US$99 Apprentice package for starters like myself. Sound design is not its province, but Mark Story's CHOP extension to Houdini adds the power to create Csound scores from Houdini's animation data.

Florian Schmidt’s ScGraph brings a 3D graphics server to the SuperCollider audio synthesis/processing environment, opening the way for an integrated audio/visual synthesis system a la Pd. Hopefully I'll find time to work with it enough to write a profile. Meanwhile, you can check out the ScGraph descriptive page for more information.

I must also mention Lush, a "... simple environment to experiment with graphics, video, and sounds". Again, time worked against me, but the language looks engaging enough for me to return to it. All I need is more time.


As I did my research into the topic I discovered a rather large collection of applications that I'd like to get into. Now I'm armed with a webcam and the software to bend, fold, spindle, and mutilate its images while simultaneously synthesizing an original soundtrack. I'm still learning here, but I think it's safe to assume that you can expect another similar article (or two) to profile software for the video jockey and other forms of realtime A/V performance. Until then, stay tuned and keep swinging.
