Free Software and Multimedia
Each presentation was scheduled for 30 minutes, with a five- to ten-minute break between presentations reserved for questions from the audience. It is a great credit to the speakers that they were all well prepared and kept their presentations within the allotted time. I should also note that all the presentations were well received by the audience.
The conference started promptly at 9:30 Saturday morning with Marco Trevisani's introduction to the Demudi project. The project name is an acronym for the Debian Multimedia Distribution, a Linux distribution built upon an existing Debian system and optimized for multimedia performance. The distribution will include a collection of multimedia applications and a Linux kernel optimized for low audio and video latency.
An FTP site will be established for on-line access, and the Debian apt-get tool will be employed for package updates over the Internet. An alpha version of the distribution is planned in time for the International Computer Music Conference (ICMC 2001), to be held this September in Havana, Cuba.
(Note: A private meeting focused on Demudi was held the day before the workshop, attended by only the workshop participants. Material regarding that meeting is presented in an addendum to this article.)
As the President of FSF Europe, the official European sister to Richard Stallman's Free Software Foundation, Georg Greve was eminently suited to present the definition and history of the free software movement. Mr. Greve clarified and discussed the content and purport of various open-source licenses (e.g., GPL, FreeBSD, MIT) in the context of the FSF's Four Freedoms, defined by Richard Stallman as:
The freedom to run the program, for any purpose.
The freedom to study how the program works, and adapt it to your needs.
The freedom to redistribute copies.
The freedom to improve the program, and release your improvements to the public, so that the whole community benefits.
This part of Mr. Greve's presentation was perhaps the most interesting. Licensing issues are a concern to many developers, as was evidenced in the lively question and answer period that followed. Mr. Greve's detailed knowledge of the various open-source licenses certainly clarified a number of questions regarding the legal issues surrounding the protection of free software.
Mr. Greve summarized the current activities of FSF Europe (suffice to say he is a busy man these days) and finished with an overview of the FSF's plans to expand into other areas of the world, particularly India and China.
XDV (Verein für experimentelle Datenverarbeitung) is a group of audio and visual artists working together in Vienna. Their activities include live internet radio streams, web art, and the development and use of the Pd sound synthesis/processing and composition environment.
Pd is a graphic "patching" environment for the creation of audio/visual instruments. The user selects various kinds of objects (DSP modules, synthesis methods, soundfile record/playback controls, video and 3-D graphics, etc.) and connects them together to create a signal and control path known as a patch. Patches can be nested within patches, making it possible to create complex instruments with relatively simple control interfaces.
Günter Geiger demonstrated the Linux port of Pd along with GEM, an OpenGL graphics library. He began with a simple FFT display of a real-time input signal, quickly evolving the display into a complex and fascinating "waterfall" graphics display of the input's frequency content. Günter also showed how Pd coordinates audio and MIDI I/O with the display and manipulation of 3-D graphics in real time, and his final flourish was a tantalizing demonstration of Pd's recently added video capabilities.
Open-source development is certainly not restricted to any particular platform. Gabriel Maldonado has developed a variety of useful opcodes and extensions for the Csound audio synthesis/processing environment. His DirectCsound is a greatly enhanced version of Csound for Windows, and many of his opcodes have been added to the canonical Csound source distribution at Bath, UK. Gabriel has placed his extensions for DirectCsound under the GPL, and he has worked closely with Nicola Bernardini on integrating his opcodes into the unofficial Linux Csound.
Mr. Maldonado demonstrated his recent Csound opcodes that utilize the FLTK graphics library to provide an intrinsic set of widgets for Csound. These widgets include knobs and sliders for the construction of synthesizer interfaces, effectively giving the user the means to create a "softsynth" from Csound's powerful audio processing toolkit and its own set of graphic control elements.
The VMCI (Virtual MIDI Control Interface) was also presented. This software provides a set of virtual MIDI controllers (sliders, knobs, etc.) for use in adjusting Csound opcode parameters in real time. The VMCI software is free and licensed under the GPL, but it is written in Visual Basic, thus restricting its portability.
Paul Davis summarized the progress he has made over the last two years in the domain of Linux support for professional audio standards. After presenting a list of his personal motivations (which included the desire for a professional-quality multichannel, multitrack hard-disk recorder, the goal of his Ardour project) he described some of the problems with the use and design of proprietary audio software: hardware dependencies that quickly become outdated, closed internal implementations that restrict study and extensibility, and reliance on inferior operating systems with poor multitasking and interprocess communication.
Mr. Davis then presented the state of the Linux audio infrastructure in 1999: no support for 24-bit sampling, no multichannel I/O, no support for professional audio hardware interfaces, and no implementations of MIDI Time Code or MIDI Machine Control. He also enumerated the shortcomings of the existing Linux audio applications: no advanced soundfile editor, no software capable of working with multichannel soundfiles, too many unfinished applications, and so forth.
Next Mr. Davis set out in some detail the software implementation challenges facing Linux audio development (reducing kernel latency, implementing real-time programming models, disk streaming and the use of plugin and component software architectures). He described the current status of proprietary audio software and its trends, proceeding then to define the challenges facing free audio software (LADSPA vs. VST plugins, proliferation of GUI toolkits, adoption of ALSA into the Linux kernel, etc.).
Davis' final remarks were particularly interesting. Drawing from the full title of his presentation ("2 years reinventing the wheel: Linux as a platform for pro audio applications") he asked why Linux developers are solving again problems that audio application developers for Windows and the Mac have already solved. This response from his notes provided an eloquent answer and closure:
"The promise: dedicated audio hardware performance from a general-purpose operating system with highly evolved networking support, databases, multitasking, distributed processing APIs, accessibility support and much more."
This presentation was my own contribution to the proceedings. I described how I entered the Linux world in 1995, drawn by the desire to run Doug Scott's MiXViews, a powerful soundfile editor for a variety of UNIX platforms. MiXViews for Linux was in need of some attention, so I worked with Doug and eventually produced a working version. That first step emboldened me to attempt other ports of audio software (including a variety of signal processing applications from NoTAM, a Norwegian research center for music, acoustics and technology). Shortly afterwards I started to maintain a list of available sound and music applications for Linux. I pointed out that in 1995 the list numbered about 30 entries, while it now lists more than 800 audio applications, ranging from simple soundfile players to professional-level hard-disk recording systems like Ardour. I also described the beginning of the Linux Audio Development group and the evolution of more coordinated programming efforts. After a look at the current condition of Linux audio applications and system development, I ended with a positive prediction for the future of Linux sound and music software.
Giampiero Salvi described various open-source tools used in the speech analysis and synthesis research departments at KTH in Stockholm (Kungliga Tekniska Högskolan: Sweden's Royal Institute of Technology). These tools are themselves built on the Snack audio toolkit written by Kåre Sjölander and others at KTH. Mr. Salvi demonstrated the WaveSurfer soundfile editor and some of its plugins. An especially interesting plugin displayed a talking head, complete with facial expressions, whose vocal characteristics were varied by a graphic control interface.
Mr. Salvi also demonstrated two Java applications in development at KTH. Alexander Seward's ACE is an environment for building automatic speech recognition (ASR) systems. ACE includes signal processing techniques commonly used in ASR, such as linear prediction and cepstral analysis, but the package also includes tools for defining aspects of a language's grammar. Håkan Melin designed ATLAS as a platform for building multilingual and multimodal speech applications. A telephone dialog system is a typical use for a multilingual speech analysis/synthesis application: the system must recognize spoken input in a variety of languages and then respond in the same language. Multimodal systems augment the linguistic aspects with other modes of reference. A guide system could not usefully answer my spoken query "Is there a restaurant here?" on its own; however, if it presents a map and I ask the same question while clicking on a specific street, then the system is described as taking a multimodal approach.
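As an aside, the cepstral analysis mentioned above is easy to illustrate: the real cepstrum is simply the inverse FFT of the log magnitude spectrum, and its low-order coefficients capture the spectral envelope that ASR front ends rely on. Here is a minimal NumPy sketch of the idea (my own illustration, not code from ACE):

```python
import numpy as np

def real_cepstrum(signal):
    """Real cepstrum: inverse FFT of the log magnitude spectrum.
    Low-order coefficients describe the vocal tract envelope,
    which is why cepstra are common front-end features in ASR."""
    spectrum = np.fft.fft(signal)
    log_mag = np.log(np.abs(spectrum) + 1e-12)  # offset avoids log(0)
    return np.fft.ifft(log_mag).real

# A toy "voiced" frame: a decaying 100 Hz sinusoid at 8 kHz
fs = 8000
t = np.arange(512) / fs
frame = np.sin(2 * np.pi * 100 * t) * np.exp(-t * 20)
cep = real_cepstrum(frame)
print(cep[:4])  # the first few cepstral coefficients
```

A real ASR front end would window the signal and keep only a dozen or so coefficients per frame, but the core transform is just these three lines.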
IRCAM, the prestigious French institute for research in music and acoustics, has been supporting one of the most significant developments in free software for musicians. jMax is a Java implementation of the popular MAX synthesis environment (MAX is a graphic patching language similar to Pd).
jMax is an open-source project developed primarily by a team at IRCAM led by François Déchelle and Norbert Schnell; however, jMax is truly a model of distributed development, with much input from its users and the active jMax mail list. It has been in development since 1998, and the first Linux beta version was released in early 1999, licensed under the GPL.
François Déchelle described the efforts taken to convince the powers at IRCAM that jMax would benefit from an open-source development model. Now it appears that the success of jMax has inspired IRCAM to release another package to the Open Source community: OpenMusic (formerly PatchWork) has already been licensed under the GPL and will soon be available in a version for Linux (it is currently available only for the Macintosh). OpenMusic is a flexible environment for music and sound composition, utilizing a highly visual workspace with drag-and-drop graphics. It is also a MidiShare client, which means it is a simple matter to connect jMax and OpenMusic, creating a powerful integrated environment for sound synthesis and music composition. Mr. Déchelle noted that while jMax is ready now for use on SGI and Linux computers, OpenMusic for Linux should be available soon.
Stanko Juzbasic is an independent composer who was drawn into programming by the need for a specific tool. He has focused his programming efforts primarily on three projects: RingMod (a performance-quality software ring modulator for machines running SGI's IRIX operating system), SculptTool (a utility for modifying analysis data files produced by the IRCAM software packages AudioSculpt and Super Phase Vocoder) and Ceres3 (a spectral domain editor/processor). His presentation centered on the motivations and experiences he had while working on the development of Ceres3.
Mr. Juzbasic first presented a brief history of the Ceres software, pointing out that its development has already been a long-term collaborative effort. The original package was designed and written by Dr. Øyvind Hammer at NoTAM for SGI machines. I made the first Linux port in 1996-1997, with assistance from Richard Kent. Johnathan Lee extended Ceres to Ceres2, adding new functions, cleaning up some old code and fixing bugs. Ceres2 was also ported to Linux, and composer Reine Jonsson added WAV file support in a version he called Ceres2w.
Both Johnathan Lee and Mr. Juzbasic studied with Brad Garton at Columbia, where Ceres is in constant use. After working with Ceres2 Mr. Juzbasic decided to add even more functions and revise the core analysis engine. He also prepared new versions for IRIX machines, Linux x86 platforms and LinuxPPC. Seeing the extent of his additions and enhancements, he renamed his version of Ceres to Ceres3.
Mr. Juzbasic demonstrated only the basic operations of Ceres3, but the program's potential was clearly evident.
Csound is certainly the most widely known and utilized software sound synthesis environment. However, the language shows its age (its user interface is a throwback to the heady days of programming in assembler and FORTRAN), and many developers have worked on projects to modernize Csound. Maurizio Umberto Puxeddu has been using the Python scripting language to create some interesting tools for Csound, including a graphic front-end launcher (CSFE) and a powerful Csound score generator called Pmask. Mr. Puxeddu treated his audience to an in-depth description and demonstration of Pmask as an example of how Python can be used to extend the Csound language.
Pmask is itself based on André Bartetzki's Cmask, a set of utilities for the design and exploration of algorithmic music composition. Both Cmask and Pmask employ tendency masks to weight and constrain the random occurrence of events into more deterministic forms, but Pmask also utilizes Python's object model to store, organize and process musical objects. The composer is thus freed from having to write Csound scores event by single event (a rather tedious chore) and gains more direct control over large-scale formal factors. Mr. Puxeddu explained how Python is especially well-suited to such a program (ease of use, object orientation, support for arrays, etc.), and his demonstrations of Csound music created by Pmask were quite interesting.
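To make the tendency-mask idea concrete, here is a minimal sketch in Python. It is not Pmask's actual API; the function names and the simple linear envelopes are my own illustration. Two time-varying boundary functions confine random pitch values, and each event becomes one i-statement in a Csound score:

```python
import random

def tendency_mask(t, lo_env, hi_env):
    """Return a random value between two time-varying boundaries.
    lo_env and hi_env each map a normalized time in [0, 1] to a bound."""
    lo, hi = lo_env(t), hi_env(t)
    return lo + random.random() * (hi - lo)

def generate_score(n_events, duration):
    """Emit Csound i-statements whose pitches are confined to a mask
    that narrows from a wide band toward a small one over time."""
    lines = []
    for i in range(n_events):
        onset = i * duration / n_events
        t = onset / duration                        # normalized time 0..1
        pitch = tendency_mask(t,
                              lambda x: 200 + 300 * x,  # rising lower bound
                              lambda x: 800 - 200 * x)  # falling upper bound
        lines.append("i1 %.3f 0.5 %.1f" % (onset, pitch))
    return lines

for line in generate_score(8, 10.0):
    print(line)
```

Early events scatter anywhere between 200 and 800 Hz; later events are squeezed into an ever narrower band, so the texture drifts from random toward deterministic, which is exactly the compositional control a tendency mask provides.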
Kudos must go to Professor Bernardini for his able direction and genial leadership. All necessary amenities (computers, sound system, projectors, etc.) were available to the participants, and everything worked. Attendance was good, fluctuating throughout the day; I estimate that 75 to 100 people attended the workshop. All presentations were well prepared and well delivered; the signal-to-noise ratio was high, the question and answer sessions were sometimes quite lively, and it seemed to me that everyone had a pleasant and informative time. Special thanks also go to Mariapia Redditi for her help in organizing an internationally scattered collection of speakers; great thanks are also due to Alessandro Morgantini and his crew for their preparation and operation of the hardware used by the participants.
Of course, no report from Firenze would be complete without mentioning the beauty of the city and its delicious food and wines. The panel members enjoyed a memorable Tuscan feast at the amazing Buzzino's, where the staff patiently tolerated our unusual company until well past midnight. A note to the workshop organizers: I'm ready to return at any time.
Dave Phillips maintains the Linux Music & Sound Applications web site and has been a performing musician for more than 30 years. The Book of Linux Music & Sound (No Starch Press, 2000) is his latest publication.