Sculptor: A Real Time Phase Vocoder

Sculptor is a set of audio tools for Linux that manipulates spectra in real time and provides continuous audio output.
Real Time, Not Real Fast

There is a big difference between a sound-synthesis program which runs in real time and one which simply produces output samples at a greater rate than the sound device chooses to swallow them. For a program to be a real-time synthesiser, it must respond apparently instantaneously to a change in an input parameter. For example, the CSound application mentioned previously is not real time, because it reads the specification of a score and orchestra at initialization, then produces audio output. It isn't possible to influence the sound the program produces as it produces it. (Actually, some real-time extensions have become available, but I am choosing to ignore them for the sake of this example.) Running CSound on a powerful workstation usually causes it to produce samples faster than actual speed, but this does not qualify it as real time.

To design a real-time program, one of the most important design considerations is the user interface, which in turn is strongly influenced by the desired effects. The next stage in the design process is considering the kind of manipulations required for such an application.

The Phase Vocoder as a Musical Instrument

When a synthesis program becomes real-time, it becomes a musical instrument, and when a computer program becomes a musical instrument, the operator becomes a performer. The ergonomics of a musical instrument are highly complex, but from the context of previous uses of this algorithm in computer music, clearly some core areas must be covered: pitch transposition, a change in a sample's pitch with no change in its duration; rate of playback, a stretching or compression of a sample in time with no change in its pitch; and timbral morphing, where one sound changes smoothly into another as pictures do in video morphing.

Sculptor permits independent control over pitch and rate of playback in real time even on very modest computing platforms, and acts as a test bed for more advanced algorithms on faster platforms. It was initially developed on a 386DX40 without a floating-point coprocessor and could make a fair attempt at real-time synthesis at 8000 samples per second (voice telephone quality).

Choosing a Paradigm

Having decided there are essentially two parts of the application, a real-time synthesiser and a GUI, it seems to make sense to divide the processing between the two. One process will be responsible for the audio synthesis, the other for mouse- and window-related processing.

Linux, like most UNIX systems, provides two different methods for inter-process communication (IPC). The first is channel-based: sockets, pipes and so on. This kind of IPC has many advantages; one can easily map the processes onto different machines connected by a network, and synchronization is easily arranged, as a channel can be set up to block in an efficient, non-polling manner until data arrives.

The prism application has two processes which basically operate asynchronously. Essentially, the resynthesiser has to keep running and producing audio samples regardless of what the user is doing with the mouse. For this reason, the second method of IPC, shared-memory or System-V IPC, has been used. System-V IPC also provides methods for process synchronisation: the semaphore. One can raise or wait on a semaphore. Think of it as a special kind of variable which behaves in the following manner. If one or more processes are waiting on a semaphore, raising it enables exactly one of those processes to proceed. If no processes are waiting, then the value of the variable is incremented. Waiting on a non-zero semaphore decrements its value but allows the process to continue immediately. Waiting on a zero-value semaphore adds the current process to a (possibly empty) list of waiting processes, pending the semaphore being raised by another agent.

Semaphores are used in shared-memory situations to implement mutual exclusion locks and prevent update anomalies where several processes simultaneously attempt to modify a shared data structure. However, prism uses only two processes accessing the shared-memory block: the GUI is a producer because it is supplying control parameters, and the synthesiser is a consumer because it uses them to generate audio samples. Since there is only one producer and one consumer, there is no need to use semaphores as access arbiters. In fact, advantage is taken of the shared-memory IPC to allow the producer to provide a set of “magic” parameters which change according to the user's gestures.

Allocating and Using Shared Memory Blocks

Upon startup, prism has to allocate and set up a shared-memory block, then fork off the process to generate the audio output. The routines it uses are documented in the shmop manual pages. Enough memory is allocated to hold a control structure and all of the spectral data produced by the analysis program (see Listing 1).

prism calls shmget to allocate the required amount of memory; it returns a handle to the memory block for subsequent use. The other parameters specify the access permissions in the normal chmod format, and the block will be created if it does not exist yet. The process then forks with the child being responsible for synthesis, and the parent for control functions.

After the fork call, both the parent and the child processes must attach the shared-memory block and cause it to appear in their respective memory spaces. The appropriate system call is shmat. The parameters indicate the handle of the shared memory block and the desired target address. Passing NULL as the latter tells the system to make the choice of address. In Linux running on an i386 architecture, the blocks are allocated downwards in memory starting at an address of 1.5GB. This call can be made once before the fork system call, as the shared block will then appear in both the child's and parent's memory space.

One trap waits to catch the unwary programmer using shared memory blocks: they are persistent. If your application crashes without properly tidying up shared-memory blocks, memory will leak like a sieve. The user can check for any undeleted memory blocks using the ipcs command and remove them with ipcrm. prism does its best to cope with any unexpected events by catching the SEGV signal and shutting down any shared-memory activity before exiting. However, the best safeguard against memory leaks is to mark the shared-memory block for deletion as soon as it is created. Counterintuitively, the way to do this is to mark the block as transient using the shmctl call, and then detach the process from the shared-memory block. The shared-memory block will persist until all the processes using it detach using the shmdt call, so the block will disappear automatically when the parent and child processes exit.