Dirt-Cheap 3-D Spatial Audio
The hardware needed to set up a low-cost 3-D spatial audio system includes a commodity sound card with certain features, speakers and audio cables. Here, we describe our choices for hardware components and the steps needed to set up the hardware. Throughout the discussion, refer to Figure 2 for an illustration of the hardware interconnections, speaker placement and wiring.
The first thing to consider is the number of speakers needed to produce 3-D spatial audio for a specific application. A minimally encompassing setup produces sound from all directions around the user—left-right, front-back and up-down. The speakers can be placed in any configuration, but setting up the 3-D audio panning functions is not as simple for irregular configurations. We decided to use eight speakers in a cubic configuration, with each speaker at a vertex of the cube, as shown in Figure 2. There is nothing special about the speakers needed for this task—the choice is a matter of budget and taste. We used eight amplified commercial-grade speakers for the simple reason that we already had them in our lab.
The eight speakers require a sound card that can produce eight channels of audio. Of the low-cost commodity audio cards, the only applicable candidates are the 7.1 cards. We chose the Creative Labs Audigy 2 card, which we found available at the time of this writing for as low as $90 US. Although it is possible to produce eight independent channels of audio using more expensive sound cards, the Audigy 2 card is the only commodity card we are aware of that has drivers in place to support what we are doing.
It is important to understand the output from the card. In a typical Dolby Surround Sound 7.1 speaker arrangement, there are two front speakers, left and right; two side speakers; two rear speakers; a center speaker, which sits in front and above the video screen; and a subwoofer channel, the .1 speaker. The Audigy 2 ZS has three analog output jacks, an 1/8-inch mini-phone, labeled 1, 2 and 3, that provide line-level outputs for the eight speakers. Jack 1 is three-pole, meaning it carries three signals—two signals drive the front left and right speakers and the third is ground. Jacks 2 and 3 are four-pole, each carrying four signals. Jack 2 drives the rear speaker pair and a side speaker, while jack 3 drives the subwoofer, the center speaker and the remaining side speaker. One final consideration is these signals are unamplified line-level, so the speakers need to be the amplified type that accept line-level inputs. Alternatively, a separate amplifier or set of amplifiers should be used between the sound card and the speakers.
The next step is to install the speakers. Many speakers designed for surround use include mounts, but speaker mounts are available commercially for a number of other speaker types. In our application, we already had the cubic infrastructure in place and used custom mounts to attach the speakers to the cube. The simpler the speaker configuration, the simpler the software configuration can be—that process is explained later in this article.
Finally, the speakers must be connected to the audio card. How one connects speakers to the Audigy 2 depends on the type of speaker and amplification used. A trip to a favorite electronics store should yield any necessary connectors. In our case, the speakers each have a two-pole 1/4-inch phone jack, so we needed to split the three combined outputs of the sound card into eight separate signals. For jack 1, we used a readily available 1/8-inch-stereo-to-dual-RCA adapter. For jacks 2 and 3, we found similar adapters with four poles and three RCA connectors. These adapters are used most commonly with camcorders, where the three signals are used for composite video and stereo audio. These adaptors gave us eight separate RCA connectors, and after obtaining eight long RCA-to-1/4-inch-mono cables, we were set.
In our final configuration, we used an Alesis Studio 32 mixer. This device fits in-line between the audio card's outputs and speakers' inputs and allows fine-tuning of the volume levels. Although the mixer made it a little easier to test and tune the audio, it wasn't truly necessary, as the same adjustments can be made in software.
The software solution for low-cost 3-D spatial audio is best described by the layered hierarchy shown in Figure 3. The software layers required to interface with the sound cards include low-level audio drivers and a 3-D spatial audio API. We focused our primary development efforts on Linux because of easy access to the source code for low-level audio drivers and the overall support community that exists for developers working on projects such as ours.
For the driver layer, we chose ALSA, which was mentioned previously. ALSA provides audio and musical instrument digital interface (MIDI) functionality to the Linux operating system. It supports many types of audio hardware, ranging from consumer sound cards to professional multichannel audio interfaces.
We selected ALSA because it appeared to require the least effort to generate the eight channels we needed for 3-D spatial audio. Until we modified the ALSA driver to access all eight channels, it supported only six channels (5.1) on the Audigy 2. These changes have been incorporated into ALSA, but they may or may not be in a release version at publication time. In that case, one can get the latest source and build it—be sure to include the emu10k1 sound card argument when using the ./configure script so that the ALSA driver recognizes the Audigy card.
After the driver is set up, the 3-D spatial sound API can be installed. It distributes sound effects from a given 3-D position to the appropriate audio channels. Although there are quite a few APIs to choose from, we chose Mustajuuri, as mentioned previously. The Mustajuuri software works with ALSA and provides 3-D panning over an arbitrary array of speakers using the VBAP algorithm, also described previously. The Mustajuuri API provides all of the features needed for a basic 3-D positional sound system and is fairly easy to extend. Over the course of this project, we made several minor source code modifications, and they are included in the October 2004 release.
Mustajuuri does its magic by way of a module called the Mixer, which mixes multiple sound sources—sound files, microphone inputs or other sources—into individual audio streams. These streams then are piped into a panning module, which is responsible for routing each input signal to the appropriate speakers, setting the correct gain and time delay at each speaker and mixing multiple streams meant for the same speaker into a single stream to be sent to that speaker. It does the routing and gain calculations based on VBAP, and some additional gain and delay calculations are based on distance. The result is each incoming sound source to the panning module leaves from a set of three speakers, and the resulting sound appears to come from a specific 3-D position in space. Doppler shifting also is simulated.
Once Mustajuuri is compiled and installed, several tasks must be performed to configure the software to work with the given 3-D speaker array.