Dirt-Cheap 3-D Spatial Audio

With the addition of free audio software, an ordinary inexpensive surround sound card becomes the basis for a 3-D cube for simulation, visualization or gaming.

Many computer systems set up for advanced gaming include Dolby Surround Sound. The typical speaker configurations are 4.1—four speakers and one subwoofer— 5.1 and 7.1. This system is designed for all speakers to be located on a plane centered at the listener, and thus it is not possible to have a sound truly be emitted from above or below the listener, although some systems attempt to simulate that effect. Imagine a game scenario where a monster is climbing down a wall above and behind the player while, at the same time, a mouse is scrambling across the floor behind the listener. In a planar surround system, the sound effects for both the monster and the mouse would come from the rear speakers, making it hard to distinguish the actual locations of the sound sources.

With true 3-D spatial audio, the monster's sound effects could be played from speakers located to the back upper-left and the mouse's sound from speakers located to the back lower-left and back lower-right. In this setup, the player has a much better feel for what is creating the sound and where the sound is coming from. Now the player can arm the rocket launcher and turn toward the back upper-left directly and blast the monster—no need to aim toward the harmless mouse.

Spatial sound has been available for several years and primarily is employed in immersive virtual environments. The systems are not in mass-scale production and often must be installed by professionals, making them costly and out of the reach of most home users. We have devised a low-cost true 3-D spatial audio solution that requires only inexpensive consumer-level hardware and open-source software. This solution allows for the arbitrary placement of speakers, not necessarily co-planar as in other systems. Our 3-D spatial audio solution is the first that we are aware of to provide true 3-D sound at such a low cost.

Background on 3-D Spatial Audio

Preliminary technology for 3-D spatial audio, Fantasound, first was developed in the late 1930s by Disney for the movie industry. Over the years, a great deal of work has been done to advance the field, especially by Dolby Laboratories. In the last few decades, researchers enabled personal computers to emit spatial audio. Today, spatial audio is commonplace in modern computer games. Home systems typically use headphones or a planar array of speakers, usually in a preset configuration, such as Dolby Surround Sound 5.1.

Headphones present a unique opportunity to provide inexpensive 3-D audio. Algorithms that use head-related transfer functions (HRTFs) can create convincing 3-D spatial audio on headphones using a simple stereo sound card. HRTFs use data about how sound is transformed by the user's body, especially the shape of the ears, for mapping sounds with 3-D positional sources. The technique relies heavily on applying different time delays for each ear. Ultimately, we decided not to use headphones, because we needed a system that scaled easily to many users. It was far more practical and cost efficient to use speakers.

A number of high-cost professional-grade hardware packages are available, such as the RME Hammerfall series, M-Audio Delta series and Lake Audio, that provide true 3-D spatial audio. Each package has a cost exceeding $1,000 US, boasts high sound quality and has a large array of features aimed at the professional market. Although the acoustic quality of these packages undoubtedly is higher than that of our low-cost 3-D audio solution in terms of audio clarity and fidelity, both options provide true 3-D spatial audio.

When we started putting together a spatial audio system, no inexpensive hardware and software combination existed to produce true 3-D spatial audio. Although there are software APIs that allow arbitrary, not necessarily co-planar positioning of sound sources, such as Microsoft DirectSound and the Advanced Linux Sound Architecture (ALSA), the low-level drivers officially support only the co-planar 4.1, 5.1 and 7.1 speaker positions mentioned earlier. There is no way to tell the drivers that the speakers have been moved to an alternate configuration, for example, with speakers above or below the listener. So even though software developers could position a sound above or below the user's head, the low-level drivers still assumed the sound was emitted in a circle around the user's head. The bulk of true 3-D spatial audio support comes from customized APIs.

Tommi Ilmonen at the Helsinki University of Technology (HUT) developed a 3-D spatial audio API called Mustajuuri that is built on the ALSA drivers. The Mustajuuri API implements Vector Base Amplitude Panning (VBAP), introduced by Ville Pulkki (see the on-line Resources), as the underlying 3-D spatial audio model. In short, VBAP is the algorithm responsible for moving a sound across a 3-D array of speakers and making the sound appear to come from a specific direction. VBAP selects the three speakers closest to the virtual sound position and calculates the required volumes for each speaker. See Figure 1 for an example of how VBAP panning works. Mustajuuri also simulates depth for audio by using time delays and distance attenuation. This makes it possible to position a sound anywhere in space relative to the listener. Mustajuuri already has been used to produce 3-D spatial audio using high-end audio cards, but up until now it has not supported low-end audio cards.

Figure 1. View of 3-D spatial audio test case in the immersive room. Visual depictions show from which speakers the sound is coming for the current view. For each sound, the three speakers closest to the virtual sound source are used to play the sound. Their volumes are varied based on the distance from the speaker and a number of other factors.