Dirt-Cheap 3-D Spatial Audio

With the addition of free audio software, an ordinary inexpensive surround sound card becomes the basis for a 3-D cube for simulation, visualization or gaming.
Hardware Testing and Calibration

Figure 5. Immersive room depiction showing placement of speakers in a cube array and audio coming from the user's front upper-right direction; the lines are colored red.

We tested the hardware design of our 3-D spatial audio system by integrating the hardware with our four-wall immersive virtual reality room at the Virtual Reality Laboratory, part of the Naval Research Laboratory in Washington, DC. We arranged the speakers in a cube array and placed them at the corners of the immersive room, as shown in Figure 5. We designated a 1.2GHz Red Hat Linux machine as an audio server and installed an Audigy 2 ZS card. We connected the speakers using the cabling described before and tried the system both with and without the mixing board mentioned earlier.

While Mustajuuri was running with three audio sources in motion, CPU utilization on this machine generally was less than 20%, and the memory usage was negligible. Further savings could be realized, of course, by using optimized compiler settings rather than the debug settings we used.

We tested the outputs from each speaker to determine the range of intensities that could be played on each channel. We listened to each speaker individually for sound quality, sound balance and percussive resonance. The easiest aspect to listen for is the sound balance between treble and bass. If one of the two obviously is higher than the other, adjust the related frequency filters as needed. For example, if there is too much bass, decrease the bass and/or increase the treble. If there seems to be excessive low-end or high-end noise, adding a low-pass or high-pass filter may be necessary.

Another easy aspect to listen for is speaker distortion. Simply put, if the speakers are so loud that the sound produced is bad, lower the volume of the speaker. If a given set of speakers cannot produce quality sound at an acceptable volume, it may be necessary to acquire more powerful speakers.

One of the hardest aspects to listen for is the resonance of percussive sounds generated by a speaker. This quality basically is how much the sound echoes from where the speaker is located. Adjustments have to be made if it sounds like a speaker is reverberating with percussive sounds. Depending on the quality of the speaker and the quality of the mixer board, this problem may be corrected to some degree by continuing to filter the signal. For excessively bad cases, hard objects such as exposed metal, concrete, hard plastic and even glass should be covered with a sound dampening material such as cloth or foam.

Once each speaker is calibrated, the entire setup has to be balanced. This can be done either by using devices designed to measure acoustic levels or by listening to the speaker from the predetermined center of the 3-D speaker array. Either way, the gain of each speaker should be adjusted until the same audio intensity level is received from each channel. Keep in mind that the outputs for each channel of the audio card were customized by the manufacturer for the intensity requirements for each type of speaker—satellite, center and subwoofer—normally attached in a surround configuration, and the intensity output for each type differs. There are many published methods for dealing with this problem, but we went with the low-tech solution of having someone stand and listen at the center of our speaker array. We set the software control to maximum gain and adjusted the mixer board based on feedback from the listener. Remember that these changes can be made in software with the ALSA drivers and Mustajuuri if a mixer is not available.

Software Testing

We tested the software by integrating sound into an existing in-house simulation platform, BARS-Utopia, that operates on a Linux visualization cluster from ORAD Incorporated, which drives our immersive room. BARS-Utopia supports several virtual world databases, interaction methods and spatial audio. However, no support was available for interacting with the Mustajuuri API in particular, so we implemented a plugin to bridge the BARS-Utopia spatial audio support with Mustajuuri. BARS-Utopia already contains all of the information needed by Mustajuuri, such as sound source positions, listener position and orientation and sound source creating/deletion notifications—the plugin simply translates that data into a form that Mustajuuri understands.

When the plugin was completed, we tested and debugged the new system. The primary software adjustments we made were to the attenuation level of the audio channel outputs. Mustajuuri uses a simple attenuation model and requires some manual tweaking for the expected environment, things such as outdoor, indoor, time of year and so on. In the real world, sound attenuation rates are quite complicated and are influenced by factors such as temperature, humidity and the frequency makeup of the sound.

We tested the sound system by implementing several scenarios, each with a different scene dataset and different audio effects attached to an animated object. Before the audio objects were animated, we evaluated several volume levels and several distances away for each object. Figure 1 shows a simple scenario we designed and tested—the sound effects of a car. When we finished testing the volume and distance effects, we generated an animated path for the car to follow.

Figure 6 shows a more complex scenario with three audio sources, tank, jet and helicopter—the jet is off the screen and not shown in view. We performed some simple tests to see how many sound sources interacted together. It was of primary importance that the jet, typically far away, not sound too quiet, while the tank and helicopter, typically closer to the camera, not dominate the aureal bandwidth. As a result, some minor tweaking was done on both the far and near objects' attenuation parameters.

Figure 6. Shows a user interacting with a scene with multiple sound sources—a tank, helicopter and a jet that is not visible. The nearest speakers are determined for each sound source and the output is mixed at each speaker, if overlapping.