Spectral Analysis with Linux Systems and the Merlin DSP Project

by Gordon Miller

In the past few years, there has been a tremendous interest in low-cost, high-performance spectral analysis systems on the part of audio engineers, speaker designers, musicians and instrument manufacturers. Windows systems have long been regarded as unreliable for high-end applications, and consequently such applications historically have been the domain of the professional workstation.

Linux, however, has proven to be stable and well appointed with program and documentation development packages. In particular, the OSS sound driver for the Linux kernel (2.4.6 on my system) works well with SB16 type sound cards.

I should acknowledge the inspiration for this project was in part due to the sterling work of Philip vanBaren. Other DSP packages are around, but I hope that xspect will fill a certain niche.

Sound Cards

Having a sound card with record and playback capabilities is essential. The card need not be full duplex unless the application involves (nearly) real-time digital mixing and recording.

8-bit resolution is not really appropriate for spectral analysis, but 16-bit (or higher) is much better. It should be noted that the DC error (the stream is not all zeroes when there is no analog input) should be estimated and applied at runtime.

On the subject of DC error magnitude and drift, there is an enormous difference in quality between compatibles, including the ALS007, the ALS100 and name-brand sound cards. For the work you will be doing, there is no point on skimping on audio quality.

If you do nothing about the DC error, even with Blackman windowing there is an enormous leaky spike at 0Hz. This is quite enough to render any chi-by-eye laughable.

The card you choose should be capable of both mono and stereo operation. Be aware that the Intel i810 chip set cannot record in mono. So how do you get a mono signal from a stereo one ? Certainly not by taking the arithmetic mean of the left and right components. The power is what's invariant, so the correct procedure is to take the RMS average. There is a separate DC error for each channel; better have the sound card sort that one out.


X servers for Linux have come a long way, and few high-end graphics cards are not amply catered for. Although some people claim that X is slow, this is not true if the X library calls are carefully chosen and scheduled. Clearly, the various X toolkits are too cumbersome and a clearer, leaner approach is called for.

A good way to save time (we always want to do that) in real-time graphical systems is to make fewer calls. If several draws can be calculated for the same GC, then it is much better to fill an array of segments, rectangles and so on, and draw them all with one call.

Where possible, it is better on the eye to avoid erasing whole sections of an application window and redrawing. Instead, it is better to amend the window piecemeal, but this raises problems for rainbow displays when they are zoomed and shifted.

Text indicators that are XFillRectangled, sprintfd and XDrawStringd can be made flicker-free by using XDrawImageString. Remember to set the background colour in the GC.

Using the XOR GC function you can pull some good moves. To erase a part of something. simply redraw that part in the same GC. You can pile up a lot of draws from different XOR GCs and selectively undraw them easily. This saves having to use the background colour to undraw bits of the display.

We don't have time for pesky expose events and clipped redrawing, so it is necessary to use Backing Store for the application and window menus.


I'll add here that the application should never be blocked when awaiting user input. If that happens, the recording overrun messages will soon pile up. To avoid this problem, the main processing loop could like something like this:

    init ();
                while (1)
        check_dialog ();
        acquire_data ();
        transform_data ();
        draw_data ();

The main business of the day--getting back around the main loop before the next buffer is ready--should include a check for any pending dialog requests. These dialogs are in a separate thread and are almost always blocked unless the user intervenes. This means that the dialogs consume few to no clock cycles unless they are activated.

Needless to say, care has to be exercised when mixing threads and global variables. There is no place here for the exorbitant overheads of C++ or Ada, and global shared blocks have to be protected from concurrent access.


We are talking here, of course, about Linux signals, SIGTERM. Many people advise that users avoid mixing X, threads and Linux signals, but this advice is far too restrictive. The best way to cope is to select one thread to service them all. The other threads should block these signals, but remember, you cannot block SIGCONT.


A high quality, extremely fast Fast Fourier Transform (FFT) now is available for Linux. The Fastest Fourier Transform in the West (FFTW) is certainly an improvement on previous efforts and should be considered as a premier FFT library.

One of the best things about FFTW real->complex transforms is the input is not clobbered and actually has quite a nice format. With the exception of the components at 0Hz and 1/2fs (which have null imaginary components), it is simple to compute powers with FFTW by having two pointers, one ascending and one descending.

It goes without saying that pointer arithmetic should be used instead of array subscripting wherever possible. If you are traversing an array, set a pointer to the beginning and increment it with each pass.


GCC is the only choice for Linux. Although other C compilers are available, GCC is the one that the kernel is compiled with by default. There is often (i.e., nearly always) some gain from using optimization. Use

-fomit-frame-pointer -O2 -ffast-math -arch=WHATEVER

and so on for the final build. Of course, development versions should be compiled for debugging and malloc tracing. When linking with ElectricFence, njamd or mpatrol will save users a lot of time and heartache. A good frontend to the GDB debugger is DDD. It also is a good idea to have Vim or Emacs with integrated tag and ID search capabilities. I use Vim's id-utils and ctags.

xspect a High Performance Power Spectrum Analyser

The data for xspect is read from the sound card (/dev/dsp) after it is opened and set with ioctls. The program blocks until all the data is ready, and the data goes into one of two buffers. If overlap mode is selected, data very often will be read from each buffer, and care has to taken to keep all the pointers and block scheduling in sync.

Figure 1. xspect in rainbow graph mode showing sound card ambient analog noise. Vertical scale is in dB.

Whether or not overlap is used, the data found now has the estimated DC error subtracted from it. The DC offset is saved, with all the other user variables, in an INI file for reuse. It also can be re-estimated while the program is running, with the D key.

Figure 2. xspect in single color graph mode showing synchronized asynchronous menu dialogue. It is from the same data as previous shot.

The adjusted data is now multiplied by a Blackman data tapering window and stored in fftin. The coefficients are precalculated and subsume data scaling, FFT length scaling and window power loss, so that the power calculation is simpler and quicker.

The input data is calculated like this:

massaged_data = (raw_sample - dc_error) * windowing_coeff

and the windowing function is:

                coeff[i] = 0.42 - 0.5 * (cos(2 * pi * i / (len - 1))
                   + 0.08 * (cos(4 * pi * i / (len - 1))

Note that the second equation says (len - 1) and not (len). This means both the first and last coefficients are zero and that the table is symmetric about the two middle points. Most textbooks make a serious error in relation to this point.

Now the coefficients are adjusted to account for the data scaling:

coeff = coeff / 32768

The data is signed 16-bit short (-32768 to 32767), and this makes it double (-1 to 1).

The coefficients now are adjusted to account for FFTW normalization:

coeff = coeff / len

and to account for windowing power loss:

coeff = coeff * sqrt(total_power_loss)

After these calculations are made, the windowed data in fftin is passed to rfftw_one, and the transformed data is saved in fftout. Only the first 512 of the available 513 bins are used, which keeps the drawing easier. The power values are put in powerval. The power is calculated with

raw_power = (real*real) + (imaginary*imaginary)

If averaging is selected, the results are adjusted with:

averaged_power = raw_power*alpha + last_power*beta

beta is the averaging_factor divided by 100. So if the averaging_factor is 50%, then beta is 0.5. alpha, then, is 1 - beta.

You also need an array of last_powers for your work, which need to be initialized (zeroed) at the start of the program and whenever the user changes parameters.

Peak detection is carried out during this stage. The peak power bin, hence the frequency, and power value are stored then, provided that peak_mode is on.

Next, the relevant values from powerval are chosen for the graph. If there is no X zoom in operation, there are 512 lines to draw. Otherwise, there are nbins rectangles to draw, starting at firstbin.

If the power value is non-zero, we calculate a Y value for it; otherwise set Y to WINDOW_BOTTOM. The calculation involves taking logs, scaling and applying an offset:

Y = WINDOW_TOP - (log10(power) * yscale) - yoffset

yscale is the number of pixels per Bel:

yscale = Y_pixels_in_graph / Y_Bels_in_graph

yoffset is the number of pixels from 0dB to WINDOW_TOP:

yoffset = yscale * graph_top_Bel_value

Y is then clamped (saturation arithmetic) to lie between WINDOW_TOP and WINDOW_BOTTOM.

Now, if Y is not equal to lasty, we draw the deficit or undraw the excess using XOR GC. If the graph is monocoloured, we use XDrawSegments or XDrawRectangles, saving oodles of X lib calls. No lines, rectangles or text are ever redrawn unless necessary.

The basis of the menu system is the concept of the synchronized asynchronous dialog. With this, another thread handles the menus and keypresses, and when a dialog is complete, it flags a request for action in the main thread.

Clearly, the frequency of the main loop (the frame rate) determines the response time of the dialog. It is good to be in the 30+ frames per second zone. This range limits the delay between the music and the graphics.

The menus are efficient with clock cycles, but they use a fair bit of memory because they need backing store. This would be a complicated proposition in Motif, for example. In xspect, the are calculated to look and feel like FVWM/Motif, and they can be fairly easily modified.

The dialog thread handles all external Linux signals except SIGCONT, which is handled by all threads.

On receipt of the usual user termination signals

   SIGTERM (e.g., kill command)
   SIGINT  (e.g., Ctrl-c in the shell)
   SIGQUIT (e.g., Ctrl-\ in the shell)
   SIGHUP  (e.g., log off)

the dialog thread posts a do_quit request and then exits.

On receipt of the suspend signal

SIGTSTP (e.g., Ctrl-z in the shell)

the dialog thread posts a do_toggle_freeze request and goes back to sleep, as long as the graph is not already frozen. The important point is the application goes to sleep and consumes almost no CPU time.

On receipt of the continue signal

SIGCONT (e.g., the fg command)

the dialog thread posts a do_toggle_freeze request and goes back to sleep (as long as the graph is frozen). The application then resumes operation.

But how is SIGCONT handled correctly since it cannot be blocked by any of the threads? The answer is all the threads handle it, but the defrost is only carried out once due to careful coding.

Gordon Miller is an IT trainer from Edinburgh, Scotland. He has been developing Linux applications for six years. The Merlin DSP project is his work.

Load Disqus comments