Introduction to Sound Programming with ALSA

October 1st, 2004 by Jeff Tranter

Make maximum use of all the functionality in the new 2.6 kernel sound architecture using a simple API.

ALSA stands for the Advanced Linux Sound Architecture. It consists of a set of kernel drivers, an application programming interface (API) library and utility programs for supporting sound under Linux. In this article, I present a brief overview of the ALSA Project and its software components. The focus is on programming the PCM interfaces of ALSA, including programming examples with which you can experiment.

You may want to explore ALSA simply because it is new, but it is not the only sound API available. ALSA is a good choice if you are performing low-level audio functions for maximum control and performance or want to make use of special features not supported by other sound APIs. If you already have written an audio application, you may want to add native support for the ALSA sound drivers. If your primary interest isn't audio and you simply want to play sound files, using one of the higher-level sound toolkits, such as SDL, OpenAL or those provided in desktop environments, may be a better choice. By using ALSA you are restricted to using systems running a Linux kernel with ALSA support.

History of ALSA

The ALSA Project was started because the sound drivers in the Linux kernel (OSS/Free drivers) were not being maintained actively and were lagging behind the capabilities of new sound technology. Jaroslav Kysela, who previously had written a sound card driver, started the project. Over time, more developers joined, support for many sound cards was added and the structure of the API was refined.

During development of the 2.5 series of Linux kernel, ALSA was merged into the official kernel source. With the release of the 2.6 kernel, ALSA will be part of the stable Linux kernel and should be in wide use.

Digital Audio Basics

Sound, consisting of waves of varying air pressure, is converted to its electrical form by a transducer, such as a microphone. An analog-to-digital converter (ADC) converts the analog voltages into discrete values, called samples, at regular intervals in time. The number of samples taken per second is known as the sampling rate. By sending the samples to a digital-to-analog converter (DAC) and an output transducer, such as a loudspeaker, the original sound can be reproduced.

The size of the samples, expressed in bits, is one factor that determines how accurately the sound is represented in digital form. The other major factor affecting sound quality is the sampling rate. The Nyquist Theorem states that a signal can be represented accurately only if its highest frequency is below one-half the sampling rate.
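As a quick numeric check of these two factors, the following C sketch computes the Nyquist limit and the uncompressed data rate for a given format. The function names are my own and are not part of ALSA.

```c
/* Upper bound (in Hz) on the frequencies a given sampling rate can represent. */
unsigned nyquist_limit_hz(unsigned rate_hz)
{
    return rate_hz / 2;
}

/* Uncompressed PCM data rate in bytes per second.
 * Assumes the sample size is a whole number of bytes. */
unsigned long pcm_bytes_per_second(unsigned rate_hz, unsigned channels,
                                   unsigned bits_per_sample)
{
    return (unsigned long)rate_hz * channels * (bits_per_sample / 8);
}
```

For CD-quality audio (44,100 Hz, 16-bit, stereo), nyquist_limit_hz(44100) gives 22050 and pcm_bytes_per_second(44100, 2, 16) gives 176400.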

ALSA Basics

ALSA consists of a series of kernel device drivers for many different sound cards, and it also provides an API library, libasound. Application developers are encouraged to program using the library API and not the kernel interface. The library provides a higher-level and more developer-friendly programming interface along with a logical naming of devices so that developers do not need to be aware of low-level details such as device files.

In contrast, OSS/Free drivers are programmed at the kernel system call level and require the developer to specify device filenames and perform many functions using ioctl calls. For backward compatibility, ALSA provides kernel modules that emulate the OSS/Free sound drivers, so most existing sound applications continue to run unchanged. An emulation wrapper library, libaoss, is available to emulate the OSS/Free API without kernel modules.

ALSA has a capability called plugins that allows extension to new devices, including virtual devices implemented entirely in software. ALSA provides a number of command-line utilities, including a mixer, sound file player and tools for controlling special features of specific sound cards.

ALSA Architecture

The ALSA API can be broken down into the major interfaces it supports:

  • Control interface: a general-purpose facility for managing registers of sound cards and querying the available devices.

  • PCM interface: the interface for managing digital audio capture and playback. The rest of this article focuses on this interface, as it is the one most commonly used for digital audio applications.

  • Raw MIDI interface: supports MIDI (Musical Instrument Digital Interface), a standard for electronic musical instruments. This API provides access to a MIDI bus on a sound card. The raw interface works directly with the MIDI events, and the programmer is responsible for managing the protocol and timing.

  • Timer interface: provides access to timing hardware on sound cards used for synchronizing sound events.

  • Sequencer interface: a higher-level interface for MIDI programming and sound synthesis than the raw MIDI interface. It handles much of the MIDI protocol and timing.

  • Mixer interface: controls the devices on sound cards that route signals and control volume levels. It is built on top of the control interface.

Device Naming

The library API works with logical device names rather than device files. The device names can be real hardware devices or plugins. Hardware devices use the format hw:i,j, where i is the card number and j is the device on that card. The first sound device is hw:0,0. The alias default refers to the first sound device and is used in all of the examples in this article. Plugins use other unique names; plughw:, for example, is a plugin that provides access to the hardware device and adds features in software, such as sampling rate conversion, for hardware that does not support them directly. The dmix and dshare plugins allow you to downmix several streams and split a single stream dynamically among different applications.
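The hw:i,j naming format can be produced from a card and device number with a small helper. This is only a sketch; format_hw_name is a hypothetical function of my own, not an ALSA call, and the resulting string is what you would pass as the device name when opening a PCM stream.

```c
#include <stdio.h>

/* Build the name "hw:<card>,<device>" in buf.
 * Returns 0 on success, -1 if buf is too small to hold the name. */
int format_hw_name(char *buf, size_t len, int card, int device)
{
    int n = snprintf(buf, len, "hw:%d,%d", card, device);

    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

Calling format_hw_name(name, sizeof name, 0, 0) with a 16-byte buffer fills name with hw:0,0, the first sound device.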

Sound Buffers and Data Transfer

A sound card has a hardware buffer that stores recorded samples. When the buffer is sufficiently full, the card generates an interrupt. The kernel sound driver then uses direct memory access (DMA) to transfer the samples to an application buffer in memory. Similarly, for playback, an application buffer is transferred from memory to the sound card's hardware buffer using DMA.

These hardware buffers are ring buffers, meaning the data wraps back to the start when the end of the buffer is reached. Pointers are maintained to keep track of the current positions in both the hardware buffer and the application buffer. Outside of the kernel, only the application buffer is of interest, so from here on we discuss only the application buffer.

The size of the buffer can be programmed by ALSA library calls. The buffer can be quite large, and transferring it in one operation could result in unacceptable delays, called latency. To solve this, ALSA splits the buffer up into a series of periods (called fragments in OSS/Free) and transfers the data in units of a period.

A period stores frames, each of which contains the samples captured at one point in time. For a stereo device, the frame would contain samples for two channels. Figure 1 illustrates the breakdown of a buffer into periods, frames and samples with some hypothetical values. Here, left and right channel information is stored alternately within a frame; this is called interleaved mode. A non-interleaved mode, where all the sample data for one channel is stored followed by the data for the next channel, also is supported.
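The relationship between samples, frames, periods and the buffer comes down to a few multiplications. The sketch below uses a hypothetical struct and values of my own choosing (they are not the values from Figure 1) to show the arithmetic.

```c
/* Hypothetical description of a PCM stream layout; names are illustrative. */
struct pcm_layout {
    unsigned rate_hz;          /* sampling rate in frames per second */
    unsigned channels;         /* samples per frame */
    unsigned bytes_per_sample; /* e.g. 2 for 16-bit samples */
    unsigned period_frames;    /* frames per period */
    unsigned periods;          /* periods per buffer */
};

/* One frame holds one sample for every channel. */
unsigned frame_bytes(const struct pcm_layout *l)
{
    return l->channels * l->bytes_per_sample;
}

unsigned period_bytes(const struct pcm_layout *l)
{
    return l->period_frames * frame_bytes(l);
}

unsigned buffer_bytes(const struct pcm_layout *l)
{
    return l->periods * period_bytes(l);
}

/* Duration of one period in microseconds, rounded down. */
unsigned long period_time_us(const struct pcm_layout *l)
{
    return (unsigned long)((unsigned long long)l->period_frames
                           * 1000000ULL / l->rate_hz);
}
```

With 16-bit stereo at 44,100 Hz, 1,024 frames per period and 4 periods, a frame is 4 bytes, a period is 4,096 bytes, the buffer is 16,384 bytes and each period lasts 23,219 microseconds. The period duration is what bounds the transfer latency.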

Over and Under Run

When a sound device is active, data is transferred continuously between the hardware and application buffers. In the case of data capture (recording), if the application does not read the data in the buffer rapidly enough, the circular buffer is overwritten with new data. The resulting data loss is known as overrun. During playback, if the application does not pass data into the buffer quickly enough, it becomes starved for data, resulting in an error called underrun. The ALSA documentation sometimes refers to both of these conditions using the term XRUN. Properly designed applications can minimize XRUN and recover if it occurs.
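To make the overrun condition concrete, here is a toy model of a capture ring buffer that counts frames only. It is a simplified illustration of my own, not how ALSA or the kernel driver is implemented; all names are hypothetical.

```c
/* Toy model of a capture ring buffer, counted in frames. */
struct capture_ring {
    unsigned capacity;  /* total size of the buffer in frames */
    unsigned fill;      /* frames captured but not yet read */
    unsigned overruns;  /* number of overrun events seen so far */
};

/* Hardware side: deposit new frames. Returns the number of frames that
 * could not be stored without overwriting unread data (0 if all fit). */
unsigned ring_produce(struct capture_ring *r, unsigned frames)
{
    unsigned space = r->capacity - r->fill;
    unsigned lost = frames > space ? frames - space : 0;

    if (lost > 0) {
        r->overruns++;          /* the application fell behind */
        r->fill = r->capacity;
    } else {
        r->fill += frames;
    }
    return lost;
}

/* Application side: read up to `frames` frames. Returns the count read. */
unsigned ring_consume(struct capture_ring *r, unsigned frames)
{
    unsigned n = frames < r->fill ? frames : r->fill;

    r->fill -= n;
    return n;
}
```

With an 8-frame buffer, producing 6 frames and then 4 more without reading in between loses 2 frames and records one overrun. Underrun during playback is the mirror image: the consumer is the hardware and the fill level reaches zero.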

A Typical Sound Application

Programs that use the PCM interface generally follow this pseudo-code:

open interface for capture or playback
set hardware parameters
(access mode, data format, channels, rate, etc.)
while there is data to be processed:
   read PCM data (capture)
   or write PCM data (playback)
close interface

We look at some working code in the following sections. I recommend you compile and run these on your Linux system, look at the output and try some of the suggested modifications. The full listings for the example programs that accompany this article are available for download from ftp.ssc.com/pub/lj/listings/issue126/6735.tgz.

Listing 1 displays some of the PCM data types and parameters used by ALSA. The first requirement is to include the header file that brings in the definitions for all of the ALSA library functions. One of the definitions is the version of ALSA, which is displayed.

The remainder of the program iterates through a number of PCM data types, starting with the stream types. ALSA provides symbolic names for the last enumerated value and a utility function that returns a descriptive string for a value. As you can see in the output, ALSA supports many different data formats, 38 for the version of ALSA on my system.

The program must be linked with the ALSA library, libasound, to run. Typically, you would add the option -lasound on the linker command line. Some ALSA library functions use the dlopen function and floating-point operations, so you also may need to add -ldl and -lm.

Listing 2 opens the default PCM device, sets some parameters and then displays the values of most of the hardware parameters. It does not perform any sound playback or recording. The call to snd_pcm_open opens the default PCM device for playback by passing the stream direction SND_PCM_STREAM_PLAYBACK. The final argument, the open mode, is 0, which selects the default blocking behavior. This function returns a handle in the first function argument that is used in subsequent calls to manipulate the PCM stream. Like most ALSA library calls, the function returns an integer return status, a negative value indicating an error condition. In this case, we check the return code; if it indicates failure, we display the error message using the snd_strerror function and exit. In the interest of clarity, I have omitted most of the error checking from the example programs. In a production application, one should check the return code of every API call and provide appropriate error handling.

In order to set the hardware parameters for the stream, we need to allocate a variable of type snd_pcm_hw_params_t. We do this with the macro snd_pcm_hw_params_alloca. Next, we initialize the variable using the function snd_pcm_hw_params_any, passing the previously opened PCM stream.

We now set the desired hardware parameters using API calls that take the PCM stream handle, the hardware parameters structure and the parameter value. We set the stream to interleaved mode, 16-bit sample size, 2 channels and a 44,100 Hz sampling rate. In the case of the sampling rate, sound hardware is not always able to support every sampling rate exactly. We use the function snd_pcm_hw_params_set_rate_near to request the nearest supported sampling rate to the requested value. This function also takes a pointer to a direction variable, which should be initialized (for example, to 0) before the call. The hardware parameters are not actually made active until we call the function snd_pcm_hw_params.

The rest of the program obtains and displays a number of the PCM stream parameters, including the period and buffer sizes. The results displayed vary somewhat depending on the sound hardware.

After running the program on your system, experiment and make some changes. Change the device name from default to hw:0,0 or plughw: and see whether the results change. Change the hardware parameter values and observe how the displayed results change.

Listing 3 extends the previous example by writing sound samples to the sound card to produce playback. In this case we read bytes from standard input, enough for one period, and write them to the sound card until five seconds of data has been transferred.

The beginning of the program is the same as in the previous example: the PCM device is opened and the hardware parameters are set. We use the period size chosen by ALSA and make this the size of our buffer for storing samples. We then query the period time so we can calculate how many periods the program should process in order to run for five seconds.
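The loop count is simply the total running time divided by the duration of one period, both in microseconds. A minimal sketch of that calculation follows; periods_for_duration is a hypothetical helper of my own, and in the real program the period time would be the value reported by ALSA.

```c
/* Number of whole periods that fit into the requested duration.
 * Both arguments are in microseconds. */
long periods_for_duration(unsigned long duration_us, unsigned period_time_us)
{
    if (period_time_us == 0)
        return 0;   /* avoid dividing by zero on a bogus period time */
    return (long)(duration_us / period_time_us);
}
```

If the period time were 23,219 microseconds (an assumed figure), five seconds would take periods_for_duration(5000000, 23219), or 215 periods. Because the division rounds down, the program runs for slightly less than five seconds.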

In the loop that manages data, we read from standard input and fill our buffer with one period of samples. We check for and handle errors resulting from reaching the end of file or reading a different number of bytes from what was expected.

To send data to the PCM device, we use the snd_pcm_writei call. It operates much like the kernel write system call, except that the size is specified in frames. We check the return code for a number of error conditions. A return code of -EPIPE indicates that underrun occurred, which causes the PCM stream to go into the XRUN state and stop processing data. The standard method to recover from this state is to use the snd_pcm_prepare function call to put the stream in the PREPARED state so it can start again the next time we write data to the stream. If we receive a different error result, we display the error code and continue. Finally, if the number of frames written is not what was expected, we display an error message.
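The checks on the write result can be summarized as a small decision function. This sketch deliberately avoids calling ALSA so that it stands alone; classify_write and the enum are hypothetical names of my own, and rc stands for the value returned by snd_pcm_writei.

```c
#include <errno.h>

enum write_outcome {
    WRITE_OK,        /* every requested frame was written */
    WRITE_UNDERRUN,  /* stream is in the XRUN state */
    WRITE_ERROR,     /* some other negative error code */
    WRITE_SHORT      /* fewer frames written than requested */
};

/* Interpret the result of a playback write of frames_requested frames. */
enum write_outcome classify_write(long rc, long frames_requested)
{
    if (rc == -EPIPE)
        return WRITE_UNDERRUN;  /* recover by calling snd_pcm_prepare */
    if (rc < 0)
        return WRITE_ERROR;     /* report the code with snd_strerror */
    if (rc != frames_requested)
        return WRITE_SHORT;
    return WRITE_OK;
}
```

The same structure applies to capture with snd_pcm_readi, where -EPIPE signals overrun instead of underrun.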

The program loops until five seconds' worth of frames has been transferred or end of file read occurs on the input. We then call snd_pcm_drain to allow any pending sound samples to be transferred, then close the stream. We free the dynamically allocated buffer and exit.

The program is not useful unless its input is redirected from something other than a console. Try running it with the device /dev/urandom, which produces random data, like this:

./listing3 < /dev/urandom

The random data should produce white noise for five seconds.

Next, try redirecting the input to /dev/null or /dev/zero and compare the results. Change some parameters, such as the sampling rate and data format, and see how they affect the results.

Listing 4 is much like Listing 3, except that we perform PCM capture (recording). When we open the PCM stream, we specify the stream direction as SND_PCM_STREAM_CAPTURE. In the main processing loop, we read the samples from the sound hardware using snd_pcm_readi and write them to standard output using write. We check for overrun and handle it in the same manner as we did underrun in Listing 3.

Running Listing 4 records approximately five seconds of data and sends it to standard output; you should redirect it to a file. The output is raw sample data with no file header. If you have a microphone connected to your sound card, use a mixer program to set the recording source and level. Alternatively, you can run a CD player program and set the recording source to CD. Try running Listing 4 and redirecting the output to a file. You then can run Listing 3 to play back the data:


./listing4 > sound.raw
./listing3 < sound.raw

If your sound card supports full duplex sound, you should be able to pipe the programs together and hear the recorded sound coming out of the sound card by typing: ./listing4 | ./listing3. By changing the PCM parameters you can experiment with the effect of sampling rates and formats.

Advanced Features

In the previous examples, the PCM streams were operating in blocking mode, that is, the calls would not return until the data had been transferred. In an interactive event-driven application, this situation could lock up the application for unacceptably long periods of time. ALSA allows opening a stream in nonblocking mode where the read and write functions return immediately. If a transfer cannot proceed immediately, for example because no space is available in the playback buffer, ALSA returns an error code of -EAGAIN.

Many graphical applications use callbacks to handle events. ALSA supports opening a PCM stream in asynchronous mode. This allows registering a callback function to be called when a period of sample data has been transferred.

The snd_pcm_readi and snd_pcm_writei calls used here are similar to the Linux read and write system calls. The letter i indicates that the frames are interleaved; corresponding functions exist for non-interleaved mode. Many devices under Linux also support the mmap system call, which maps them into memory where they can be manipulated with pointers. Finally, ALSA supports opening a PCM channel in mmap mode, which allows efficient zero copy access to sound data.
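The difference between the two layouts is easiest to see by converting one to the other. The sketch below splits interleaved 16-bit frames into one array per channel; deinterleave_s16 is a hypothetical helper of my own. As I understand it, the non-interleaved ALSA calls (snd_pcm_readn and snd_pcm_writen) take an array of per-channel buffer pointers of this general shape.

```c
#include <stddef.h>
#include <stdint.h>

/* Split interleaved frames (L R L R ...) into one array per channel.
 * planes[c] must have room for `frames` samples for each channel c. */
void deinterleave_s16(const int16_t *interleaved, size_t frames,
                      unsigned channels, int16_t *const *planes)
{
    for (size_t f = 0; f < frames; f++)
        for (unsigned c = 0; c < channels; c++)
            planes[c][f] = interleaved[f * channels + c];
}
```

For a stereo stream, planes[0] receives the left-channel samples and planes[1] the right-channel samples, each in time order.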

Conclusion

I hope this article has motivated you to try some ALSA programming. As the 2.6 kernel becomes commonly used by Linux distributions, ALSA should become more widely used, and its advanced features should help Linux audio applications move forward.

My thanks to Jaroslav Kysela and Takashi Iwai for reviewing a draft of this article and providing me with useful comments.

Resources for this article: /article/7705.

Jeff Tranter has been using, writing about and contributing to Linux since 1992. He works for Xandros Corporation in Ottawa, Canada.

Appreciation of the article

On April 16th, 2009 Anonymous (not verified) says:

This is the way articles should be written, it is an introductory note, tutorial, call it whatever, simple terse and clear.

I am sure that there will be some people who will like ALSA because of this article, like me.

5 sec

On April 9th, 2009 MK (not verified) says:

/* 5 seconds in microseconds divided by
* period time */
loops = 5000000 / val;

How did he calculate this?

how to read & play a sound file

On February 5th, 2009 Shubham (not verified) says:

hi,

I am new to ALSA programing. Could anyone tell how i can read & play a sound file using above example code for playback?

Converting to ALSA

On January 31st, 2009 Anonymous (not verified) says:

Firstly, this article is really good.

I need to write an application for FXS interface. the drivers are implements using ioctl system calls. How difficult is it to convert to ALSA API.
Or the otherway, what is to be done if i need to access these drivers in an application which is already supporting ALSA.

ok I just found it. for code

On December 30th, 2008 Elie (not verified) says:

ok I just found it. for code called 'alsa.c' it's

gcc alsa.c -lasound

Linking Libraries

On December 29th, 2008 Elie (not verified) says:

Umm... what libraries do i need to link? I tried the code (listings 3 & 4) and it compiled but didn't link. What gcc parameters need to be added?

(currently using Ubuntu Feisty 7.04)

undocumented bs wtf!

On October 16th, 2008 mark manning (not verified) says:

your second piece of example code has the following function call:

rc = snd_pcm_open(&handle, "default", SND_PCM_STREAM_PLAYBACK, 0);

looking at the API reference for this call we see that the last parameter in this call is int mode. Mode may be one of the following values.

SND_PCM_NONBLOCK or SND_PCM_ASYNC. The first of these is equal to 0x01 and the second is equal to 0x02. WHAT is this undocumented mode zero that your example code AND the alsa example code uses.

undocumented == BAD MOJO!

great article tho :)

i am working on a project

On October 9th, 2008 Anonymous (not verified) says:

i am working on a project that needs a simple voice recording to be saved to a file, before further processing can be done on it... i am new to alsa (sound programming in general).
i believe alsa and oss can accomplish what i want but i need further guidance.
thanks.

ALSA Duplex Working code

On September 26th, 2008 Santoshkumar (not verified) says:

Does any body tested ALSA duplex (Record and playback)? Can you please point the place where i can get some reference source code?

Thank you,
Santosh
santosh.pattar@gmail.com

Updated code for FC6?

On September 14th, 2007 William Estrada (not verified) says:

I read this article and tried to run the sample code. It did compile and run but the record program produces garbage/noise. Is there an update for a Fedora Core 6 kernel? What kind of sound file does this code produce, wav, au, etc?

I don't know if anyone is

On July 25th, 2007 Anonymous (not verified) says:

I don't know if anyone is reading this.

But there are a few problems.

It records as far as I can see, but I get a lot of random garbage data which I don't want when recording. Anything I record is messed up in random data.

I´ve tried Sound

On December 20th, 2006 David (not verified) says:

I´ve tried Sound Programming with ALSA in a seminar, but i´m afraid, it is just too difficult for me. But i have to point out that i´m progging since one year, so i think i need just a bit more time. patience is all :-)

in listing 2, dir should be initialized

On December 13th, 2006 Fabrice Pardo (not verified) says:

Very useful article.

In listing 2, you should replace
int dir;
by
int dir = 0;
or replace &dir by NULL later

In my first test case, dir was randomly initialized to a negative
value, giving as a result
rate = 44099 bps

Actually, the snd_pcm_hw_params_set_rate_near function is poorly documented.

Display format

On September 22nd, 2005 MikeW (not verified) says:

It would be nice if the (fixed) page width were a little greater, without having to shrink the browser text size.

Hard to read code that's full of line wraps.

Problem in opening default device in listing2 of this article

On March 11th, 2005 Nagaraja S (not verified) says:

Hi guys am new to this ALSA, I downloaded all driver, library utils from www.alsa-project.com and i installed.

when i run the listing1 of this doc it went fine and when i tried to run the second listing it says like this.
Unable to open the default device No such file.And i am using slackware with linux 2.4.22
Please help me out guys.

Thanx in advance for help

Re: Introduction to Sound Programming with ALSA

On September 29th, 2004 JKCunningham (not verified) says:

Great article. I tried the first source and it worked fine, but the second bombed:

> ./listing_2
unable to set hw parameters: Invalid argument

Any idea why this would be? I didn't modify the source.

-Jeff

Re: Introduction to Sound Programming with ALSA

On November 27th, 2004 Pablo (not verified) says:

It happened the same to me. When I change the sampling frequency value to 48 kHz then I got the error message:
unable to set hw parameters: Invalid argument
For any other sampling frequency it works ok.

Pablo

Re: Introduction to Sound Programming with ALSA

On October 1st, 2004 Anonymous says:

Someone has posted a note at the Resources page, saying that the variable "dir" needs to be initialized to 0 (zero).
That seemed to solve my problems, at least.

Re: Introduction to Sound Programming with ALSA

On October 1st, 2004 Anonymous says:

I am having the same problems with listing 4. (And it seems that others have too, see the Resources page.) In my case the problem seems to be the sampling frequency, if I change 44100 til 88200 the program will run.

What I really would like to know is whether this is a problem with the example program from the article (I did not modify it), a problem with ALSA (I use Debian Sarge with 2.6-kernel) or a problem with my sound card (SoundBlaster Live).

Re: Introduction to Sound Programming with ALSA

On September 24th, 2004 Anonymous says:

A simple question: Why can't the ALSA developers make sure that ALSA provides sound support at least as good as OSS? After having tested it on a number of boards, it was always a pain in the neck to configure properly, and getting it to work, an adventure. I will stay with OSS until ALSA becomes much more user-friendly.

Re: Introduction to Sound Programming with ALSA

On September 22nd, 2004 Anonymous says:

Excellent information. Now let's have a simple real-time mixer with changeable effects to really show what ALSA + 2.6 Linux can do.

Re: Introduction to Sound Programming with ALSA

On September 15th, 2004 Anonymous says:

Yes, I agree, excellent article and a brilliant introduction to ALSA.

Keep up the good work!

Re: Introduction to Sound Programming with ALSA

On September 9th, 2004 Anonymous says:

Informative article. More intros to popular or obscure libraries please!

Where is figure 1? As stated

On December 6th, 2006 Anonymous (not verified) says:

Where is figure 1? As stated by the document figure shows the breakdown of periods, frames etc.
