Linux.com

Feature

An introduction to Linux sound systems and APIs

By Christian Vincenot on August 09, 2004 (8:00:00 AM)

When coding a program, one of the best ways to show users that an event has happened is to produce a sound. That's why sound is now present in so many programs. Every operating system has its own sound systems and APIs for accessing the sound card, so that no low-level coding is required to use the sound device. Programmers have many different choices concerning which system to use, especially under Linux, and maybe that's the problem. This article surveys the free sound architectures available under Linux, as well as the different interfaces a programmer can use.

Kernel sound drivers: OSS and ALSA

The most direct way is to talk to the kernel sound drivers. Linux has two:

Open Sound System

Open Sound System (OSS) comes in two versions: OSS/Free, which is free software maintained by the well-known kernel hacker Alan Cox, and 4Front Technologies' OSS (OSS/Linux, formerly known as VoxWare, USS, and TASD), which is a proprietary implementation based on OSS/Free. OSS is available not only for Linux but also for BSD OSes and other Unixes. That may be its only advantage, because this system is not very powerful and was officially replaced by ALSA in the 2.5 development kernels.

I'm not going to talk about programming for OSS, considering that it is deprecated, but it is not very difficult. To sum up: open /dev/dsp, /dev/dspW, or /dev/audio depending on the format you want, read from and write to the resulting file descriptor to exchange data with the sound card, and use ioctl() calls to set parameters such as sample format, number of channels, and sampling rate (volume is set the same way, through /dev/mixer). You can learn about advanced OSS programming in 4Front's API specs.
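To make that summary concrete, here is a minimal sketch of OSS playback setup, assuming the Linux <sys/soundcard.h> header. The helper name oss_open_playback() is hypothetical; the ioctl requests (SNDCTL_DSP_SETFMT, SNDCTL_DSP_CHANNELS, SNDCTL_DSP_SPEED) are the standard OSS ones, issued in the order the OSS documentation recommends: format, then channels, then rate.

```c
/* Minimal OSS playback setup sketch (assumes the Linux <sys/soundcard.h> header).
 * oss_open_playback() is a hypothetical helper name used only for this example. */
#include <fcntl.h>
#include <sys/ioctl.h>
#include <sys/soundcard.h>
#include <unistd.h>

/* Open an OSS DSP device for 16-bit little-endian playback.
 * Returns the file descriptor, or -1 on failure.
 * On success, *rate and *channels hold the values the driver actually accepted. */
int oss_open_playback(const char *path, int *rate, int *channels)
{
    int fd = open(path, O_WRONLY);
    if (fd < 0)
        return -1;

    int fmt = AFMT_S16_LE;
    /* Each ioctl may adjust its argument to the closest value the card supports. */
    if (ioctl(fd, SNDCTL_DSP_SETFMT, &fmt) < 0 || fmt != AFMT_S16_LE ||
        ioctl(fd, SNDCTL_DSP_CHANNELS, channels) < 0 ||
        ioctl(fd, SNDCTL_DSP_SPEED, rate) < 0) {
        close(fd);
        return -1;
    }
    return fd; /* write() interleaved 16-bit samples to this descriptor */
}
```

After a successful call, write() interleaved samples to the returned descriptor and close() it when done. Since the driver may have adjusted *rate, check it rather than assuming 44100 Hz.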

ALSA

Advanced Linux Sound Architecture (ALSA) is the new Linux sound hardware abstraction layer that replaces OSS. In fact, it's more than a simple HAL because it provides a user-space library named libasound. What's more, it's thread-safe, works well with SMP machines, and is backward-compatible with OSS/Free (through its OSS emulation modules). Of course, it's also free and open source. A full description of its features and API can be found on ALSA's Web site, and I would also suggest reading Paul Davis' tutorial.

Let's take a look at ALSA's API with a little example that will show the good and bad points of ALSA:

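/* Example stolen from Paul Davis' tutorial (don't worry, he won't sue me -- GPL privilege).
 * Most error handling is omitted for concision; comments added.
 * Compile with: gcc -o alsatest alsatest.c -lasound
 * Run with an ALSA device name, e.g.: ./alsatest plughw:0,0 */

#include <stdio.h>
#include <stdlib.h>
#include <alsa/asoundlib.h>

int main (int argc, char *argv[])
{
	int i;
	int err;
	short buf[128 * 2] = {0};	/* 128 frames x 2 interleaved channels of silence */
	unsigned int rate = 44100;
	int dir = 0;
	snd_pcm_t *playback_handle;
	snd_pcm_hw_params_t *hw_params;

	if (argc < 2) {
		fprintf (stderr, "usage: %s <device, e.g. plughw:0,0>\n", argv[0]);
		exit (1);
	}

	/* Open the device: an ALSA PCM name such as "default" or "plughw:0,0", not a /dev path */
	if ((err = snd_pcm_open (&playback_handle, argv[1], SND_PCM_STREAM_PLAYBACK, 0)) < 0) {
		fprintf (stderr, "cannot open audio device %s (%s)\n", argv[1], snd_strerror (err));
		exit (1);
	}

	/* Allocate a hardware parameters structure and fill it with the full configuration space for this PCM */
	snd_pcm_hw_params_malloc (&hw_params);
	snd_pcm_hw_params_any (playback_handle, hw_params);

	/* Set parameters: interleaved channels, 16-bit little endian, ~44100 Hz, 2 channels.
	 * set_rate_near takes pointers and writes back the rate and direction actually chosen. */
	snd_pcm_hw_params_set_access (playback_handle, hw_params, SND_PCM_ACCESS_RW_INTERLEAVED);
	snd_pcm_hw_params_set_format (playback_handle, hw_params, SND_PCM_FORMAT_S16_LE);
	snd_pcm_hw_params_set_rate_near (playback_handle, hw_params, &rate, &dir);
	snd_pcm_hw_params_set_channels (playback_handle, hw_params, 2);

	/* Apply them to the playback handle and free the parameters structure */
	snd_pcm_hw_params (playback_handle, hw_params);
	snd_pcm_hw_params_free (hw_params);

	/* Prepare & play: each call writes 128 frames */
	snd_pcm_prepare (playback_handle);
	for (i = 0; i < 10; i++) {
		if ((err = snd_pcm_writei (playback_handle, buf, 128)) != 128) {
			fprintf (stderr, "write to audio interface failed (%s)\n",
				 err < 0 ? snd_strerror (err) : "short write");
			exit (1);
		}
	}

	/* Close the handle and exit */
	snd_pcm_close (playback_handle);
	exit (0);
}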
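One detail that trips people up in code like this: snd_pcm_writei() counts frames, not bytes or samples. A frame holds one sample per channel, so with SND_PCM_FORMAT_S16_LE and 2 channels one frame is 4 bytes, and 128 frames occupy 256 shorts (512 bytes). The helpers below are hypothetical names in plain C (no ALSA needed) that just spell out this arithmetic.

```c
/* Frame/byte/time arithmetic for an interleaved PCM stream (illustrative helpers).
 * A frame holds one sample for each channel. */
#include <stddef.h>

/* Bytes occupied by `frames` frames of `channels` samples, each `bytes_per_sample` wide. */
size_t pcm_frames_to_bytes(size_t frames, unsigned channels, unsigned bytes_per_sample)
{
    return frames * channels * bytes_per_sample;
}

/* Duration of `frames` frames in microseconds (truncated) at `rate` frames per second. */
unsigned long pcm_frames_to_usec(unsigned long frames, unsigned rate)
{
    return frames * 1000000UL / rate;
}
```

At 44100 Hz, one 128-frame write lasts about 2.9 ms, so the ten writes in the example play roughly 29 ms of audio.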

As you can see, the API is quite clear and not very hard to understand, even if it's a bit long. ALSA acts at a level low enough for the programmer to be able to choose another design instead of these blocking writes, called interrupt-driven or callback-driven, which is fundamentally better because:

There is no blocking on reads/writes.
The application is "driven" by the callbacks and can continue to run.
The code is easily portable to other sound systems.

The low-level capabilities of ALSA make it a powerful system, but the code becomes very tricky for full-duplex operation with callbacks, for which other sound systems may be preferable. (Even the ALSA Audio API Tutorial advises using JACK for full duplex.)
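The difference between the two designs is easier to see in a toy model. In the sketch below (plain C, all names hypothetical, no real sound API involved), sim_driver_run() plays the role of the sound system: it owns the loop and calls the application's process function once per period, passing a buffer to fill and an opaque state pointer, much like the process(nframes, arg) callback in the JACK example later in this article. The application never blocks on a write; it only fills buffers when asked.

```c
/* Toy model of a callback-driven audio design (no real sound API involved).
 * sim_driver_run() stands in for the sound system. All names are hypothetical. */
#include <stddef.h>

typedef int (*process_cb)(float *out, size_t nframes, void *arg);

/* Call `cb` once per period, up to `periods` times; stop early if it returns non-zero.
 * Returns the number of periods completed. */
size_t sim_driver_run(process_cb cb, void *arg, float *period_buf,
                      size_t nframes, size_t periods)
{
    size_t done;
    for (done = 0; done < periods; done++) {
        if (cb(period_buf, nframes, arg) != 0)
            break;
        /* A real driver would hand period_buf to the hardware here. */
    }
    return done;
}

/* Example application state and callback: a ramp generator that never blocks. */
struct ramp_state {
    float value;
    float step;
};

int ramp_process(float *out, size_t nframes, void *arg)
{
    struct ramp_state *s = arg;
    for (size_t i = 0; i < nframes; i++) {
        out[i] = s->value;
        s->value += s->step;
    }
    return 0;
}
```

Because the application logic in ramp_process() only sees a buffer and a frame count, moving it to another callback-based sound system mostly means writing a new thin adapter around it.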

About the practical use of kernel drivers

Besides the full-duplex difficulty, another problem for multimedia applications using ALSA directly is what motivated the creation of sound servers: such programs need concurrent access to the sound card, while a plain kernel driver typically lets only one application at a time play or capture sound. In practice, designers must decide at which level a program should act: if it needs low-level access, ALSA can be a good solution, but if sound is not the main part of the project or if high-level operations are needed, consider instead the sound systems we'll talk about next.

Sound servers

Sound servers are software that sits atop the audio core and puts one more layer between the user and the hardware. Going through this extra layer instead of talking to the kernel's audio API directly costs a little performance, but it results in a simpler API and enables software-based sample mixing, which lets applications play multiple sounds at the same time on a single sound card without needing a card that supports hardware mixing. With it, applications can share the sound hardware: sound servers accept many streams at once (a plain kernel driver typically accepts only one), mix them, and stream the result to /dev/dsp. Some sound servers (esd, aRTs, NAS) are also built on a client/server model that lets sound be played remotely and transparently over a network; this is called network transparency. If you want sound servers with such features, take a look at the Squeak homepage.
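Here is a minimal sketch of what software sample mixing boils down to, assuming signed 16-bit samples: add the streams sample by sample and clamp the result so it cannot overflow. mix_s16() is a hypothetical name, and real sound servers may do more (per-stream volume, resampling, wider intermediate formats).

```c
/* Sketch of software sample mixing for signed 16-bit PCM (illustrative only). */
#include <stddef.h>
#include <stdint.h>

/* Sum two streams sample by sample into `out`, clamping to the int16_t range
 * so loud passages clip instead of wrapping around to the opposite sign. */
void mix_s16(const int16_t *a, const int16_t *b, int16_t *out, size_t nsamples)
{
    for (size_t i = 0; i < nsamples; i++) {
        int32_t sum = (int32_t)a[i] + (int32_t)b[i]; /* wide enough for any two int16 values */
        if (sum > INT16_MAX)
            sum = INT16_MAX;
        else if (sum < INT16_MIN)
            sum = INT16_MIN;
        out[i] = (int16_t)sum;
    }
}
```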

ESD

ESD, short for Enlightenment Sound Daemon, was originally developed for Enlightenment and is now part of the GNOME Project. EsounD (as it is also called) supports full duplex and network transparency, and is especially suited for sound effects and long unsynchronized music. The API is best learned from the source: esd.h and esdlib.c. Compile with gcc -o esdtest esdtest.c `esd-config --cflags --libs`.

/* Let's see a skeleton that a recording program can change */
#include <stdio.h>	/* for NULL, fwrite */
#include <unistd.h>	/* for read, close */
#include <esd.h>

int main (void)
{
	char buf[ESD_BUF_SIZE];
	ssize_t n;
	int sock = -1;

	/* Set format: 16-bit stereo stream for recording */
	esd_format_t format = ESD_BITS16 | ESD_STEREO | ESD_STREAM | ESD_RECORD;

	/* And only 1 command to open the recording :) with the format defined earlier,
	 * ESD's default rate (ESD_DEFAULT_RATE -> 44100Hz),
	 * on localhost:16001 (host NULL -> default), and with "testprog" as internal name */
	sock = esd_record_stream_fallback (format, ESD_DEFAULT_RATE, NULL, "testprog");
	if (sock <= 0) return 1;

	/* And now process the data: here the raw samples are simply copied to stdout */
	while ((n = read (sock, buf, ESD_BUF_SIZE)) > 0)
	{
		fwrite (buf, 1, n, stdout);
	}
	close (sock);
	return 0;
}

Piece of cake, isn't it? The esd_record_stream function is in fact a wrapper that calls esd_open_sound(hostname) to connect to the server, negotiates with it, then sets the socket buffer sizes with esd_set_socket_buffers(sock, format, rate, 44100). The _fallback functions fall back to opening the sound device directly (through ALSA or OSS) if the ESD server cannot be reached, which is quite useful.

aRTs

The Analog RealTime Synthesizer is KDE's sound server. Support for it is gradually fading, and it will probably be abandoned in the future in favor of JACK. Nevertheless, various comments suggest that aRTs has better sound quality than ESD thanks to better sound processing routines (but also higher latency, due to their complexity). aRTs also supports full duplex (though it has been reported to be a bit buggy in this area) and network transparency, and works on BSD operating systems. Documentation about the aRTs C API is scarce (see the aRTs project Web site for a short page about it), so the best thing to do is to read the source (artsc.h).

Here's a little example to compile with gcc -o artstest artstest.c `artsc-config --cflags` `artsc-config --libs`.

#include <stdio.h>
#include <artsc.h>

int main()
{
    arts_stream_t stream;
    char buffer[8192];
    int bytes;
    int errorcode;

    /* Initialise aRTs with arts_init() */
    if ((errorcode = arts_init()) < 0)
    {
        fprintf(stderr, "arts_init error: %s\n", arts_error_text(errorcode));
        return 1;
    }

    /* Open a stream for playback at 44100Hz, 16 bits, 2 channels as "aRTstest" */
    stream = arts_play_stream(44100, 16, 2, "aRTstest");

    /* Example processing: read raw music data from stdin and play it with arts_write */
    while ((bytes = fread(buffer, 1, 8192, stdin)) > 0)
    {
        if ((errorcode = arts_write(stream, buffer, bytes)) < 0)
        {
            fprintf(stderr, "arts_write error: %s\n", arts_error_text(errorcode));
            return 1;
        }
    }

    /* Does what it says */
    arts_close_stream(stream);
    arts_free();

    return 0;
}

The API is also very simple, as you can see. Other useful functions include arts_suspend, which frees the DSP device so that programs without aRTs support can access it, and arts_stream_set, which configures stream parameters.

JACK

JACK (also called JACKit) follows the long tradition of recursive acronyms: in this case, Jack Audio Connection Kit. This project was created as an implementation of the Linux Audio Applications Glue API project, which aimed at creating a high-bandwidth, low-latency inter-application communication API. It is a real-time sound server written for POSIX systems (currently available for Linux and OS X) that enables different applications to have synchronous connections to the audio hardware and to share audio among themselves via a ports system. Programs can run as normal independent applications or as plugins within the JACK server. It uses the callback method mentioned earlier, implements ringbuffers, and is, in my humble opinion, the most excellent and promising sound server. The only bad point is that it is not widely available at the moment, but that should be fixed soon. (Gentoo already includes it and there are some third-party RPMs for Fedora Core.) The API is well documented and available on SourceForge. A fully documented example of a capture client is available on the Berman Home Page.

Here we'll start with something simpler and smaller. Compile it with gcc -o jacktest jacktest.c `pkg-config --cflags --libs jack`.

/* Lighter version of simple_client.c */

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <jack/jack.h>

jack_port_t *input_port;
jack_port_t *output_port;

/* Processing callback, run by JACK in its real-time thread: only copy the data from input to output */
int process (jack_nframes_t nframes, void *arg)
{
	jack_default_audio_sample_t *out = (jack_default_audio_sample_t *) jack_port_get_buffer (output_port, nframes);
	jack_default_audio_sample_t *in = (jack_default_audio_sample_t *) jack_port_get_buffer (input_port, nframes);

	memcpy (out, in, sizeof (jack_default_audio_sample_t) * nframes);

	return 0;
}

/* Called if the JACK server shuts down or stops calling us */
void jack_shutdown (void *arg)
{
	exit (1);
}

int main (void)
{
	jack_client_t *client;

	/* try to become a client of the JACK server
	 * (older JACK versions used jack_client_new ("test_client") instead) */
	if ((client = jack_client_open ("test_client", JackNullOption, NULL)) == NULL) {
		fprintf (stderr, "jack server not running?\n");
		return 1;
	}

	/* tell the JACK server to call `process()' whenever there is work to be done. */
	jack_set_process_callback (client, process, 0);

	/* tell the JACK server to call `jack_shutdown()' if it ever shuts down, either entirely, or if it
	   just decides to stop calling us. */
	jack_on_shutdown (client, jack_shutdown, 0);

	/* display the current sample rate */
	printf ("engine sample rate: %lu\n", (unsigned long) jack_get_sample_rate (client));

	/* create two ports: 1 input & 1 output */
	input_port = jack_port_register (client, "input", JACK_DEFAULT_AUDIO_TYPE, JackPortIsInput, 0);
	output_port = jack_port_register (client, "output", JACK_DEFAULT_AUDIO_TYPE, JackPortIsOutput, 0);
	if (input_port == NULL || output_port == NULL) {
		fprintf (stderr, "cannot register ports\n");
		jack_client_close (client);
		return 1;
	}

	/* tell the JACK server that we are ready to roll */
	if (jack_activate (client)) {
		fprintf (stderr, "cannot activate client\n");
		jack_client_close (client);
		return 1;
	}

	/* connect the ports: our input to the first ALSA PCM capture port, our output to the first ALSA PCM playback port.
	 * Port names depend on the JACK version and backend; newer versions use "system:capture_1" and "system:playback_1". */
	if (jack_connect (client, "alsa_pcm:in_1", jack_port_name (input_port))) {
		fprintf (stderr, "cannot connect input ports\n");
	}

	if (jack_connect (client, jack_port_name (output_port), "alsa_pcm:out_1")) {
		fprintf (stderr, "cannot connect output ports\n");
	}

	/* Since this is just a toy, run for a few seconds, then finish */
	sleep (10);
	jack_client_close (client);
	exit (0);
}

This code looks a bit more complex. To understand it, think of JACK as a big switchboard with inputs and outputs, on which you can interconnect devices (microphone, sound card, programs, etc.) by plugging them in. The program copies whatever is connected to its input port to its output port; in other words, it behaves like a simple wire or patch cable. This is meant to be a simple example; for a complete analysis, go to dis-dot-dat.net.
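To make the switchboard picture concrete, here is a toy patchbay in plain C. It only illustrates the idea and is not JACK's implementation or API: output ports write into buffers, connections are a boolean matrix, and each input port receives the sum of every output wired to it (JACK audio ports likewise mix several connections to one input). All names are hypothetical.

```c
/* Toy model of a JACK-style patchbay (illustration only, not JACK's code or API). */
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_PORTS 4
#define NFRAMES 4

struct patchbay {
    float out_buf[MAX_PORTS][NFRAMES];    /* data written by source (output) ports */
    float in_buf[MAX_PORTS][NFRAMES];     /* data read by sink (input) ports */
    bool connected[MAX_PORTS][MAX_PORTS]; /* connected[out][in] */
};

/* Plug a cable from output port `out` into input port `in`. */
void patch_connect(struct patchbay *pb, int out, int in)
{
    pb->connected[out][in] = true;
}

/* Route one period: each input port gets the sum of all outputs wired to it. */
void patch_route(struct patchbay *pb)
{
    memset(pb->in_buf, 0, sizeof pb->in_buf);
    for (int o = 0; o < MAX_PORTS; o++)
        for (int i = 0; i < MAX_PORTS; i++)
            if (pb->connected[o][i])
                for (int f = 0; f < NFRAMES; f++)
                    pb->in_buf[i][f] += pb->out_buf[o][f];
}
```

Calling patch_connect(&pb, 0, 2) plays the same role as jack_connect() in the example: it only records the cable, and the data moves when the next period is routed.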

All the handling is done in the callback while the main program flow keeps running (that's why we've used sleep(10)). By the way, the callback runs under real-time constraints, so it must use only non-blocking, deterministic calls (malloc, printf, mutex_*, etc. must be avoided). Here we're hard-coding a connection between our output_port and alsa_pcm:out_1, but if you need something more flexible, an interesting function is jack_get_ports (client, NULL, NULL, JackPortIsPhysical|JackPortIsOutput). The flags describe ports from JACK's point of view: this call returns the physical ports that deliver data into JACK (capture ports, such as the sound card's inputs), while JackPortIsPhysical|JackPortIsInput returns the physical playback ports that your output_port can be connected to. The returned NULL-terminated array must be freed after use (with jack_free() in recent JACK versions).
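Since the callback may not allocate memory or take locks, the usual way to get data out of it (to disk or a GUI, say) is a lock-free single-producer/single-consumer ring buffer. JACK ships its own in <jack/ringbuffer.h> (jack_ringbuffer_create(), jack_ringbuffer_write(), jack_ringbuffer_read()), and that is what a real client would normally use. The sketch below is an independent, minimal C11 version with hypothetical names, just to show the principle: each side only moves its own index, and acquire/release atomics publish the data to the other side.

```c
/* Minimal single-producer/single-consumer ring buffer sketch (C11).
 * Not jack_ringbuffer_t's code or API; names are hypothetical. */
#include <stdatomic.h>
#include <stddef.h>

#define RB_SIZE 8 /* power of two; one slot stays empty to tell full from empty */

struct ringbuf {
    float data[RB_SIZE];
    atomic_size_t write_pos; /* only modified by the producer */
    atomic_size_t read_pos;  /* only modified by the consumer */
};

void rb_init(struct ringbuf *rb)
{
    atomic_init(&rb->write_pos, 0);
    atomic_init(&rb->read_pos, 0);
}

/* Producer side (e.g. the process callback): copy up to n items, never block.
 * Returns how many items were actually written; the rest are dropped. */
size_t rb_write(struct ringbuf *rb, const float *src, size_t n)
{
    size_t w = atomic_load_explicit(&rb->write_pos, memory_order_relaxed);
    size_t r = atomic_load_explicit(&rb->read_pos, memory_order_acquire);
    size_t space = (r - w - 1) & (RB_SIZE - 1);
    if (n > space)
        n = space;
    for (size_t i = 0; i < n; i++)
        rb->data[(w + i) & (RB_SIZE - 1)] = src[i];
    atomic_store_explicit(&rb->write_pos, (w + n) & (RB_SIZE - 1), memory_order_release);
    return n;
}

/* Consumer side (a normal thread): copy out up to n items, return how many were read. */
size_t rb_read(struct ringbuf *rb, float *dst, size_t n)
{
    size_t r = atomic_load_explicit(&rb->read_pos, memory_order_relaxed);
    size_t w = atomic_load_explicit(&rb->write_pos, memory_order_acquire);
    size_t avail = (w - r) & (RB_SIZE - 1);
    if (n > avail)
        n = avail;
    for (size_t i = 0; i < n; i++)
        dst[i] = rb->data[(r + i) & (RB_SIZE - 1)];
    atomic_store_explicit(&rb->read_pos, (r + n) & (RB_SIZE - 1), memory_order_release);
    return n;
}
```

In a JACK client, process() would call rb_write() with the input buffer and a normal thread would periodically call rb_read(). When the buffer is full, rb_write() drops the excess instead of waiting, which is the behavior a real-time thread needs.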

Practical implementation

So many APIs -- what now? What should a programmer wanting to use sound choose?

For somebody who wants to code a sound server or needs direct access to the sound hardware, ALSA is the obvious choice.
If you're sure that your program will run on only one desktop environment and will be closely tied to it, choose ESD (for Enlightenment or GNOME) or aRTs (for KDE).
If your program doesn't need to be portable for the moment, JACK is full of promise.
For portability across multiple systems and OSes, join us tomorrow for part two of this discussion.

To be continued...

Vincenot has been a Linux user for eight years, and is currently a student at University Louis Pasteur in Strasbourg.


Comments

on An introduction to Linux sound systems and APIs

Note: Comments are owned by the poster. We are not responsible for their content.

dmix plugin for ALSA allows concurrent access

Posted by: Administrator on August 10, 2004 01:51 PM

The dmix plugin allows multiple processes to access the soundcard without blocking.
http://alsa.opensrc.org/index.php?page=DmixPlugin
So by making dmix plugin the default ALSA sound output device, all the sound servers will work fine concurrently (when using ALSA) and so will other apps.

Re:dmix plugin for ALSA allows concurrent access

Posted by: Administrator on August 10, 2004 11:26 PM

ALSA's dmix plugin is a pretty severe omission from this article. The fact that he hypes the very new Jack Server, but omits dmix, NAS, and MAS makes me think that he's just trying to promote Jack.

I find all this software trying to define audio layers a bit misguided. I'd like someone to think about what Joe User wants here. The basic thing I want is to be able to listen to CDs or MP3s and still have system alerts come through. Also, being able to play swf/flash plugins without hogging the sound device. Things like network transparency and even audio input/recording are luxury features on a desktop system (and sound *output* is a luxury on a server).

I worked pretty hard to switch to ALSA's dmix about 6 months ago, and have been pretty happy. I just hope that gstreamer defaults to ALSA and KDE moves to gstreamer so I don't get sucked into another painful change.

Re:dmix plugin for ALSA allows concurrent access

Posted by: Administrator on August 11, 2004 02:43 AM

ALSA's dmix plugin is a pretty severe omission from this article.

OK, I omitted dmix (and dsnoop) because I considered it a plugin and not strictly part of ALSA. You're right, I should have at least mentioned it (but most howtos & docs I read weren't doing so either, except the ALSA one).

The fact that he hypes the very new Jack Server, but omits dmix, NAS, and MAS makes me think that he's just trying to promote Jack.

When it comes to NAS & GStreamer, I explain my choice at the end of the second part (this was supposed to be only one article). For the others, I had to make choices for concision (the article was already much too big compared to what I was asked for), and I had to talk about the *obvious* ones like aRTs, esd and JACK. My goal wasn't to promote any system, but I must admit that JACK was my personal favourite. SAI's MAS (http://www.mediaapplicationserver.net/indexframes.html) is, as you mentioned, also an interesting project with very interesting features (low latency, network transparency, X11 integration, bandwidth measurement, ...), especially for conferencing, but it is less popular for the moment, and if, in the little space I had, I had talked about lesser-known sound systems and not about the ones everybody would expect, my mailbox would have exploded by now :-P. It's difficult to write something that pleases everyone: I already received multiple mails from people accusing me of preferring one system (always a different one), and even one from a big company director saying I had an agenda (!!) to take down their product.

So, just to say it once and for all, this article isn't supposed to be exhaustive at all, because it was meant to be an introduction and not a series of articles, and I'm just a student interested in technology and certainly not here to advertise any sound system (I'd like to get money from one of them, it would be an interesting way of paying for my studies ;), but I'm not for sale). If enough readers show interest in a series of articles about this subject with more details about each solution, I'd be glad to write it (if the editor is OK with that).

Cheers,

Christian Vincenot

OSS' API is not dead

Posted by: Administrator on August 11, 2004 11:59 PM

Just wanted to add something that maybe wasn't clear enough in my article:

I'm not going to talk about programming for OSS, considering that it is deprecated

I didn't mean by that that OSS' API is dead or uninteresting, just that OSS/Free has been marked as deprecated in Linux kernel development and was replaced by ALSA, and that OSS/Linux is proprietary software and therefore out of the scope of this article. All this means that OSS is deprecated as a free sound system under Linux, but its API surely isn't: OSS is still being developed and improved by 4Front Tech for those who don't mind paying a bit of money to support it (don't flame me, it's not free software, but it's not like giving money to a huge Redmond corporation ;D, and 4Front is behind XMMS too), and it is available on many Unixes other than Linux.

api

Posted by: Administrator on August 14, 2004 07:10 PM

Compilation must be done with "-lasound" to link the binary to alsa-lib. For example:
gcc test.c -o test -lasound

I tried all the devices in /dev/snd/, /dev/dsp, etc.

That way of describing devices, the "everything is a file" way, is only used by OSS!! ALSA uses this notation instead: type:snd_card,device, where:

type can be hw or plughw,

snd_card is the number of the card,

device is the device number.

Example: plughw:0,0. See the ALSA Howto (http://www.suse.de/~mana/alsa090_howto.html) for more information.

This article wasn't meant as an introduction to ALSA: such things are general ALSA knowledge and the goal of this article wasn't to present ALSA or its concepts but to give an overview of APIs for programmers to be able to choose the best solution. What's more, I think I gave enough links for newbies to understand what's going on (the ALSA Howto you say you've found was the first documentation given in the "ALSA's website" link of this article :-/) and for others to learn more.

One last thing: your code modifications are right, BUT verify your sources before calling articles poor and saying the code hasn't been checked!! Paul Davis is nothing less than one of the most active sound developers in the community; he works on ALSA and created the entire LAAGA project as well as JACK (for which he got an Open Source Award this year: http://builder.com.com/5100-6375-5136755.html?tag=tt). The point is that ALSA is still under heavy development and its API changes a lot, and that's what happened to the snd_pcm_hw_params_set_rate_near function, whose prototype changed, as you can verify in this linux-audio-dev message: http://www.music.columbia.edu/pipermail/linux-audio-dev/2003-December/005779.html. This change caused many programs using ALSA to segfault (IceCast, aplay, etc., AFAIK). So I checked my code, and Paul Davis did too, and Google will show you that most docs/examples on the net are still using the old prototype (including the howto you cite yourself!!). What's more, those pieces of code are skeletons, just to show what the API looks like, and not real practical examples (otherwise I wouldn't have put those (...) everywhere). There are API references and tutorials with more up-to-date and exhaustive explanations if you wish to write a real program. However, I'll consider your comments, and I'd like to thank you for your interest in my article; I hope it helped you a bit.

Compilation and usage would be nice.

Posted by: Administrator on August 12, 2004 11:54 AM

I read the article and Paul Davis' tutorial. For someone like me, trying to program the sound device for the first time, both articles need to include instructions on how to compile and use the sample programs. I managed to compile the playback code example, but I get an error message when I run it:

ALSA lib pcm.c:1972:(snd_pcm_open_noupdate) Unknown PCM /dev/snd/pcmC0D6c
cannot open audio device /dev/snd/pcmC0D6c (No such file or directory)

I tried all the devices in /dev/snd/, /dev/dsp, etc. with no luck. Finally, after finding this tutorial, http://www.suse.de/~mana/alsa090_howto.html, I learned that the device convention is "plughw:0,0". This is a totally different API from the usual /dev/xxxxxx convention.

I also spotted an error in the sample code: in the function snd_pcm_hw_params_set_rate_near, the third and fourth arguments must be pointers. Since man pages seem not to exist, I had to google the function to find its syntax (that doxygen crap is too hard to navigate; man function_name is simple and does not require a browser).

To make the ALSA playback code work (and not produce a segmentation fault), you'll need to add/modify:

int dir = 0;
unsigned int rate = 44100;
snd_pcm_hw_params_set_rate_near (playback_handle, hw_params, &rate, &dir);

The program must be run with the argument "plughw:0,0"; you'll hear a click.

As introductory material, this and the referenced ALSA tutorial are really poor articles. Both need the code checked to make sure it actually works.

An introduction to Linux sound systems and APIs

Posted by: Anonymous [ip: 127.0.0.1] on September 25, 2007 05:04 AM

We are doing a project on sound recognition. We have a problem: how do we access the waveform data from the sound card? Is there an API to access the waveform from the sound card? Please send a solution to this. My email is sunil.pes@gmail.com
