uClinux as an Embedded OS on a DSP

by Michael Hennerich

Recently, several large consumer electronics companies announced a collaboration called the Consumer Electronics Linux Forum (CELF) to develop further the Linux platform for use in digital home electronic devices. The founders of CELF--Matsushita Electric, Sony, Hitachi, NEC, Royal Philips Electronics, Samsung, Sharp and Toshiba--are focused on the advancement of Linux as an open-source platform for consumer electronics devices. As such, they actively are supporting and promoting the spirit of the Open Source community.

The advantage of embedded Linux is that it is a royalty-free, open-source, compact solution that provides a strong foundation on which an ever-growing base of applications can run. Linux is a fully functional operating system, with support for a variety of network and file-handling protocols--an important requirement in embedded systems because of the need to compute anywhere, anytime. Modular in nature, Linux is easy to slim down by removing utility programs, tools and other system services that are not needed in an embedded environment. For companies using Linux in embedded markets, the advantages are faster time to market and reliability. For their developers, the combination of a digital signal processor (DSP) and uClinux may be of particular interest.

What's the Difference Between Linux and uClinux?

Because Linux, like UNIX, is a multi-user, multitasking OS, the kernel has to take special precautions to assure the proper and safe operation of up to thousands of simultaneous processes from different users on the same system. The UNIX security model, after which Linux is designed, protects every process in its own environment with its own address space. Every process also is protected from processes invoked by other users. A virtual memory (VM) system also places requirements on the CPU, including support for dynamic allocation of memory and for mapping arbitrary memory regions into the private process memory.

Some devices, such as Analog Devices' Blackfin Processor, do not provide a full-fledged memory management unit (MMU), because developers targeting their application to run without the use of an OS normally do not need an MMU. Additionally, MMU-less processors such as the Blackfin are more power efficient and often are significantly less expensive than the alternatives.

To support Linux on such devices, a few trade-offs have to be made:

  • No real memory protection (a faulty process can bring down the complete system)

  • No fork system call

  • Only simple memory allocation

  • Some other minor differences

Memory protection is not a real problem for most embedded devices. Linux is a stable platform, particularly in embedded devices, where software crashes rarely are observed.

The second point is a little more problematic. In software written for UNIX or Linux, developers often use the fork system call when they want to do things in parallel. The fork call makes an exact copy of the original process and executes it simultaneously. To do that efficiently, the kernel uses the MMU to map the parent's memory into the child and copies only those pages that one of the two processes writes to (copy-on-write). Without an MMU this is not possible, so uClinux cannot provide the fork system call. It does, however, provide vfork, a special version of fork in which the parent is halted until the child calls exec or exits. Software that uses fork therefore has to be rewritten to use either vfork or threads. uClinux supports threads because all threads of a process share the same address space.
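As a minimal sketch of the vfork pattern described above, the following helper starts a program and waits for it to finish. The name run_command is hypothetical and chosen only for illustration; the system calls themselves (vfork, execv, _exit, waitpid) are the standard ones. The child does nothing except call execv or _exit, because it borrows the parent's memory until one of those two calls happens.

```c
#define _DEFAULT_SOURCE /* assumption: needed by some C libraries to declare vfork */
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run the program at `path` with arguments `argv`
 * and return its exit status, or -1 if it could not be started or waited for. */
int run_command(const char *path, char *const argv[])
{
    assert(path != NULL && argv != NULL);

    pid_t pid = vfork();            /* parent is suspended until exec or _exit */
    if (pid < 0)
        return -1;                  /* could not create the child */

    if (pid == 0) {
        /* Child: touch nothing in the parent's memory, only exec or _exit. */
        execv(path, argv);
        _exit(127);                 /* reached only if execv failed */
    }

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Code that previously called fork only to exec another program immediately can usually be converted this way; code that relies on the child continuing to run as a copy of the parent is a candidate for threads instead.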

As for point number three, there usually is no problem with the malloc support uClinux provides, but minor modifications sometimes have to be made.

Most of the software available for Linux or UNIX can be compiled directly on uClinux. The rest usually require only some minor porting or tweaking. Only a few applications do not work on uClinux, with most of those being irrelevant for embedded applications anyway.

Developing on uClinux

When selecting development hardware, developers should choose carefully with regard to price and availability. They also should look for readily available open-source drivers and documentation.

A uClinux Blackfin Processor development environment consists of the GNU Compiler Collection (GCC cross compiler) and the binutils (linker, assembler and so on) for the Blackfin Processor. Additionally, some GNU tools such as awk, sed, make and bash, plus Tcl/Tk are needed, although they usually come as part of basic desktop Linux distributions.

After the development environment has been installed and the uClinux distribution has been decompressed, development may start. First, the developer uses the graphical configuration utility to select an appropriate board support package (BSP) for the target hardware. Developers using their own hardware should first make themselves comfortable with development on the EZ-KIT Lite or STAMP hardware (schematics and production files are available for both). After that, they can start writing their drivers and making a BSP by copying an existing one and modifying a few parameters.

Most of the development work consists of selecting the appropriate drivers and de-selecting kernel features not needed for the project in question. A selection of library features and user-space programs follows thereafter.

Figure 1. Graphical Kernel Configuration

The uClinux distribution provides a wide selection of utilities and programs designed with size and efficiency as their primary considerations. One example is BusyBox, a multi-call binary that includes the functionality of many smaller programs and acts like any one of them if it is called with the appropriate name. If BusyBox is linked to ls and contains the ls code, it acts like the ls command. The benefit is that BusyBox avoids the overhead of separate binaries and lets those small modules share common code.
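The multi-call idea can be sketched in a few lines of C. This is not BusyBox's actual code; the applet functions and the names applet_name and find_applet are hypothetical and exist only to illustrate dispatching on the name the binary was invoked under.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef int (*applet_fn)(int argc, char **argv);

/* Placeholder applets; a real multi-call binary would do the actual work here. */
static int applet_ls(int argc, char **argv)   { (void)argc; (void)argv; return 0; }
static int applet_echo(int argc, char **argv) { (void)argc; (void)argv; return 0; }

/* Table mapping invocation names to the code that implements them. */
static const struct { const char *name; applet_fn fn; } applets[] = {
    { "ls",   applet_ls   },
    { "echo", applet_echo },
};

/* Strip any directory part from argv[0], e.g. "/bin/ls" becomes "ls". */
const char *applet_name(const char *argv0)
{
    assert(argv0 != NULL);
    const char *slash = strrchr(argv0, '/');
    return slash ? slash + 1 : argv0;
}

/* Return the applet matching the invocation name, or NULL if there is none. */
applet_fn find_applet(const char *argv0)
{
    const char *name = applet_name(argv0);
    for (size_t i = 0; i < sizeof applets / sizeof applets[0]; i++)
        if (strcmp(applets[i].name, name) == 0)
            return applets[i].fn;
    return NULL;
}
```

A main function would call find_applet(argv[0]) and invoke the result, so a link named ls that points at the single binary runs the ls code, while all applets share one copy of the C library and helper routines.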

After everything is selected and successfully compiled, the kernel and a RAM disk image can be loaded onto the target hardware with the help of VisualDSP++. Once this is successful, further development can proceed. The next step is to use a serial or network-enabled bootloader instead of loading through the JTAG interface. A small circuit connected to a PC's parallel port and Blackfin's JTAG interface can be used to flash the bootloader initially to the target memory. Note, however, that this workaround does not provide the debugging and emulation capabilities that VisualDSP++ does. Once the kernel is up and running, the free GNU Debugger (GDB) can be used to debug user applications.

The next step would be the development of the special applications for the target device or the porting of additional software. A lot of development can be done in shell scripts or with languages such as Perl or Python. Where C programming is mandatory, Linux's extensive support for protocols and device drivers provides a powerful environment for the development of new applications.

Figure 2 is an example of how easily an AC'97 audio codec can be wired to a Blackfin Processor, without the need for additional active hardware components.

Figure 2. AD1885 Wiring Diagram

Below is an example of a very simple program for reading from this codec, assuming an audio AC'97 driver is compiled into the kernel.

int fd = open("/dev/dsp", O_RDONLY); // open the audio device
int speed = 44100; // 44.1 kHz
ioctl(fd, SNDCTL_DSP_SPEED, &speed); // set the sample rate
read(fd, buffer_rx, number_of_bytes); // read number_of_bytes into buffer_rx
close(fd); // close the device
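A slightly fuller sketch of the same sequence, with the headers it needs and with error checks, might look as follows. The function name capture_audio is hypothetical; the calls and the SNDCTL_DSP_SPEED request are the same ones used above, and the sketch assumes the C library ships the OSS header sys/soundcard.h.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/ioctl.h>
#include <sys/soundcard.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: read up to `len` bytes of audio from `device` at the
 * requested sample rate. Returns the number of bytes read, or -1 on error. */
ssize_t capture_audio(const char *device, int rate, void *buf, size_t len)
{
    assert(device != NULL && buf != NULL);

    int fd = open(device, O_RDONLY);
    if (fd < 0)
        return -1;                          /* device missing or busy */

    int speed = rate;                       /* driver may write back the rate it chose */
    if (ioctl(fd, SNDCTL_DSP_SPEED, &speed) < 0) {
        close(fd);
        return -1;
    }

    size_t total = 0;
    while (total < len) {                   /* read() may return fewer bytes than asked */
        ssize_t n = read(fd, (char *)buf + total, len - total);
        if (n < 0) {
            close(fd);
            return -1;
        }
        if (n == 0)
            break;                          /* no more data */
        total += (size_t)n;
    }

    close(fd);
    return (ssize_t)total;
}
```

A call such as capture_audio("/dev/dsp", 44100, buffer_rx, number_of_bytes) then replaces the five lines above and reports a failure instead of silently continuing.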

Why Put Linux on Embedded Hardware?

Despite the fact that Linux originally was not designed for use in embedded systems, it has found its way to a lot of embedded devices. Since the release of kernel version 2.0.x and the appearance of commercial support for Linux on embedded processors, there has been a real explosion of new embedded devices that feature the OS. Almost every day there seems to be a new device or gadget that uses Linux as its operating system, in most cases going completely unnoticed by end users. Today a majority of the available broadband routers, firewalls, access points and even some DVD players utilize Linux (see LinuxDevices.org for examples).

Linux and uClinux offer a bevy of drivers for all sorts of hardware and protocols. Combine that with the fact that Linux doesn't have runtime royalties, and it quickly becomes clear why so many developers are using Linux for their devices.

But why would anyone use Linux on a DSP?

In the past, DSPs have been used in a lot of applications including sound cards, modems, telecommunication devices, medical devices and all sorts of military and other appliances that perform pure signal processing. Those DSP systems generally were designed specifically for those applications and had only basic capabilities so as to meet their tight cost and size constraints. Even as DSPs became more powerful and flexible, thereby servicing the more advanced requirements of military, medical and communication users, they still lacked the capabilities needed to run advanced operating systems. Traditional DSPs are powerful and flexible but can be rather expensive. They often are found clustered together on special signal-processing hardware where there is no need to have an operating system such as Linux running on the DSP itself. This generally is due to the fact that in those systems the DSP gets its data from some type of additional central processing unit. Therefore, only basic system software had to be written for such DSPs.

With multimedia convergence advancing quickly and multimedia and communication-enabled gadgets proliferating, there now is a big market for a new type of DSP. Currently, the most widely used design for servicing these markets is the combination of a general-purpose processor with a traditional DSP serving as a co-processor. In this scenario, the operating system runs on the host processor and the signal processing is done on the DSP. This type of dual-processor design is sub-optimal, though, because of the cost, power and size penalties it incurs.

There are a few solutions to accommodate the new market demands:

  • Use specially designed ASICs and FPGAs and make the large up-front investment required to do development from scratch or to use and modify some third-party IP.

  • Use special hardware, often combined with a general-purpose IP-Core on an SOC (system on a chip), for example, a DVD player, scanner or digital camera on a chip. These devices generally are limited to the function for which they originally were designed.

  • Use a combination of a traditional DSP with a general purpose IP-Core on an SOC device, where the operating system runs on the IP-Core and the signal processing can be offloaded to the embedded DSP. Such an approach has been taken in some wireless LAN chip sets.

  • Redesign the DSP to fit the demands of an advanced operating system while preserving the advanced DSP architecture. This approach was taken by the Blackfin designers, who designed a processor with advanced DSP features around the well-proven Harvard Architecture with a RISC-like instruction set. Such a device is no longer a simple DSP, but rather a powerful processor that meets the intensive demands of a wide range of communication and multimedia applications. Combined with the capabilities and power of Linux, the possibilities are endless.

Real-Time Capabilities of uClinux

Because Linux originally was developed for server and desktop usage, it does not have the hard real-time capabilities that most other operating systems of comparable complexity and size have. Nevertheless, Linux, and in particular uClinux, has excellent so-called soft real-time capabilities. This means that while Linux or uClinux cannot guarantee certain interrupt or scheduler latency, compared to other operating systems of similar complexity, they show favorable performance characteristics. If one needs a so-called hard real-time system that can guarantee scheduler or interrupt latency time, there are a few ways to achieve such a goal:

  • Use another operating system. Many RTOS systems are available that meet this requirement--VDK, Nucleus PLUS, ThreadX and uITRON, to name a few.

  • Provide the real-time capabilities in the form of an underlying minimal real-time kernel such as RT-Linux or RTAI. Both solutions use a small real-time kernel that runs Linux as a real-time task with a lower priority. Programs that need predictable real-time are designed to run on the real-time kernel and are coded specifically to do so. All other tasks and services run on top of the Linux kernel and can utilize everything that Linux provides. This approach can guarantee deterministic interrupt latency while preserving flexibility.

  • Change the Linux kernel to achieve hard real-time interrupt latencies. Bernhard Kuhn is developing a patch for the Linux kernel that could achieve that. In the future, this patch could be ported to the uClinux Blackfin tree.

In most cases, hard real-time is not needed, particularly for multimedia applications, where the time constraints are dictated by the abilities of the user to recognize glitches in audio and video. Those physically detectable constraints that have to be met normally fall in the area of tens of milliseconds, which is no big problem on fast chips such as the Blackfin. Stricter timing requirements can be achieved with a little tweaking or with some straightforward changes to the scheduler. In kernel 2.6.x, the new stable kernel release, those qualities have been improved with the introduction of the new O(1) scheduler and kernel preemption.
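One common form of such tweaking, not named in the text above and shown here only as an assumption about what it could look like, is to run the time-critical process under the POSIX SCHED_FIFO policy so that ordinary processes cannot preempt it. The helper names clamp_fifo_priority and request_fifo are hypothetical; the sched.h calls are standard.

```c
#include <assert.h>
#include <sched.h>

/* Clamp a requested priority to the range the system allows for SCHED_FIFO. */
int clamp_fifo_priority(int wanted)
{
    int lo = sched_get_priority_min(SCHED_FIFO);
    int hi = sched_get_priority_max(SCHED_FIFO);
    assert(lo >= 0 && hi >= lo);    /* both calls return -1 on error */

    if (wanted < lo)
        return lo;
    if (wanted > hi)
        return hi;
    return wanted;
}

/* Ask the kernel to schedule the calling process with SCHED_FIFO.
 * Returns 0 on success, -1 on failure (typically missing privileges). */
int request_fifo(int wanted)
{
    struct sched_param sp = { .sched_priority = clamp_fifo_priority(wanted) };
    return sched_setscheduler(0, SCHED_FIFO, &sp);
}
```

This still gives only soft real-time behavior: it changes which process the scheduler prefers, not the worst-case latency of the kernel itself.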

In most cases, when running popular audio/video MPEG algorithms, enough processing power is left over for the scheduler to handle the low number of processes that normally run on such devices. It therefore is no problem to write software in various languages--Perl, Python, PHP--and to run Web servers, SNMP, PPP or PPPoE and firewalls while decoding video and audio. Consequently, there is no need to use hard real-time OSes that lack the advanced features that Linux can provide.

The Blackfin Processor for uClinux

The combination of a first-class DSP core with traditional microcontroller architecture on a single chip avoids the restrictions, complexity and higher costs of traditional heterogeneous multiprocessor systems. Besides the established peripheral equipment (SPI, UART with IrDA support, timer, RTC, watchdog and event controller), all members of the Blackfin processor family provide two serial dual-channel ports (SPORTs), each supporting four stereo I2S channels with data rates up to 100 Mbit/s. Furthermore, the newest members of the Blackfin Processor family (ADSP-BF531, ADSP-BF532, ADSP-BF533 and ADSP-BF561) provide a parallel peripheral interface (PPI) that offers connectivity for TFT flat panel displays and video converters (CCIR-656, 27MHz). The PPI also may be used as a parallel interface for AD/DA converters with speeds up to 65 MSPS.

All Blackfin processors combine a signal processing engine with the advantages of a clean, orthogonal RISC-like microprocessor instruction set and single-instruction, multiple-data (SIMD) multimedia capabilities into a single instruction set architecture. The micro signal architecture (MSA) core is a dual-MAC modified Harvard architecture designed to have unparalleled performance on both audio and video algorithms, as well as standard program flow and arbitrary bit manipulation operations mainly used by an OS.
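To make "MAC" concrete: the inner loop of most audio and video filters is a multiply-accumulate over 16-bit samples, and a dual-MAC core can perform two such steps per cycle. The following portable C sketch shows that kind of loop for Q15 fixed-point data. The function name dot_q15 is hypothetical, and whether a given compiler maps the loop onto the MAC units is an assumption about the toolchain, not something the code guarantees.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Dot product of two Q15 vectors (16-bit values representing -1.0 to just under 1.0).
 * Each loop iteration is one multiply-accumulate (MAC) step. The 64-bit
 * accumulator avoids overflow; the result is scaled back to Q15. */
int32_t dot_q15(const int16_t *x, const int16_t *h, size_t n)
{
    assert(x != NULL && h != NULL);

    int64_t acc = 0;
    for (size_t i = 0; i < n; i++)
        acc += (int32_t)x[i] * h[i];    /* Q15 * Q15 gives a Q30 product */

    return (int32_t)(acc / 32768);      /* Q30 back to Q15 */
}
```

An FIR filter is this same loop applied at every output sample, with h holding the filter coefficients and x the most recent input samples.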

The ADSP-BF531/BF532/BF533 processors have three large blocks of on-chip memory, providing high-bandwidth access to the core. These memory blocks are accessed at full processor core speed. The two memory blocks sitting next to the core, referred to as L1 memory, can be configured either as data or instruction SRAM or as cache. When they are configured as cache, the speed of executing external code from SDRAM is nearly on par with running the code from internal memory. This feature is well suited for running the uClinux kernel, which doesn't fit into internal memory. Also, when programming in C, the memory access optimization can be left up to the core by using cache.

Blackfin processors are designed using a low power and low voltage design methodology and feature dynamic power management. They meet the requirements of current mobile and battery powered applications. A Blackfin processor has multiple, highly flexible and independent DMA (direct memory access) controllers that support automated data transfers with minimal overhead impact on the processor core. DMA transfers can occur between the ADSP-BF531/BF532/BF533 processor's internal memories and any of its DMA-capable peripherals. Additionally, DMA transfers can be performed between any of the DMA-capable peripherals and external devices connected to the external memory interfaces, including the SDRAM controller and the asynchronous memory controller.


In combination with uClinux, an embedded processor offers tremendous advantages to designers, most notably by opening up a wide range of applications, drivers and protocols, which often are open-source or free software. In most cases, only a compilation or some minor tweaking is necessary to get that software up and running. Combine this with such invaluable tools as Perl, Python and PHP, and developers have the opportunity to develop even the most demanding feature-rich applications in a short time-frame, often with enough processing power left over for future improvements and new features. The embedded design community rapidly is adopting Linux as an operating system of choice. With the recent major update to the kernel, embedded Linux is proving to be a low-cost, dynamic development environment.

Dipl.-Ing.(FH), MSc Michael Hennerich, European DSP Applications Engineer (ADI Germany, Munich) studied electronic and computer engineering as well as computer-based engineering at Reutlingen University in Germany. Juergen Hennerich studies physics at the University of Tuebingen. He is a longtime member of the Linux User Group Tuebingen (LUGT). He has been using UNIX and Linux since the mid-1990s.
