From Core to Kit

by Mike Esch

Over the past two years the winds of change have blown into the embedded systems arena. Interest in embedded Linux has skyrocketed, and in fact, products powered by embedded Linux are now commercially deployed.

The number of companies providing embedded Linux products and services also has grown. Almost all companies in this space are still wrestling with the dilemma of how to embrace the Open Source philosophy while maintaining black ink at the bottom of the financial statements.

Up until January 2000, we had no official corporate Linux strategy. Our corporate history is in embedded system design, centered on data communication products. However, the lure of embedded Linux proved too overwhelming, and we created a new division whose mantra was to focus on the provision of engineering services using embedded Linux.

The evolutionary path of chips for embedded usage also intrigued us. Integration of more and more functionality into a single chip has led to the wave of system-on-chips (SOCs). Companies are extending this trend even further by providing configuration flexibility within the SOC. The release of Altera's Nios soft-core CPU in the summer of 2000 caught our attention.

The Nios processor was the industry's first general-purpose, RISC-based, embedded processor core optimized specifically for programmable logic. Altera had also partnered with Red Hat to provide Nios extensions to the GNUPro Embedded System Development Tools. The Excalibur Development Kit bundled together the necessary hardware and software to prototype system-on-programmable-chip (SOPC) designs.

The release of Excalibur and the Nios coincided perfectly with our formation of an embedded Linux group. We were looking for a Linux-based project that would allow us to dive into the embedded Linux sphere quickly and analyze the technical merits to complement our business assessment of the embedded Linux market.

Initially, our intent simply was to port Linux to the Nios processor. However, after further discussions, the concept of a full-fledged Linux Development Kit (LDK) emerged. We honed this concept by considering the target audience to which an LDK might appeal. We anticipate that the typical LDK user is someone wanting to prototype an application rapidly, with the eventual intent of moving to a masked programmable logic device (PLD) or application-specific integrated circuit (ASIC) product solution. Based on this assumption, we wanted an environment that worked almost out of the box. In other words, we didn't want users to have to comb the Internet for source code and patches before they could even bring up a shell, let alone begin prototyping their applications.

The remainder of this article describes not only our porting experiences of the Linux kernel to a brand new CPU architecture but also the trials and tribulations of producing a full-blown development kit.

The Kernel

As the Nios processor is implemented in logic, it is extremely configurable. Switching from a 16- to a 32-bit version is simply a configuration option. Likewise, should you wish three serial ports instead of two, you add another one and regenerate the CPU core. This high level of flexibility allows you to maximize your usage of the PLD by not having unwanted peripherals consuming valuable logic cells.

The Nios is a RISC processor, (very loosely) resembling the SPARC architecture. It has a large windowed register file (configurable of course) for fast context switching. This register file can be a maximum of 512 registers deep. The Nios processor also includes a fairly comprehensive instruction set with both full and partial width register-indirect addressing (with and without offset).

Porting Linux to a virgin CPU requires that you have an intimate knowledge of the CPU architecture. Fortunately, the initial formidable task of extending gcc, et al., for a new CPU had already been done for Altera by the good folks at Red Hat. However, as we quickly discovered, because the Nios was so new these extensions had no real code base against which they could be tested. The Linux port served to shake out a few minor issues with the toolchain, but this was an unnerving experience for our developers, who were accustomed to extremely mature x86 development suites. The thought of even suspecting that the compiler generated incorrect code was foreign to us.

One of the most fascinating concepts about open-source software is its viral growth. As a grad student I have memories of Linux kernel version .11 running off a floppy disk on a 386 processor. When it came time to select a kernel base for the Nios processor, I marveled at the incredible mutation of the kernel source tree and the amount of software effort contributed by so many programmers.

Our choice of kernel as a starting point for the port was influenced greatly by the fact that the Nios processor does not have a memory management unit (MMU). An MMU would consume a fair amount of CPLD (complex programmable logic device) resources, so it was decoupled from the Nios processor. If required, an MMU could be implemented as an IP (intellectual property) block. Rt-Control (now Lineo) took a 2.0.38 kernel tree, stripped out the memory-management scheme designed for an MMU, replaced it with a non-MMU scheme and optimized the kernel for embedded applications. The result of this project was µClinux, a new fork of Linux for MMU-less processors. Distributions exist for the Motorola DragonBall and ColdFire families, the ARM7TDMI and the Intel i960 to name but a few.

Since the Nios processor loosely resembles the SPARC, and because the Motorola distribution seemed the most stable at the time, we crossed the two distributions to form our starting point. Perhaps technology will evolve to the point where the endian-ness of a processor will be irrelevant. Until such time, embedded developers will continue to be plagued by big vs. little endian quirks. The port to the Nios processor was no exception. The Nios processor is a little-endian CPU; the Motorola is big-endian. The initial filesystem type we supported was ROMFS, which happens to be described in big-endian format. The module blkmem.c did not check for endian-ness, presumably because the target CPU it was originally written for matched the endian-ness of the filesystem.
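The endian-ness fix can be illustrated with a minimal sketch. It assumes only that a ROMFS field is stored as four bytes in big-endian order; the function name be32_to_host is hypothetical and is not taken from blkmem.c or the ROMFS sources.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Decode a 32-bit big-endian field one byte at a time.
 * Because the value is assembled with shifts rather than read through
 * a uint32_t pointer, the result is the same on big- and little-endian
 * CPUs, and the source address does not need to be aligned.
 * (Hypothetical helper name, for illustration only.)
 */
uint32_t be32_to_host(const unsigned char *p)
{
    assert(p != NULL);
    return ((uint32_t)p[0] << 24) |
           ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |
            (uint32_t)p[3];
}
```

A loader on a little-endian CPU would pass each on-disk field through such a helper instead of dereferencing it directly; on a big-endian CPU the same code still yields the right value, so no endian check is needed at the call site.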

In general, we found the Linux kernel source tree to be extremely modular, with a well-defined delineation between hardware-specific and nonhardware-specific code. Figure 1 shows a µClinux kernel source tree (based on a 2.0.38 kernel). The directories highlighted in blue indicate the additions to the base kernel for µClinux (as of December 2000), and the directories highlighted in yellow indicate our additions to the source tree for the Nios port. There is very little contamination of the general Linux code when a new architecture is added. When debugging kernel problems, our rule of thumb was to examine our Nios architecture-specific code first, followed by µClinux-specific modifications and finally general Linux kernel code.

One of the fundamental concepts in Linux is that of user vs. kernel mode. User tasks access kernel services through system calls. When an interrupt occurs, the kernel needs to know which mode it was operating in prior to the interrupt in order to select the correct stack. In order to enforce the paradigm of CPU modes, most processors rely on specific hardware features. For example, the Intel 386 architecture has four unique CPU privilege levels. The Nios processor has no such equivalent mechanism, so our solution was to implement a "soft" privilege bit that then became an extension to the hardware status register, i.e., when the hardware status register was saved or restored, so was the soft privilege bit.
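The idea of a software-maintained privilege bit that travels with the hardware status register can be sketched in portable C. This is a simulation under stated assumptions, not the Nios port's code: the names soft_mode, saved_status, trap_enter, trap_exit and trap_stack are all hypothetical, and on the real processor this work would be done in the assembler trap stubs.

```c
#include <stdint.h>

enum cpu_mode { MODE_USER = 0, MODE_KERNEL = 1 };

/* Software-maintained mode flag (hypothetical); stands in for the
 * privilege level that the hardware does not provide. */
enum cpu_mode soft_mode = MODE_KERNEL;

/* The soft bit is saved and restored together with the status register. */
struct saved_status {
    uint32_t      hw_status;  /* hardware status register as read */
    enum cpu_mode soft_mode;  /* mode that was active before the trap */
};

/* Trap entry: capture both values, then run in kernel mode. */
struct saved_status trap_enter(uint32_t hw_status)
{
    struct saved_status s;
    s.hw_status = hw_status;
    s.soft_mode = soft_mode;
    soft_mode = MODE_KERNEL;
    return s;
}

/* Trap exit: restore the soft bit and hand back the hardware status
 * value that the caller should write to the status register. */
uint32_t trap_exit(const struct saved_status *s)
{
    soft_mode = s->soft_mode;
    return s->hw_status;
}

/* Stack selection: if user code was interrupted, switch to the task's
 * kernel stack; if the kernel was interrupted, stay on the current one. */
uintptr_t trap_stack(const struct saved_status *s,
                     uintptr_t current_sp, uintptr_t kernel_stack_top)
{
    return s->soft_mode == MODE_USER ? kernel_stack_top : current_sp;
}
```

Because each trap saves the previous mode and restores it on exit, nested interrupts unwind correctly: an interrupt taken inside an ISR sees kernel mode and keeps the current stack, while the outermost return restores user mode.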

The handling of register window underflow/overflow traps was another section of the kernel that presented a significant design challenge. Having implemented our soft privilege bit (see above) we could track and use the correct stacks. The complications arose around interrupt service routine (ISR) processing. Linux ISRs routinely lower the interrupt priority such that higher priority interrupts are re-enabled. Also, most ISRs are written in C and are called from an assembler stub. These routines in turn call other C routines using the register window. If an interrupt arrives in the last available register window, a new register window needs to be opened up and the old one saved in the correct stacks. Care must be taken to ensure that when the interrupt unwinds and is about to return, the register window is refilled from the correct stacks.

Assuming the reader was not totally confused by the above paragraph, to further complicate matters the underflow/overflow trap is maskable. In other words, if interrupts are disabled and a register window underflow/overflow occurs, the kernel does not receive notification of this event and is unable to restore/save the appropriate registers. Our workaround for this problem is very similar to that of the SPARC. The overflow/underflow trap is the highest priority interrupt on the Nios processor, so we simply raise the interrupt priority level in Linux to that level, thereby disabling all but the overflow/underflow trap. There are two drawbacks to this method. First, in terms of CPU instructions it is much less costly simply to disable interrupts. For performance reasons, we inserted some judicious uses of the interrupt enable bit around atomic operations deemed safe from potential overflow/underflow traps. Second, we have no protection against third-party code that might decide to disable interrupts directly, rather than going the other route.

The Nios processor does not support unaligned accesses to memory, but great care seems to have been taken in the kernel to align C structures on 32-bit boundaries. Protocol stacks (e.g., TCP/IP) do not always preserve this alignment on network data. Adding a compiler flag (--funaligned-pointers) will remedy these problems but at the cost of greatly increasing the number of instructions (essentially, all reads are treated as byte reads). An alternate solution would be to have the processor issue a software trap when it detects that an unaligned access has occurred. Our interim solution was to modify the start-of-buffer pointer in the Ethernet driver such that the IP and TCP headers were aligned on a 32-bit boundary (since the Ethernet header information is read as a byte stream). The optimal solution is that any network (or filesystem) modules whose structures are not memory-aligned be recoded to use macros/defines on any read/write operation. The macro/define would account for the processor's ability (or lack thereof) to deal with unaligned data and compensate accordingly. Conceptually, this process is similar to the use of ntohl() to work around endian-ness problems.
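A minimal sketch of the accessor approach, assuming a C compiler that lowers a small fixed-size memcpy to byte accesses on processors without unaligned loads. The names get_unaligned32 and put_unaligned32 are hypothetical and are not claimed to be the macros used in any particular kernel tree.

```c
#include <stdint.h>
#include <string.h>

/*
 * Read a 32-bit value (host byte order) from an address that may not be
 * 4-byte aligned. Copying through memcpy avoids dereferencing a
 * misaligned uint32_t pointer, which would fault or return garbage on a
 * CPU without unaligned-access support.
 */
uint32_t get_unaligned32(const void *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

/* Store a 32-bit value (host byte order) to a possibly unaligned address. */
void put_unaligned32(void *p, uint32_t v)
{
    memcpy(p, &v, sizeof v);
}
```

A protocol module would route every access to a packed header field through such accessors, just as it routes byte-order conversion through ntohl(). On a processor that handles unaligned data in hardware the accessors can compile down to a plain load or store, so the cost is paid only where it is needed.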

Libraries and Applications

An application suite extends the usefulness of a kernel port immensely. Once user-space programs are supported, kernel verification can proceed to the next level. Applications quickly illuminate any problems with context switching and the system-call interface. The significance of a user application delivering "hello, world" to your console for the first time is sometimes underrated. For this event to occur the kernel must run through all its initialization code, mount some sort of filesystem, load an application, transfer execution to said application and then switch from user to kernel space via system calls to print out the console message.

Expanding the above chain of events, the first hurdle to overcome (after the kernel initialization) was the mounting of the filesystem. As the distribution from Lineo used a ROMFS, we adopted this method as well. To build a ROMFS, we created the desired directory structure on a host machine and populated it with applications for the Nios processor. A host program, genromfs, was executed that examined this structure and produced a single image file containing the ROMFS. We massaged this image into an object file that was then linked in with the kernel. We later separated the filesystem from the kernel to allow developers to reload a filesystem without having to reload the kernel as well.

Application programs generally make use of library calls to perform standard operations. There are a plethora of libraries on a standard Linux desktop distribution; for the Nios application suite we decided to use a standard C library customized for embedded use. Clibc, now maintained by Lineo, is tailored for embedded applications and includes support for non-MMU processors. Again, similar to the kernel, the source tree was fairly modular, and adding support for the Nios processor only required the creation of one subdirectory. Inside this directory reside files, such as crt0.S and setjmp.S, which are usually written in assembler. For example, crt0.S provides the transitional point for the kernel to transfer control to an application (and vice versa when the application exits).

Launching an application requires the kernel to pluck the binary file out of the ROMFS, load it into RAM, set up the task structure with pointers to the program sections (.bss, .text, .data, .stack) and transfer execution (via crt0.S) to the application. We chose to use the flat-file format, which is a binary format introduced by µClinux. To create a flat binary, a developer produces an ELF executable on their workstation and runs a conversion tool (elf2flt) to convert the executable to a flat format. The flat file consists of a simple header that points to the various sections in the image, the sections themselves and a series of relocation records.

Initially, we bound the applications to fixed addresses, as we were primarily interested in testing the kernel-user interface. Relocation of applications became a necessity as more of the multiprocessing features of Linux were introduced (e.g., multiple shells running concurrently).

For the Motorola m68k distribution of µClinux, the flat-file loader (binfmt_flat.c) is relatively simple. There is one 32-bit relocation record per relocation. The top two bits flag the image section, and the remaining bits are an offset into that section. The loader simply adds the base address of the appropriate section to the 32-bit value at the given offset. Nios relocations are considerably more complex, requiring an extension of the existing format to three 32-bit words. Our version of flat files includes relocation information for every address reference because the Nios processor does not have any mapping registers.
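The simple m68k-style relocation can be sketched as follows. This is an illustration under stated assumptions rather than the code in binfmt_flat.c: the section numbering, the function names, and the interpretation that the 30-bit offset locates the word to patch within the loaded image while the 2-bit field selects which section's base address is added are all assumptions.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical section numbering for the 2-bit field. */
enum { SEC_TEXT = 0, SEC_DATA = 1, SEC_BSS = 2, SEC_COUNT = 3 };

/* Top two bits: section selector. Low 30 bits: byte offset. */
uint32_t reloc_section(uint32_t rec) { return rec >> 30; }
uint32_t reloc_offset(uint32_t rec)  { return rec & 0x3fffffffu; }

/*
 * Apply one relocation record to a loaded image: add the load address of
 * the selected section to the 32-bit word found at the given offset.
 * Returns 0 on success, -1 if the record is malformed or out of range.
 */
int apply_reloc(unsigned char *image, uint32_t image_len,
                uint32_t rec, const uint32_t base[SEC_COUNT])
{
    uint32_t sec = reloc_section(rec);
    uint32_t off = reloc_offset(rec);
    uint32_t word;

    if (sec >= SEC_COUNT || image_len < 4 || off > image_len - 4)
        return -1;

    /* memcpy keeps the access safe even if the offset is unaligned. */
    memcpy(&word, image + off, sizeof word);
    word += base[sec];
    memcpy(image + off, &word, sizeof word);
    return 0;
}
```

A loader would walk the relocation table once after copying the sections into RAM, calling such a routine for each record. A three-word record, as the Nios extension requires, would replace the single uint32_t with a small structure.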

Most of the open-source applications we provide with the LDK required little or no source code changes during the porting process. The build procedures were heavily tuned to reflect the Nios/GNU environment. When crafting applications to run on a µClinux kernel, developers do need to keep in mind that the fork() system call is not implemented. Instead, µClinux provides a vfork() system call that closely mimics the functionality of fork(). The main difference between fork() and vfork() is that the child process shares the same address space as the calling process. Execution of the parent process blocks until the child process exits (or transfers control by issuing an exec type call). Most applications easily can be adapted to accommodate this shortcoming.
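The usual adaptation is to make sure the child does nothing but exec or exit. The sketch below uses only standard POSIX calls; the wrapper name run_program is hypothetical, and the feature-test macro is an assumption about building with glibc in a strict standards mode.

```c
#define _DEFAULT_SOURCE  /* assumed: lets glibc declare vfork() under -std=c99 etc. */
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Run a program and return its exit status, or -1 on failure.
 * The child shares the parent's address space and the parent is blocked
 * until the child execs or exits, so the child must not modify
 * variables or return from this function: it only calls execv() or
 * _exit().
 */
int run_program(const char *path, char *const argv[])
{
    int status;
    pid_t pid = vfork();

    if (pid < 0)
        return -1;              /* could not create the child */

    if (pid == 0) {
        execv(path, argv);      /* returns only if the exec failed */
        _exit(127);             /* _exit, not exit: no stdio flushing in the shared space */
    }

    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Code that forks and then continues to do real work in the child before calling exec is the kind that needs restructuring; a fork-then-exec pattern like the one above usually ports with little more than changing the call.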

Included with the LDK is a pre-built filesystem, resplendent with open-source applications. We expect the network applications to be particularly useful to developers who are prototyping solutions. Being able to both NFS- and SMB-mount remote filesystems means that applications can be run from the host development workstation.

The Cygwin Environment

The GNUPro tools provided by Altera are actually for the Cygwin environment. The Cygwin library provides a UNIX-like API on top of the Win32 API. In essence, Cygwin gives you a very realistic emulation of UNIX on your Windows box. Because Altera's tools run best in a Windows environment, one of the main requirements for the LDK was that it could be completely self-sufficient on a Windows workstation.

As a side note, some of our developers prefer to develop under Linux. Recompiling the Nios GNUPro tools under Linux was the only task required to fulfill this desire.

It is to be expected that an emulation of UNIX on a fundamentally different operating system such as Windows will present some quirks. Much like endian-ness of a processor, termination of text lines is treated differently by various operating systems. Shell scripts that use the Cygwin bash shell require linefeeds and linefeeds only. The patch utility is another candidate that is not amused by a CR/LF combination. In a perfect world, all CVS source files would be linefeed-terminated only. We extracted several files from different trees that did not obey this edict. Under Cygwin, it is possible to mount the native filesystem in either binary or text mode; we found the simplest solution was to traverse all pertinent files and ensure they were linefeed-terminated only.
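The clean-up pass amounts to dropping the carriage return from every CR/LF pair. A minimal sketch of that conversion on an in-memory buffer follows; the function name crlf_to_lf is hypothetical, and this is not the tool we used.

```c
#include <stddef.h>

/*
 * Convert CR/LF line endings to LF in place and return the new length.
 * A carriage return that is not followed by a linefeed is left alone.
 */
size_t crlf_to_lf(char *buf, size_t len)
{
    size_t in, out = 0;

    for (in = 0; in < len; in++) {
        /* Skip a CR only when the next byte is an LF. */
        if (buf[in] == '\r' && in + 1 < len && buf[in + 1] == '\n')
            continue;
        buf[out++] = buf[in];
    }
    return out;
}
```

Applied to each pertinent file's contents before writing the file back, a routine like this leaves shell scripts and patch files in the linefeed-only form that bash and patch expect.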

Another nuance to be aware of is Cygwin's handling of filenames. In Cygwin, unlike Linux, filenames are case insensitive. For example, a file named foo.bar will appear only once in a Cygwin directory regardless of how its letters are capitalized (in Linux there could be 2^6 = 64 different files, depending on the case of each of the six letters). Thus, adding the gcc flag -save-temps has the effect, under Cygwin, of placing .s intermediate files in the same directory as source files, thereby replacing user-written .S source files.

Some of the utility programs under Cygwin had to be tailored for that environment. Cygwin does not exactly mimic mmap() as it is implemented under Linux. As mmap() is a rather complicated procedure, elaboration of this problem is beyond the scope of this article. This problem manifested itself in mkdep.c but only when source file sizes fell on a page boundary.

The utility for generating the ROMFS, genromfs, also had to be tweaked to function under Cygwin. The chmod utility under Cygwin has no bearing on the executable attribute of files. If files are created with an .exe extension, Cygwin sets the executable bit. Our solution was to place all our applications in the ROMFS staging area with .exe extensions and then modify genromfs to strip out the .exe as it was creating the ROMFS. Cygwin also doesn't allow mknod operations; we added the solution originally developed to circumvent nonroot access under Linux, namely a special filename syntax (@dev) for which genromfs could scan.
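The name rewriting can be sketched as a small helper that drops a trailing .exe before an entry is written to the image. The function name romfs_name is hypothetical and the code is not taken from genromfs; it also assumes a lowercase extension.

```c
#include <stddef.h>
#include <string.h>

/*
 * Copy a staging-area filename into out, dropping a trailing ".exe".
 * A file named exactly ".exe" is left unchanged so that the result is
 * never empty. Returns 0 on success, -1 if out is too small.
 */
int romfs_name(const char *name, char *out, size_t out_size)
{
    static const char ext[] = ".exe";
    size_t ext_len = sizeof ext - 1;
    size_t len = strlen(name);

    if (len > ext_len && strcmp(name + len - ext_len, ext) == 0)
        len -= ext_len;         /* strip the extension */

    if (len + 1 > out_size)
        return -1;              /* no room for the name plus terminator */

    memcpy(out, name, len);
    out[len] = '\0';
    return 0;
}
```

With this in place, a staged file such as sh.exe lands in the ROMFS as sh, which is the name the Nios-side shell and init scripts expect.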

Summary

As to what's next for the LDK, we found we had to put a box around the first release of the kit to eliminate the scope creep and actually get it in the hands of users. We will let customer feedback dictate the next steps for the LDK. Certainly, upgrading to the 2.4 kernel series would seem like the next logical step. We envision many more hardware boards being added to the kit (e.g., USB) to augment the functionality. Loadable module support is also high on the priority list. To further enhance application development, we may port a version of gdb server to the Nios processor. This utility would communicate with a client, through a network, on a host-development machine. A few brain cells are usually expended trying to visualize Telneting into the Nios, SMB mounting your host filesystem, starting gdb server on the Nios and gdb client on the host and executing your application from the host machine.

We found this to be a very interesting project. We discovered there is an incredible amount of functionality contained in an open-source release, but we also found that open source is certainly not plug and play. A certain amount of effort on the part of the user is usually required to unleash the usefulness of the code. That's what our LDK is all about: trying to cut down on this effort so users can concentrate on the task at hand, namely developing/prototyping solutions using the Nios processor. Using our prepackaged CPU core, kernel and filesystem, we hope a user could have a simple application running in less than two hours.

Mike Esch (mike@microtronix.com) is the S/W development manager at Microtronix and has been involved with Linux since 1992. He holds a BMath from the University of Waterloo and an MSc from the University of Western Ontario.
