GCC for Embedded Engineers
The collection of programs necessary to compile and link an application is called the toolchain, and GCC, the compiler, is only one part. A complete toolchain consists of three separate parts: binutils, language-specific standard libraries and the compiler. Notably absent is the debugger, which is frequently supplied with the toolchain but is not a necessary component.
binutils (binary utilities) performs the grunt work of manipulating files in a way that's appropriate for the target machine. Key parts of the toolchain, such as the linker and assembler, reside in the binutils Project and aren't part of the GCC Project.
Hidden inside the binutils Project is another nifty bit of software, the BFD library, which technically is a separate project. BFD, the Binary File Descriptor library (the actual acronym unpacks to something too bawdy for this publication), provides an abstract, consistent interface to object files, handling details such as address relocation, symbol translation and byte order. Because of the features supplied by BFD, most tools that need to read or manipulate binaries for the target reside in the binutils Project to best take advantage of what BFD has to offer.
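To make the byte-order point concrete, here is a minimal Python sketch of one detail a library like BFD hides from its callers: working out from an ELF file's identification bytes whether the target is 32- or 64-bit and big- or little-endian, then decoding later header fields accordingly. This is not how BFD itself is implemented and it doesn't use BFD. The function name describe_elf_header is hypothetical, and only the first 20 bytes of the ELF header are examined.

```python
import struct

ELF_MAGIC = b"\x7fELF"

def describe_elf_header(header):
    """Decode word size, byte order, type and machine from an ELF header.

    `header` must hold at least the first 20 bytes of the file.
    (Hypothetical helper for illustration; real tools read far more.)
    """
    if len(header) < 20 or header[:4] != ELF_MAGIC:
        raise ValueError("not an ELF file")

    ei_class = header[4]  # 1 = 32-bit objects, 2 = 64-bit objects
    ei_data = header[5]   # 1 = little-endian, 2 = big-endian
    if ei_class not in (1, 2) or ei_data not in (1, 2):
        raise ValueError("unrecognized ELF class or data encoding")

    # Every multi-byte field after the identification bytes is stored in
    # the target's byte order, so pick the matching struct prefix.
    prefix = "<" if ei_data == 1 else ">"
    e_type, e_machine = struct.unpack_from(prefix + "HH", header, 16)

    return {
        "bits": 32 if ei_class == 1 else 64,
        "byte_order": "little" if ei_data == 1 else "big",
        "type": e_type,        # 2 means an executable file
        "machine": e_machine,  # for example 8 = MIPS, 40 = ARM
    }
```

Calling describe_elf_header(open(path, "rb").read(20)) on a cross-compiled binary gives a quick check that it was built for the intended target. Decoding the same bytes with the wrong byte order silently yields nonsense values, which is exactly the class of mistake a shared abstraction layer prevents.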
For the record, binutils contains the following programs:
addr2line: given a binary with debugging information and an address, returns the line and file of that address.
ar: a program for creating code archives that are a collection of object files.
c++filt: demangles symbols. With classes and overloading, the linker can't depend on the underlying language to provide unique symbol names, so the compiler encodes extra information into each name. c++filt will turn _ZN5pointC1ERKS_ into something readable: point::point(point const&). A godsend when debugging.
gprof: produces reports based on data collected when running code with profiling enabled.
nlmconv: converts an object file into a Netware Loadable Module (NLM). If you've ever worked with NLMs, you probably did so with your collar turned up and cringed when seeing ABEND on your terminal. It's noted here because nlmconv is rarely, if ever, distributed with a toolchain.
nm: given an object file, lists its symbols, such as the functions and variables it makes publicly visible.
objcopy: translates a file from one format to another, used in the embedded field to generate S-Records from ELF binaries.
objdump/readelf: objdump reads and prints out information from a binary file. readelf performs the same function, but it works only with ELF-formatted files.
ranlib: a complement to ar. Generates an index of the public symbols in an archive to speed link time. Users can get the same effect by using ar -s.
size: prints out the size of various components of a binary file.
strings: extracts the strings from a binary, translating between target and host byte order where needed. It's frequently used as the slacker's way of seeing what libraries a binary links to, as ldd doesn't work for cross-compiled programs: strings <binary> | grep lib.
strip: removes symbols or sections, typically debugging information, from files.
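Two of the utilities above do work that is simple enough to sketch in a few lines of Python. The first function builds a single Motorola S1 record, the text format that objcopy emits when asked for S-Record output (objcopy -O srec). The second approximates what strings does by pulling out runs of printable ASCII. Both function names are hypothetical and both are simplified illustrations, not reimplementations: the real tools handle other record types, other character encodings and actual object-file formats.

```python
import re

def srec_s1_record(address, data):
    """Encode `data` as one Motorola S1 record (16-bit load address)."""
    if not 0 <= address <= 0xFFFF:
        raise ValueError("S1 records carry a 16-bit address")
    if len(data) > 252:
        raise ValueError("too much data for a single record")

    # The count byte covers the address (2 bytes), the data and the
    # checksum byte (1 byte).
    payload = bytes([len(data) + 3]) + address.to_bytes(2, "big") + bytes(data)

    # Checksum: one's complement of the low byte of the sum of the
    # count, address and data bytes.
    checksum = ~sum(payload) & 0xFF
    return "S1" + payload.hex().upper() + "%02X" % checksum

def extract_strings(blob, min_len=4):
    """Return runs of at least `min_len` printable ASCII characters."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [match.decode("ascii") for match in pattern.findall(blob)]
```

For instance, extract_strings(open(path, "rb").read()) followed by a filter on "lib" reproduces the strings <binary> | grep lib trick from the list, and srec_s1_record(0x0000, b"Hello") produces one line that a typical S-Record loader should accept.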
The C language specification contains only 32 keywords, give or take a few, depending on the compiler's implementation of the language. Like C, most languages have the concept of a standard library supplying common operations, such as string manipulation, and an interface to the filesystem and memory. The majority of the programming that happens in C involves interacting with the C library. As a result, much of the code in the project isn't written by the engineers, but rather is supplied by the standard libraries. Picking a standard library that has been designed to be small can have a drastic impact on the final size of the project.
Most embedded engineers opt for a C library other than the standard GNU C Library, otherwise known as glibc, to conserve resources. glibc was designed for portability and compatibility, and as such, it contains code for cases that are never encountered on an embedded system or that can be sacrificed there. One example is binary compatibility between releases of the library: glibc rarely breaks an interface once published, whereas embedded standard libraries do so without any qualms.
Table 1 outlines the most frequently used C libraries, with the pros and cons of each.
Table 1. Pros and Cons of Most Frequently Used C Libraries
|Library||Pros||Cons|
|glibc||The canonical C library; contains the greatest amount of support for all C features; very portable; support for the widest number of architectures.||Large size; limited configurability; can be hard to cross-build.|
|uClibc||Small (but not the smallest); very configurable; widely used; active development team and community.||Not well supported on all architectures; handles only UTF-8 multibyte characters.|
|DietLibC||Small, small, small; excellent support for ARM and MIPS.||Least functionality; no dynamic linking; sparse documentation.|
|NewLib||Well supported by Red Hat; best support for math functions; great documentation.||Smallish community; not updated frequently.|