Building a Minimal Glibc with Componentization

by Darren Hart

Glibc componentization is a process to
build a custom minimal set of the glibc C libraries, using only the
necessary objects required by a specific executable or group of
executables. By minimizing the footprint of the libraries,
resource-limited embedded targets can maximize resources available
for applications and storage. This article discusses the
feasibility of componentizing glibc as well as the development of
some custom analysis tools. With the help of these tools it was
possible to build test executables successfully, each with a custom
minimal version of libc.

Embedded systems typically have tighter resource constraints
than desktop computers or servers, although they often are expected
to perform similar functions such as serving web pages and storing
important information. Therefore, the applications they run use
much of the same functionality from the system libraries as their
desktop and server counterparts. With a reduced expectation of
expandability, it is logical to provide a minimal subset of the
same libraries.

Independent embedded versions of the system libraries do
exist. While these libraries greatly reduce the footprint, they
sacrifice functionality (such as pthreads), do not guarantee
complete API compatibility with a complete glibc and must be
maintained separately.

There are several advantages to building a minimal library
from the source of the complete library. The primary advantage is a
guaranteed equivalent API. Because there is only one source tree to
maintain, whenever glibc is updated so are all the minimal
libraries built from glibc. For example, developers don't have to
concern themselves with whether or not the embedded library's
printf function supports the %f conversion. This enables developers
to design applications on a desktop system, with all the amenities
it has to offer, and deploy them to an embedded target without
concerning themselves with API compatibility. The difficulty of
this approach lies in creating a minimal library from such a
large source tree without overcomplicating the source code. This
study investigates the possibility of building a custom libc.so
from only the necessary prebuilt object files of a complete glibc
build.

When glibc is linked as the final step of the build
process, the various objects (1,756 total) satisfy undefined
symbols among themselves. Glibc contains nearly 250,000 implicit
dependencies among its various objects. With this many
dependencies, manually selecting which objects to include would be
tedious at best and impossible at worst. To make this task
manageable, a MySQL database containing all the object dependencies
for all of glibc was implemented. A detailed description of the
library analysis tool can be found in the Sidebar "Library
Analysis Tool". With this tool, a list of all the objects needed
to build a custom library can be generated based on the required
symbols of a given application set. From the output of this tool,
three test executables were successfully built, each with a custom
minimal version of glibc. These custom libraries are considerably
smaller than the complete versions, as small as 19% of the original
size for the simplest case.

Sidebar: Library Analysis Tool

Building Glibc

The first step was to build glibc, understand its build
process and note the size of each of its libraries. This analysis
was performed on a clean build of a recent version (2.1.3) with the
crypt and linuxthreads add-ons. The glibc library set consists of
21 libraries and the dynamic linker (ld.so); Table 1 lists all of them and
their respective sizes. Of these 21 libraries, libc is the largest,
accounting for nearly 50% of the total size. For this reason, this
research focuses on componentizing libc.so, on the reasoning that the
other 20 libraries are already sufficiently modular.

Table 1. Original Libraries and Sizes

By default, glibc builds three versions of its libraries:
static, shared and profiled. Only the process of building the
shared libraries is relevant to this study. This process consists
of five steps:

  1. All the object files (.os) are built with the -fPIC
    flag to gcc, creating position-independent code.
  2. For each directory, a listing of every object from
    that directory to be linked into libc is created in a stamp.os
    file.
  3. An archive, libc_pic.a, is created from these lists
    using ar.
  4. The archive is linked into a single relocatable object with
    the -r flag to gcc.
  5. The relocatable object is linked into a shared
    library, libc.so.

Preparing an Application

Prior to building a custom shared library, it is necessary to
determine which objects from libc.so will be needed for the target
application(s). This is done by compiling and linking the
application(s) against the newly built glibc, not the system glibc, and
then adding each application to the database managed by the
analysis tool. To avoid the need to install the newly
built glibc, the correct options must be passed at compile time to
link against the new library set.

The sample application, test_printf.c, follows:

#include <stdio.h>

/* Print ten numbered lines; printf is the only libc call made directly. */
int main(void) {
    int i;
    for (i = 0; i < 10; i++) {
        printf("iteration: %02d\n", i);
    }
    return 0;
}

It is compiled with the commands shown in Listing 1. Note
that the system startup files and default libraries are omitted
with the -nostdlib and -nostartfiles options. They are replaced
with the startup files from the new glibc build (crt1.o, crti.o,
crtn.o, etc.), and the newly built libraries are explicitly
specified.

Listing 1. Compiling test_printf.c

This application must be executed with the new loader as well
(or it will not find the right libraries). The command in Listing 2
specifies the new loader and library path and executes the
application. To verify that the appropriate libraries are
loaded, prepend strace to that command and examine
the output (the lines starting with open are of interest).

Listing 2. Specifying the New Loader and Library Path and Executing the Application

The program is then added to the database with the
addApplication.pl script:

./addApplication ../projects/testcases/test_printf
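
The analysis tool itself is not reproduced here, but the idea behind it can be sketched in a few lines of shell. The example below is only an illustration under stated assumptions: the three input files, their two-column format, the toy dependency data and the function name required_objects are all invented for this sketch and are not part of the tool described in this article, which keeps the same kind of information in a MySQL database. Starting from the symbols an application leaves undefined, the sketch follows provider and dependency records until no new objects turn up.

```shell
#!/bin/sh
# required_objects PROVIDES NEEDS APP_SYMS   (hypothetical helper)
#   PROVIDES: lines of "object symbol"  (symbol is defined by object)
#   NEEDS:    lines of "object symbol"  (symbol is undefined in object)
#   APP_SYMS: one undefined application symbol per line
# Prints every object reachable from the application's symbols, sorted.
required_objects() {
    awk '
        FILENAME == ARGV[1] { provider[$2] = $1; next }            # symbol -> object
        FILENAME == ARGV[2] { needs[$1] = needs[$1] " " $2; next } # object -> symbols
        { queue[++tail] = $1 }                                     # seed with app symbols
        END {
            # Breadth-first walk: resolve each queued symbol to its
            # providing object, then queue that object'"'"'s own needs.
            for (head = 1; head <= tail; head++) {
                obj = provider[queue[head]]
                if (obj == "" || (obj in seen)) continue
                seen[obj] = 1
                n = split(needs[obj], syms, " ")
                for (i = 1; i <= n; i++) queue[++tail] = syms[i]
            }
            for (obj in seen) print obj
        }
    ' "$1" "$2" "$3" | sort
}

# Toy data, made up for the example (not taken from a real glibc build).
tmp=$(mktemp -d)
cat > "$tmp/provides.txt" <<'EOF'
printf.os printf
vfprintf.os vfprintf
write.os write
strlen.os strlen
qsort.os qsort
memcpy.os memcpy
EOF
cat > "$tmp/needs.txt" <<'EOF'
printf.os vfprintf
vfprintf.os write
vfprintf.os strlen
qsort.os memcpy
EOF
echo printf > "$tmp/app_syms.txt"

required_objects "$tmp/provides.txt" "$tmp/needs.txt" "$tmp/app_syms.txt"
```

With this toy data the function prints printf.os, strlen.os, vfprintf.os and write.os; qsort.os and memcpy.os are left out because nothing reachable from printf refers to them. A real run would face the roughly 250,000 dependencies mentioned earlier, which is why the study used a database rather than flat files.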

Building a Minimal libc.so

A minimal libc.so can be built based on any set of
applications in the database. The following example uses a
single application (test_printf from above) as the source for
required objects. The process consists of the
following five steps:

  1. Generate a list of required object files,
    libc_objects.master.
  2. Generate a customized set of libc_objects
    files.
  3. Create an archive, libc_pic.a, from these lists
    using ar.
  4. Link the archive into a single relocatable object with the
    -r flag to gcc.
  5. Link the relocatable object into a shared library,
    libc.so.
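
Step 2 is essentially a set intersection, and standard text utilities are enough to sketch it. The following is a minimal illustration, not the getstamps script or the Makefile rules used in this study: the function name filter_stamp, the toy file contents and the /path/to/glibc-build prefix are assumptions made for the example, as is the premise that the stamp file and the master list name objects in the same form.

```shell
#!/bin/sh
# filter_stamp STAMP MASTER PREFIX   (hypothetical helper)
# Print the objects named in STAMP that also appear in MASTER,
# one per line, each prefixed with PREFIX/.
filter_stamp() (
    sorted_stamp=$(mktemp)
    sorted_master=$(mktemp)
    # Reformat the stamp file to one object per line, then sort it.
    tr -s ' \t' '\n\n' < "$1" | sed '/^$/d' | LC_ALL=C sort > "$sorted_stamp"
    LC_ALL=C sort "$2" > "$sorted_master"
    # join keeps only the lines present in both sorted files.
    LC_ALL=C join "$sorted_stamp" "$sorted_master" |
        awk -v prefix="$3" '{ print prefix "/" $0 }'
    rm -f "$sorted_stamp" "$sorted_master"
)

# Toy input, made up for the example.
demo=$(mktemp -d)
mkdir -p "$demo/stdio-common"
printf 'printf.os vfprintf.os scanf.os\n' > "$demo/stdio-common/stamp.os"
printf '%s\n' vfprintf.os printf.os write.os > "$demo/libc_objects.master"

filter_stamp "$demo/stdio-common/stamp.os" "$demo/libc_objects.master" \
    /path/to/glibc-build/stdio-common > "$demo/stdio-common/libc_objects"
cat "$demo/stdio-common/libc_objects"
```

Here scanf.os is dropped because it is not in the master list, and write.os is dropped because it does not belong to this directory, so the resulting libc_objects file holds only the printf.os and vfprintf.os paths. Both inputs are sorted under the same locale because join expects its inputs in a consistent order.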

This process should be executed in the minilib directory,
containing only the Makefile and associated scripts. The Makefile
variable GLIBCPATH must be set to the path where glibc was
built; the rest of the process is automated with the
make command. The library analysis tool provides
a list of the object files that provide the symbols explicitly
required by an application, as well as the implicitly required
objects. This list, libc_objects.master, is generated by the
getAppDeps.pl script and should be copied to the minilib directory.
Running make first executes the script
getstamps, which descends into the glibc source directory and
recursively copies every stamp.os file to an equivalent tree
within the current directory. These stamp.os files are reformatted
to list one object per line and are then sorted alphabetically. The
newly formatted stamp.os files are then joined with
libc_objects.master to create an intersection of the two files,
effectively removing any unnecessary objects from the list. The
full path is prepended to the objects in the list, and the result is
stored in libc_objects (one per directory). With all the
libc_objects files in place, the custom library is ready to be
linked.

The various commands needed to link the final shared library
were taken from the glibc make process and modified to account for
the new build location and object-list filenames (libc_objects).
Linking is done in three steps. First, ar is
used to collect all the objects listed in the libc_objects files into
one archive with the command in Listing 3.

Listing 3. Linking the Objects in libc_objects into One Archive

Second, the archive is made relocatable:

gcc -nostdlib -nostartfiles -r -o libc_pic.os -Wl,-d -Wl,--whole-archive libc_pic.a

The -r option here generates relocatable output in the
file libc_pic.os; -nostdlib and -nostartfiles prevent gcc from
linking in the standard system libraries and startup files;
the -Wl prefix passes the option that follows it to the linker;
-d assigns space to common symbols even though the output is
relocatable; and --whole-archive instructs the linker to include
every object from the archives listed after --whole-archive and before
--no-whole-archive, not just the objects that satisfy symbols required by
the other objects scheduled for link.

Finally, the shared library is created, as shown in Listing 4.

Listing 4. The Shared Library

The linker option, --version-script, acts as a filter for
exported symbols, providing complete control over which symbols are
exported. Even if a symbol exists in the objects and archives
linked into the library, it will not be exported by the final
shared library unless it is listed in the version script,
libc.map. The -e option sets __libc_main as the library's entry
point. The -u option enters the symbol __register_frame as
undefined, forcing a link with libgcc.a, which provides this
symbol. Finally, -rpath-link specifies the first set of directories
to search for shared libraries specified on the command line, such
as ld.so. Because these commands were taken from
the partially automatically generated commands of the glibc build
process, they likely contain some unnecessary paths and
even unnecessary options.

The resulting library is placed in the top-level directory as
libc.so, a nonstripped shared library.

When linking the application it is possible that the
libc_objects.master list is not complete, resulting in undefined-symbol
errors. These symbols must be tracked down (using
the findsymbol script), and their providing objects
appended to the libc_objects.master list. Running make clean
and then make rebuilds
the shared library with the updated object list. In its current
state, the library analysis tool provides information assuming that
a custom version of every library will be built. Since only libc.so
is being rebuilt in this example, if the application requires
pthreads, the complete libpthread.so library will be used. If libpthread.so
requires something of libc.so that the application does not, it
must be added manually. There are generally one or two objects that
must be added to the list. This manual step should be eliminated
in future versions of the analysis tool.

Testing the Minimal Library

To test the custom library, the application for which it was
built must be relinked, using the new library. The new libc.so must
be copied into the glibc source tree, replacing the old one.
Running make again recompiles the test
application, linking to the new minimal library. This analysis
tested three test applications, each with unique requirements of
libc.so (see Table 2).

Table 2. Test Cases and Minimal Library Statistics

Conclusion

Glibc componentization offers the most customizable
libraries, while requiring very little from the developer. The
advantages of componentization include rapid development, API
consistency and, because the stock glibc source tree is used, none of
the maintenance burden of a forked tree. Target devices that are resource
limited, but that will be used for varying tasks (such as PDAs),
should consider other options such as glibc profiling. A profiled
version of glibc could be built so that frequently accessed
functions are grouped together in pages. Devices with looser
resource constraints may find the best solution simply is to use the
complete library. This approach allows for future development of
new and more functional applications, without the need to redeploy
the system libraries as well. Componentization finds its
application in very specialized devices where resources are at a
premium, and the applications they must run are fixed and known prior
to deployment.

This process defines dependencies at the object level; it
does not offer as fine a level of granularity as a system based on
symbols could, but it is relatively simple and in no way modifies
the glibc source tree. The library could be reduced further by
implementing simplified versions of some of the larger components,
but this too would require modifying the source code. The test
cases show that glibc can be componentized with reasonable
granularity at the object level, and although not as fine as at the
symbol level, this process is far easier and requires less effort
from all parties involved. The process discussed can be used to
implement any standards-compliant library proposed by third parties
as well as to create completely customized minimal libraries for a
specific application set when no standard is appropriate.

Darren Hart is a
24-year-old senior in Brigham Young University's undergraduate
Computer Engineering program. His fields of interest and study
include embedded systems and embedded application development as
well as operating systems--Linux in particular. He has done three
consecutive co-ops with IBM, most recently with the Linux
Technology Center where he researched glibc
componentization.
