Playing with Binary Formats

This article explains how kernel modules can add new binary formats to a system and shows a pair of examples.

One of the roles that kernel modules can accomplish is adding new binary formats to a running system. A “binary format” is basically a data structure responsible for executing program files—the ones marked with execute permission. The code I'm going to introduce runs with version 2.0 of the kernel.

Kernel modules are meant to add new capabilities to a Linux system, device drivers being the best known such “capabilities”. As a matter of fact, the highly modular design of the Linux kernel allows run-time insertion of many features other than device drivers—we saw a few months ago how /proc files and sysctl entry points can be created by modularized code.

One other such loadable feature is the ability to execute a binary format; this includes both executable files and shared libraries. While the mechanism of loading compiled program files and shared libraries is quite elaborate, the average Linux user can easily add loaders that invoke an interpreter for new binary formats. Thus, the user is able to call data files by name and have them “executed”, after invoking chmod +x on the file.

How a File Gets Executed

Let's start this discussion by looking at how the exec system call is implemented in Linux. This is an interesting part of the kernel, as the ability to execute programs is at the basis of system operations.

The entry point of exec lives in the architecture-dependent tree of the source files, but all the interesting code is part of fs/exec.c (all pathnames here refer to /usr/src/linux/ or the location of your sources). To check architecture-specific details, locate the function by typing the command:

arch/*/kernel/*.c

Within fs/exec.c the top-level function, do_execve(), is less than fifty lines of code in length. Its role is checking for errors, filling the “binary parameter” structure (struct linux_binprm) and looking for a binary handler. The last step is performed by search_binary_handler(), another function in the same file. The magic of do_execve() is contained in this last function, which is very short. Its job consists of scanning a list of registered binary formats, passing the binprm structure to each of them until one succeeds. If no handler is able to deal with the executable file, the kernel tries to load a new handler via kerneld and scans the list once again. If no binary format is able to run the executable file, the system call returns the ENOEXEC error code (“Exec format error”).
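The scan described above can be sketched in user space. This is only an illustration of the idea, not the kernel's code: struct toy_binprm, struct toy_binfmt, toy_formats and the two toy handlers are hypothetical stand-ins invented for this example, and the retry through kerneld is left out.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for struct linux_binprm: only the first
 * bytes of the file, which is what handlers look at first. */
struct toy_binprm {
    char buf[128];
};

/* Hypothetical stand-in for a registered binary format. */
struct toy_binfmt {
    struct toy_binfmt *next;
    int (*load_binary)(struct toy_binprm *);
};

/* Head of the list of known formats (assumed name). */
struct toy_binfmt *toy_formats = NULL;

/* Toy handler: accepts files that begin with "#!". */
int toy_load_script(struct toy_binprm *bprm)
{
    if (bprm->buf[0] != '#' || bprm->buf[1] != '!')
        return -ENOEXEC;
    return 0;
}

/* Toy handler: accepts files that begin with the ELF magic number. */
int toy_load_elf(struct toy_binprm *bprm)
{
    if (memcmp(bprm->buf, "\177ELF", 4) != 0)
        return -ENOEXEC;
    return 0;
}

/* Offer the file to every format in turn until one succeeds.
 * NULL methods are skipped; if nobody accepts, report ENOEXEC. */
int toy_search_binary_handler(struct toy_binprm *bprm)
{
    struct toy_binfmt *fmt;

    for (fmt = toy_formats; fmt; fmt = fmt->next) {
        int retval;

        if (!fmt->load_binary)
            continue;
        retval = fmt->load_binary(bprm);
        if (retval >= 0)
            return retval;
    }
    return -ENOEXEC;
}
```

With both toy handlers linked into toy_formats, a buffer starting with #! is accepted by the script handler, while a buffer that matches neither magic number falls off the end of the list and yields -ENOEXEC.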

The main problem with this kind of implementation is in keeping Linux compatible with the standard Unix behaviour. That is, any executable text file that begins with #! must be executed by the interpreter it asks for, and any other executable text is run by /bin/sh. The former issue is easily dealt with by a binary format specialized in running interpreter files (fs/binfmt_script.c), and the interpreter itself is run by calling search_binary_handler() once again. This function is designed to be reentrant, and binfmt_script checks against double invocation. The latter issue is mainly a historical relic and is simply ignored by the kernel. The program trying to execute the file takes care of it. Such a program is usually your shell or make. It's interesting to note that while recent versions of gmake execute properly when a script has no leading #! line, previous versions didn't call a shell, resulting in a “cannot execute binary file” message when running unadorned scripts from within a Makefile.
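The user-space half of this convention can be sketched as follows. This is a minimal illustration of what a shell-like program might do, not the code of any particular shell or of make; the function name exec_with_sh_fallback() is made up for the example, and argument passing is reduced to the bare minimum.

```c
#include <errno.h>
#include <unistd.h>

extern char **environ;

/* Hypothetical helper: try to execute path directly; if the kernel
 * answers ENOEXEC (no binary format accepted the file), hand the
 * file to /bin/sh instead. Returns -1 only if nothing could be
 * executed; on success the calling process image is replaced. */
int exec_with_sh_fallback(const char *path)
{
    char *direct_argv[] = { (char *)path, NULL };
    char *shell_argv[]  = { "sh", (char *)path, NULL };

    execve(path, direct_argv, environ);

    /* Still here: the direct attempt failed. Only an exec format
     * error justifies falling back to the shell. */
    if (errno != ENOEXEC)
        return -1;

    execve("/bin/sh", shell_argv, environ);
    return -1;
}
```

Any other failure, such as a missing file or missing execute permission, is reported to the caller unchanged, with errno set by execve().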

All the magic handling of data structures needed to replace the old executable image with the new one is performed by the specific binary loader, based on utility functions exported by the kernel. If you would like to take a look at such code, the function load_out_binary() in fs/binfmt_aout.c is a good place to start—easier than the ELF loader.

Registration of Binary Formats

The implementation of exec is interesting code, but Linux has more to offer: registration of new binary formats at run time. The implementation is quite straightforward, although it involves mucking with rather elaborate data structures. Either the code or the data structures must accommodate the underlying complexities, and elaborate data structures offer more flexibility than elaborate code.

The core of a binary format is represented in the kernel by a structure called struct linux_binfmt, which is declared in the linux/binfmts.h file as follows:

struct linux_binfmt {
        struct linux_binfmt *next;
        long *use_count;
        int (*load_binary)(struct linux_binprm *,
                struct pt_regs *);
        int (*load_shlib)(int fd);
        int (*core_dump)(long signr,
                struct pt_regs *);
};

The three functions, or “methods”, declared by the binary format are used to execute a program file, to load a shared library and to create a core file. The next pointer is used by search_binary_handler(), while the use_count pointer keeps track of the usage count of modules. Whenever a process p is executing in the realm of a modularized binary format, the kernel keeps track of *(p->binfmt->use_count) to prevent unexpected removal of the module.

A module, then, uses the following functions to load and unload itself:

extern int register_binfmt(struct linux_binfmt *);
extern int unregister_binfmt(struct linux_binfmt *);

The functions receive a single argument instead of the usual (pointer, name) pair because no file in the /proc directory lists the available binary formats. The typical code for loading and unloading a binary format, therefore, is as simple as the following:

int init_module (void) {
  return register_binfmt(&bluff_format);
}
void cleanup_module(void) {
  unregister_binfmt(&bluff_format);
}

The previous lines belong to the bluff module (Binary Loader for an Ultimately Fallacious Format), whose source is available for public download from ftp://ftp.linuxjournal.com/pub/lj/listings/issue45/2568.tgz.

The structure representing the binary format can declare as NULL any of the functions it offers; NULL functions are simply ignored by the kernel. The simplest binary format, therefore, looks like the following, which is the one used by the bluff module:

struct linux_binfmt bluff_format = {
        NULL, &mod_use_count_, /* next, count */
        NULL, NULL, NULL    /* bin, lib, core */
};

Yes, bluff is a bluff; you can load and unload it at will, but it does absolutely nothing.
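What registering and unregistering a format amounts to can be sketched in user space as plain linked-list manipulation through the next pointer. This is an illustration under stated assumptions rather than the kernel's implementation: struct demo_binfmt, demo_formats and the two demo_ functions are hypothetical names, and the choice of error codes is a guess at reasonable behaviour.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical, trimmed-down format descriptor: only the fields
 * that matter for list handling. */
struct demo_binfmt {
    struct demo_binfmt *next;
    long *use_count;
};

/* Head of the list of registered formats (assumed name). */
struct demo_binfmt *demo_formats = NULL;

/* Append fmt to the list. A format that is already linked in is
 * refused, so it cannot be registered twice. */
int demo_register_binfmt(struct demo_binfmt *fmt)
{
    struct demo_binfmt **tmp = &demo_formats;

    if (!fmt)
        return -EINVAL;
    if (fmt->next)
        return -EBUSY;
    while (*tmp) {
        if (*tmp == fmt)
            return -EBUSY;
        tmp = &(*tmp)->next;
    }
    *tmp = fmt;
    return 0;
}

/* Unlink fmt from the list, or fail if it was never registered. */
int demo_unregister_binfmt(struct demo_binfmt *fmt)
{
    struct demo_binfmt **tmp = &demo_formats;

    while (*tmp) {
        if (*tmp == fmt) {
            *tmp = fmt->next;
            fmt->next = NULL;
            return 0;
        }
        tmp = &(*tmp)->next;
    }
    return -EINVAL;
}
```

Walking the list through a pointer to pointer keeps insertion and removal free of special cases for the list head. A descriptor like bluff_format, with every method set to NULL, can sit in such a list harmlessly because a scan simply skips it.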
