Playing with Binary Formats

This article explains how kernel modules can add new binary formats to a system and show a pair of examples.
A Sample Implementation: Displaying Images

To turn the theory into sound practice, let's try to expand our bluff into bloom (Binary Loader for Outrageously Ostentatious Modules). The complete source of the new module is distributed together with bluff.

The role of bloom is to display executable images. Give execution permission to your GIF images and load the module, then call your image like it was a command, and xv will display it.

This code is neither particularly original (most of it comes from binfmt_script.c) nor particularly smart (text-only people like me would rather use an ASCII viewer, for instance, and other people prefer a different viewer). I feel this kind of example is quite didactic anyway, and it can be easily run by anyone who can run an X server and has root access to the computer in order to load modules.

The source file is made up of little more than 50 lines and is able to execute GIF, TIFF and the various PBM formats; needless to say, you must give your images execute permissions (chmod +x) in advance. The viewer is configurable at load time and defaults to /usr/X11R6/bin/xv. Here is a sample session copied from my text console:

# insmod bloom.o
# ./snowy.tif
xv: Can't open display
# rmmod bloom
# insmod bloom.o viewer="/bin/cat"
# ./snowy.tif | wc -c
1067564

If you use the default viewer and work within a graphic session, your image file will bloom on the display.

If you can't wait to download the source file, you can see the interesting part of bloom in Listing 1. Note that bloom.c falls under the GPL, because most of its code is copied from binfmt_script.c.

Listing 1. The Core of bloom.c

Registering the Format with kerneld

The next question that I hear you ask is “How can I set up things so that kerneld can automatically load my module?”

Well, actually it isn't always possible. The code in fs/exec.c only tries to use kerneld when at least one of the first four bytes is not printable. This behaviour is meant to avoid losing too much time with kerneld when the file being executed is a text file without the #! line. While real binary formats have one non-printable byte in the first four, this isn't always true for generic data types.

The net result of this behaviour is that you can't automatically load the bloom viewer when invoking a GIF file or when calling a PBM file by name. Both formats begin with a text string and will therefore be ignored by the auto-loader.

When, on the other hand, the file has a non-printing character within the first four, the kernel issues a kerneld request for binfmt-number, where the exact string is generated by this statement:

sprintf(modname, "binfmt-%hd",
        *(short*)(&bprm->buf));

The ID of the binary format generated by the above statement represents the first two bytes of the disk file. If you try to execute TIFF files, kerneld looks for binfmt-19789 or binfmt-18761. A gzipped file calls for binfmt--29921 (negative). GIF files, on the other hand, are passed to /bin/sh shell due to their leading text string. If you want to know the number associated with each binary format, look in the /usr/lib/magic file and convert the values to decimal. Alternatively, you can pass the debug argument to kerneld and look at its messages when you execute your data files and it tries to load the corresponding binary format.

It's interesting to note that kernel versions 2.1.23 and newer switched to an easier and more significant ID by using the following line:

sprintf(modname, "binfmt-%04x",
              *(unsigned short *)(&bprm->buf[2]));

This new ID string represents the third and fourth byte of the binary file and is hexadecimal instead of decimal (thus leading to strings with a better format and no ugly “minus-minus” appearing now and then.

What's This Worth?

While calling images by name can be funny, it has no real role in a computer system. I personally prefer calling my viewer by name, and I do not believe in the object-orientedness of the approach. This kind of feature in my opinion is best suited to the file manager where it can be tailored by appropriate configuration files without introducing kernel bloat to lie in the way of any computational path.

What is really interesting about binary formats is the ability to run program files that don't fall in the handy #! notation. This includes executable files belonging to other operating systems or platforms, as well as interpreted languages that have not been designed for the Unix operating system—all those languages that complain about a #! in the first line.

If you want to play one such game, you can try the fail module. This “Format for Automatically Interpreting Lisp” is a wrapper to invoke Emacs any time a byte-compiled e-lisp program is invoked by name. Such practice is definitely failure-prone, as it makes little sense to invoke several megabytes of program code to run a few lines of lisp. Moreover, Emacs-lisp is not suited to command-line handling. Together with fail you'll also find a pair of sample lisp executables to make your tests.

A real-world Linux system is full of interesting examples of interpreted binary formats such as the Java binary format. Other examples are the binary format that allows the Alpha platform to run Linux-x86 binaries and the one included in recent DOSEMU distributions that is able to run old DOS programs transparently (although the program must be specifically tailored in advance).

Version 2.1.43 of the kernel and newer ones include generic support for interpreted binary formats. binfmt_misc is somewhat like bloom but much more powerful. You can add new interpreted binary formats to the module by writing the relevant information to the file /proc/sys/fs/binfmt_misc.

Listing 1 and all other programs referred to in this article are available by anonymous download in the file ftp.linuxjournal.com/pub/lj/listings/issue45/2568.tgz.

Alessandro Rubini used to read e-mail at his university account, but then abandoned academia because he was forced to write articles. He now reads e-mail as rubini@linux.it and still writes articles.

______________________

Webinar
One Click, Universal Protection: Implementing Centralized Security Policies on Linux Systems

As Linux continues to play an ever increasing role in corporate data centers and institutions, ensuring the integrity and protection of these systems must be a priority. With 60% of the world's websites and an increasing share of organization's mission-critical workloads running on Linux, failing to stop malware and other advanced threats on Linux can increasingly impact an organization's reputation and bottom line.

Learn More

Sponsored by Bit9

Webinar
Linux Backup and Recovery Webinar

Most companies incorporate backup procedures for critical data, which can be restored quickly if a loss occurs. However, fewer companies are prepared for catastrophic system failures, in which they lose all data, the entire operating system, applications, settings, patches and more, reducing their system(s) to “bare metal.” After all, before data can be restored to a system, there must be a system to restore it to.

In this one hour webinar, learn how to enhance your existing backup strategies for better disaster recovery preparedness using Storix System Backup Administrator (SBAdmin), a highly flexible bare-metal recovery solution for UNIX and Linux systems.

Learn More

Sponsored by Storix