Playing with Binary Formats
To turn the theory into sound practice, let's try to expand our bluff into bloom (Binary Loader for Outrageously Ostentatious Modules). The complete source of the new module is distributed together with bluff.
The role of bloom is to display executable images. Give execution permission to your GIF images and load the module, then call your image like it was a command, and xv will display it.
This code is neither particularly original (most of it comes from binfmt_script.c) nor particularly smart (text-only people like me would rather use an ASCII viewer, for instance, and other people prefer a different viewer). I feel this kind of example is quite didactic anyway, and it can be easily run by anyone who can run an X server and has root access to the computer in order to load modules.
The source file is made up of little more than 50 lines and is able to execute GIF, TIFF and the various PBM formats; needless to say, you must give your images execute permissions (chmod +x) in advance. The viewer is configurable at load time and defaults to /usr/X11R6/bin/xv. Here is a sample session copied from my text console:
# insmod bloom.o # ./snowy.tif xv: Can't open display # rmmod bloom # insmod bloom.o viewer="/bin/cat" # ./snowy.tif | wc -c 1067564
If you use the default viewer and work within a graphic session, your image file will bloom on the display.
If you can't wait to download the source file, you can see the interesting part of bloom in Listing 1. Note that bloom.c falls under the GPL, because most of its code is copied from binfmt_script.c.
The next question that I hear you ask is “How can I set up things so that kerneld can automatically load my module?”
Well, actually it isn't always possible. The code in fs/exec.c only tries to use kerneld when at least one of the first four bytes is not printable. This behaviour is meant to avoid losing too much time with kerneld when the file being executed is a text file without the #! line. While real binary formats have one non-printable byte in the first four, this isn't always true for generic data types.
The net result of this behaviour is that you can't automatically load the bloom viewer when invoking a GIF file or when calling a PBM file by name. Both formats begin with a text string and will therefore be ignored by the auto-loader.
When, on the other hand, the file has a non-printing character within the first four, the kernel issues a kerneld request for binfmt-number, where the exact string is generated by this statement:
sprintf(modname, "binfmt-%hd", *(short*)(&bprm->buf));
The ID of the binary format generated by the above statement represents the first two bytes of the disk file. If you try to execute TIFF files, kerneld looks for binfmt-19789 or binfmt-18761. A gzipped file calls for binfmt--29921 (negative). GIF files, on the other hand, are passed to /bin/sh shell due to their leading text string. If you want to know the number associated with each binary format, look in the /usr/lib/magic file and convert the values to decimal. Alternatively, you can pass the debug argument to kerneld and look at its messages when you execute your data files and it tries to load the corresponding binary format.
It's interesting to note that kernel versions 2.1.23 and newer switched to an easier and more significant ID by using the following line:
sprintf(modname, "binfmt-%04x", *(unsigned short *)(&bprm->buf));
This new ID string represents the third and fourth byte of the binary file and is hexadecimal instead of decimal (thus leading to strings with a better format and no ugly “minus-minus” appearing now and then.
While calling images by name can be funny, it has no real role in a computer system. I personally prefer calling my viewer by name, and I do not believe in the object-orientedness of the approach. This kind of feature in my opinion is best suited to the file manager where it can be tailored by appropriate configuration files without introducing kernel bloat to lie in the way of any computational path.
What is really interesting about binary formats is the ability to run program files that don't fall in the handy #! notation. This includes executable files belonging to other operating systems or platforms, as well as interpreted languages that have not been designed for the Unix operating system—all those languages that complain about a #! in the first line.
If you want to play one such game, you can try the fail module. This “Format for Automatically Interpreting Lisp” is a wrapper to invoke Emacs any time a byte-compiled e-lisp program is invoked by name. Such practice is definitely failure-prone, as it makes little sense to invoke several megabytes of program code to run a few lines of lisp. Moreover, Emacs-lisp is not suited to command-line handling. Together with fail you'll also find a pair of sample lisp executables to make your tests.
A real-world Linux system is full of interesting examples of interpreted binary formats such as the Java binary format. Other examples are the binary format that allows the Alpha platform to run Linux-x86 binaries and the one included in recent DOSEMU distributions that is able to run old DOS programs transparently (although the program must be specifically tailored in advance).
Version 2.1.43 of the kernel and newer ones include generic support for interpreted binary formats. binfmt_misc is somewhat like bloom but much more powerful. You can add new interpreted binary formats to the module by writing the relevant information to the file /proc/sys/fs/binfmt_misc.
Listing 1 and all other programs referred to in this article are available by anonymous download in the file ftp.linuxjournal.com/pub/lj/listings/issue45/2568.tgz.