What The @#$%&! (Heck) is this #! (Hash-Bang) Thingy In My Bash Script

 

You've seen it a million times—the hash-bang (#!) line at the top of a script—whether it be Bash, Python, Perl or some other scripting language. And, I'm sure you know what its purpose is: it specifies the script interpreter that's used to execute the script. But, do you know how it actually works? Your initial thought might be that your shell (bash) reads that line and then executes the specified interpreter, but that's not at all how it works. How it actually works is the main focus of this post, but I also want to introduce how you can create your own version of "hash-bang" if you're so inclined.

When you set the executable bit on a script file and then try to execute it, the shell passes the filename to the kernel (via the execve() system call); the shell has nothing to do with interpreting the first line of the script. The first two characters in the file, the hash and the bang, are often referred to as the script's "magic number". With this magic number, the kernel can identify the file as a script. The kernel then reads the rest of the first line, starts the interpreter named there, and passes the script's filename to that interpreter.

Executable file formats are part of the "binfmt" (binary format) code in the kernel. The "binfmt" handling for scripts is found in the file binfmt_script.c, and near the bottom you'll see the following code:

static struct linux_binfmt script_format = {
    .module      = THIS_MODULE,
    .load_binary = load_script,         // <<<<< script loading function
};

static int __init init_script_binfmt(void)
{
    register_binfmt(&script_format);    // <<<<< register the binfmt
    return 0;
}

At some point during kernel initialization, the function init_script_binfmt() is called to initialize the "script" binfmt handler (the __init in its definition is an annotation, not part of the function's name). The initialization function registers the binfmt with the kernel and specifies that the function load_script() should be called to attempt to load and execute scripts. The kernel puts all of these registered binfmts into a list called formats.

If you now look in the kernel's exec.c code, where executables are launched, you'll see code that looks like this:

int search_binary_handler(struct linux_binprm *bprm)
{
    // ...
    list_for_each_entry(fmt, &formats, lh) {
        // ...
        retval = fmt->load_binary(bprm);
        // ...
    }
    // ...
}

Here the kernel is looping through the list of registered binfmts, calling each binfmt's load function in turn until one of them recognizes the file. In the case of scripts, this calls the function load_script() that was referenced above when the script binfmt was registered. The code fmt->load_binary() is calling the load function indirectly through a pointer to the function, which is why the names are different.

If you now go back to binfmt_script.c, and find the load_script() function, you'll see code at the top of it that looks like this:

static int load_script(struct linux_binprm *bprm)
{
    // ...

    /* Not ours to exec if we don't start with "#!". */
    if ((bprm->buf[0] != '#') || (bprm->buf[1] != '!'))
        return -ENOEXEC;
    // ...
}

And here you can see where the code checks the first two characters of the buffer bprm->buf to see if the file starts with the characters "#" and "!" (hash and bang). If the buffer does not start with those two characters, the function returns an error, and the kernel keeps looking through the list of binfmts for a binfmt that can recognize the file. If the file does start with hash-bang, the function loads the requested interpreter, passes the script file name to it, and Bob's your uncle.

If the kernel can't find a binfmt that recognizes the file, it returns an error to the caller. Just for fun, to test this out, I tried making an image file executable:

$ chmod +x image.png
$ ./image.png

What I expected to see was an error akin to "not an executable"; instead I got this:

$ ./image.png
./image.png: line 1: $'\211PNG\r': command not found
./image.png: line 2: $'\032': command not found
...

Which looks suspiciously like bash is trying to interpret the file. And it turns out that's exactly what is happening. If you check the bash man page you'll find this:

If this execution fails because the file is not in executable format, and the file is not a directory, it is assumed to be a shell script, a file containing shell commands.

So bash assumes that if the kernel can't execute it, it must be a shell script. Which means that you don't actually have to include "#!/bin/sh" or "#!/bin/bash" at the top of your shell scripts if you only start them from a bash shell.

Next, I want to look at how you can create your own executable format (without modifying the Linux kernel). Let's assume that I'm a yoda coder and that I want to reverse the order of the hash-bang characters in my scripts, in other words, bang before hash prefer I (with apologies to Yoda and George Lucas). I can do this using the miscellaneous binary format "binfmt_misc", which you should be able to see a hint of with the mount command:

$ mount | grep ^binfmt_misc
binfmt_misc on /proc/sys/fs/binfmt_misc type binfmt_misc (rw,relatime)

Before creating any binfmts, the directory above contains the following:

$ ls -la /proc/sys/fs/binfmt_misc/
total 0
--w------- 1 root root 0 May  4 11:59 register
-rw-r--r-- 1 root root 0 Apr 30 06:28 status

Before I create my own binfmt, I need an "interpreter" that will get executed when I run one of my yoda scripts. For testing, I'll use the following C program, which just prints out its arguments and then copies the contents of any input files passed to it to stdout (essentially a version of the standard Linux command cat):

#include <stdio.h>

int main(int argc, char** argv)
{
    int  i;

    // Print arguments.
    for ( i = 0; i < argc; i++ ) {
        printf("Arg %d: %s\n", i, argv[i]);
    }

    // Copy files to stdout.
    for ( i = 1; i < argc; i++ ) {
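        FILE*  fp = fopen(argv[i], "r");
        if ( !fp ) {
            // Report unreadable files instead of skipping them silently.
            perror(argv[i]);
            continue;
        }
        char  s[80];
        // fgets() already leaves room for the terminating NUL.
        while ( fgets(s, sizeof(s), fp) ) {
            fputs(s, stdout);
        }
        fclose(fp);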
    }
    return 0;
}

Then I compile the interpreter and place its executable in my bin directory:

$ gcc -o /home/user/bin/yoda yoda.c

To create my binfmt, I need to write a configuration line to the file /proc/sys/fs/binfmt_misc/register (the Wikipedia binfmt_misc page has good information on this configuration line):

$ su
Password: *****
# echo ':YodaFiles:M::!#::/home/user/bin/yoda:' >/proc/sys/fs/binfmt_misc/register
# exit

The "M" in the line above says this binfmt uses a Magic number, and the "!#" (bang hash) in the line specifies the magic number.

Now when I list the directory /proc/sys/fs/binfmt_misc/, I see the following:

$ ls -la /proc/sys/fs/binfmt_misc/
total 0
--w------- 1 root root 0 May  4 11:59 register
-rw-r--r-- 1 root root 0 Apr 30 06:28 status
-rw-r--r-- 1 root root 0 May  4 11:59 YodaFiles

When I look at the file for my binfmt, I can see that it references my interpreter and my magic number (2123 is "!#" in hexadecimal: 0x21 is "!" and 0x23 is "#"):

$ cat /proc/sys/fs/binfmt_misc/YodaFiles
enabled
interpreter /home/user/bin/yoda
flags:
offset 0
magic 2123

And now I can set the executable bit on my "script" and have it run via my interpreter:

$ cat test.yoda
!# powerful you have become

$ chmod +x test.yoda

$ ./test.yoda
Arg 0: /home/user/bin/yoda
Arg 1: ./test.yoda
!# powerful you have become

I can disable and re-enable my binfmt by writing 0 and 1, respectively, to its proc file:

# echo  0 >/proc/sys/fs/binfmt_misc/YodaFiles     # disable
# echo  1 >/proc/sys/fs/binfmt_misc/YodaFiles     # re-enable

And I can delete it by writing -1 to its proc file:

# echo -1 >/proc/sys/fs/binfmt_misc/YodaFiles
# ls /proc/sys/fs/binfmt_misc/YodaFiles
ls: cannot access '/proc/sys/fs/binfmt_misc/YodaFiles': No such file or directory

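If you want to create a persistent miscellaneous binfmt, you can create a configuration file for it (/etc/binfmt.d/*.conf). On systems that use systemd, each such file holds registration lines in the same format as the one written to the register file above, and they are applied at boot.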

Obviously, all this is of questionable use for adding interpreters for different text file formats. It's easier just to stick with the hash-bang convention and have your interpreter ignore the first line. However, if you have a binary file, that may not be an option. Using binfmt_misc, you can associate your file with an interpreter. Note that binfmt_misc also allows you to associate a file with an interpreter based on its file extension (type "E" instead of "M" in the configuration line), which comes in handy if your files don't always start with the same magic value.

P.S. Using punctuation characters to refer to curse words is called a grawlix, with "@#$%&!" being the standard.


Any code found in my articles that is not taken from other sources, should be considered licensed as follows:

# Copyright 2019 Mitch Frazier <mitch -at- linuxjournal.com>
#
# This software may be used and distributed according to the terms of the
# MIT License or the GNU General Public License version 2 (or any later version).

Mitch Frazier is an embedded systems programmer at Emerson Electric Co. Mitch has been a contributor to and a friend of Linux Journal since the early 2000s.
