An Introduction to Metaprogramming

How to write programs that write programs.

A metaprogram is a program that generates other programs or program parts. Hence, metaprogramming means writing metaprograms. Many useful metaprograms are available for Linux; the most common ones include compilers (GCC or g77), interpreters (Perl or Ruby), parser generators (Bison), assemblers (AS or NASM) and preprocessors (CPP or M4). Typically, you use a metaprogram to eliminate or reduce a tedious or error-prone programming task. So, for example, instead of writing a machine code program by hand, you would use a high-level language, such as C, and then let the C compiler translate it to the equivalent low-level machine instructions.

At first, metaprogramming may seem to be an advanced topic, suitable only for programming language gurus, but it's not really that difficult once you know how to use the right tools.

Source Code Generation

In order to present a very simple example of metaprogramming, let's assume the following totally fictional situation.

Erika is a very smart first-year undergraduate computer science student. She already knows several programming languages, including C and Ruby. During her introductory programming class, Professor Gomez, the course instructor, caught her chatting on her laptop computer. As punishment, he demanded Erika write a C program that printed the following 1,000 lines of text:

1. I must not chat in class.
2. I must not chat in class.
...
999. I must not chat in class.
1000. I must not chat in class.

An additional imposed restriction was that the program could not use any kind of loop or goto instruction. It should contain only one big main function with 1,000 printf instructions—something like this:


#include <stdio.h>
int main(void) {
    printf("1. I must not chat in class.\n");
    printf("2. I must not chat in class.\n");

    /* 996 printf instructions omitted. */

    printf("999. I must not chat in class.\n");
    printf("1000. I must not chat in class.\n");
    return 0;
}

Professor Gomez was not naive; he expected Erika to write the printf instruction once, copy it to the clipboard, paste it 999 times and change the numbers by hand. He expected that even this much tedious, repetitive work would be enough to teach her a lesson. But Erika immediately saw an easy way out: metaprogramming. Instead of writing this program by hand, why not write another program that writes it for her? So, she wrote the following Ruby script:


File.open('punishment.c', 'w') do |output|
  output.puts '#include <stdio.h>'
  output.puts 'int main(void) {'
  1.upto(1000) do |i|
    output.puts "    printf(\"#{i}. " +
      "I must not chat in class.\\n\");"
  end
  output.puts '    return 0;'
  output.puts '}'
end

This code creates a file called punishment.c with the expected 1,000+ lines of C source code.
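
The same generator can also be sketched as a method that returns the C source as a string, which turns the number of lines into a parameter and lets you inspect the result before writing it to disk. The method name punishment_source is an assumption made for this illustration; it is not part of Erika's script:

```ruby
# Hypothetical helper: returns the C source as a string instead of
# writing it straight to a file.
def punishment_source(count)
  # One printf call per line. The "\\n" in the Ruby string literal
  # becomes the two characters \n in the generated C source.
  calls = (1..count).map do |i|
    "    printf(\"#{i}. I must not chat in class.\\n\");"
  end

  [
    '#include <stdio.h>',
    'int main(void) {',
    *calls,
    '    return 0;',
    '}'
  ].join("\n") + "\n"
end

source = punishment_source(1000)
puts source.lines.length   # prints 1004: 1,000 calls plus 4 lines of scaffolding
# File.write('punishment.c', source)   # uncomment to produce the file
```

Keeping generation separate from file output is a design choice, not a requirement; it simply makes the generated text easier to check.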

Although this example might seem a bit fabricated, it illustrates how easy it is to write a program that produces the source of another program. This technique can be used in more realistic settings. Let's say that you have a C program that needs to include a PNG image, but for some reason, the deployment platform can accept one file only, the executable file. Thus, the data that makes up the PNG file has to be integrated within the program code itself. To achieve this, we can read the PNG file beforehand and generate the C source text for an array declaration, initialized with the corresponding data as literal values. This Ruby script does exactly that:


INPUT_FILE_NAME = 'ljlogo.png'
OUTPUT_FILE_NAME = 'ljlogo.h'
DATA_VARIABLE_NAME = 'ljlogo'

# Open the image in binary mode so its bytes are read unmodified.
File.open(INPUT_FILE_NAME, 'rb') do |input|
  File.open(OUTPUT_FILE_NAME, 'w') do |output|
    output.print "unsigned char #{DATA_VARIABLE_NAME}[] = {"
    data = input.read.unpack('C*')
    data.length.times do |i|
      if i % 8 == 0
        output.print "\n    "
      end
      output.print '0x%02X' % data[i]
      output.print ', ' if i < data.length - 1
    end
    output.puts "\n};"
  end
end

This script reads the file called ljlogo.png and creates a new output file called ljlogo.h. First, it writes the declaration of the variable ljlogo as an array of unsigned characters. Next, it reads the whole input file at once and unpacks every single input character as an unsigned byte. Then, it writes each of the input bytes as a two-digit hexadecimal number, in groups of eight elements per line. As should be expected, every element is followed by a comma, except the last one. Finally, the script writes the closing brace and semicolon. Here is a possible output file sample:

unsigned char ljlogo[] = {
    0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A,
    0x00, 0x00, 0x00, 0x0D, 0x49, 0x48, 0x44, 0x52,

    /* A few hundred lines omitted. */

    0x0B, 0x13, 0x00, 0x00, 0x00, 0x00, 0x49, 0x45,
    0x4E, 0x44, 0xAE, 0x42, 0x60, 0x82
};
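
The formatting step can also be sketched as a small reusable method that turns any list of byte values into a C array declaration with the same eight-per-line layout. The method name c_array_source and the variable name png_signature are assumptions made for this illustration. Unlike the script above, this version joins the elements with each_slice and join, so it does not need to special-case the last element:

```ruby
# Hypothetical helper: formats byte values (0..255) as a C array
# declaration, eight elements per line.
def c_array_source(name, bytes)
  rows = bytes.each_slice(8).map do |row|
    '    ' + row.map { |b| format('0x%02X', b) }.join(', ')
  end
  "unsigned char #{name}[] = {\n" + rows.join(",\n") + "\n};\n"
end

# The eight-byte signature that starts every PNG file, as a binary string.
signature = "\x89PNG\r\n\x1A\n".b
puts c_array_source('png_signature', signature.unpack('C*'))
```

Run on its own, this prints a declaration whose single row matches the first line of data in the sample above.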

The following C program demonstrates how you could use the generated code as an ordinary C header file. It's important to note that the PNG file data will be stored in memory when the program itself is loaded:


#include <stdio.h>
#include "ljlogo.h"

/* Prints the contents of the array ljlogo as
   hexadecimal byte values. */
int main(void) {
    size_t i;  /* sizeof yields a size_t, so use a matching index type */
    for (i = 0; i < sizeof(ljlogo); i++) {
        printf("%02X ", ljlogo[i]);
    }
    return 0;
}

You also can have a program that both generates source code and executes it on the spot. Some languages have a facility called eval, which allows you to translate and execute a piece of source code contained within a string of characters at runtime. This feature is usually available in interpreted languages, such as Lisp, Perl, Ruby, Python and JavaScript. In this Ruby code, eval evaluates the string 'x + 1' in the current scope, where x is 3, so the script prints 4:

x = 3
s = 'x + 1'
puts eval(s)
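
To combine both ideas, a program can build the source text first and then pass it to eval. The following is a minimal sketch under stated assumptions: the method name polynomial_source and the coefficient list are invented for this illustration. It generates the text of a lambda that computes a polynomial and then evaluates that text to obtain a callable object:

```ruby
# Hypothetical helper: generates Ruby source text for a lambda that
# evaluates a polynomial. coefficients[k] is the coefficient of x**k.
def polynomial_source(coefficients)
  terms = coefficients.each_with_index.map do |c, power|
    "#{c} * x**#{power}"
  end
  "lambda { |x| #{terms.join(' + ')} }"
end

source = polynomial_source([1, 0, 2])   # represents 1 + 0x + 2x^2
poly = eval(source)                     # translate and run the generated text

puts source
puts poly.call(3)                       # prints 19 (1 + 0 + 2 * 9)
```

Because eval executes whatever text it receives, it should be given only strings that the program itself generated, never unchecked input.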
