CGI Programming

 in
A quick guide to how CGI works and why it's a nightmare to debug.

How often has this happened to you: You finish a wonderful CGI program, place it on your server, fire it up, and...much to your chagrin, you find your browser telling you that there has been a “server error”, rather than the expected output from the program?

There is no shame in having this sort of thing happen; experienced CGI programmers see this message plenty of times each week. And while experience might reduce the number of errors a programmer makes, more often it teaches him or her how to deal with such problems.

This month, we will look at all the things that need to be in place so that a web server can run your program. These problems tend to cause programmers an awful lot of trouble, in part because they change the programming paradigm we are used to from our other programming experience.

We Don't Run Things Directly

Most programs are run directly, invoked from the command line (or a graphical user interface), and then displayed on your screen. But CGI programs are different; they are almost always run on a different computer from the one at which the user is sitting. Worse, CGI programs are twice-removed from the user—the user tells the browser what he or she wants to do; then the browser has to pass that data to the server, which then passes the request (and any arguments) along to the program. Thus part of the secret of debugging CGI programs simply involves understanding that the server is invoking the program.

Just like all of the other programs on your system, CGI programs must be executable. This isn't a big deal to remember when you are coding in C and C++, since gcc and other compilers produce an executable file that has the correct execution bits turned on. But for those of us who write most of our programs in Perl (and similar languages, such as Tcl and Python, which make use of the Unix shell's interpreter), we need to make sure that the programs are executable before they will run. This might sound obvious, but believe me—from the e-mail I get, I can assure you that many people writing and installing CGI programs forget to make sure that their program is readable and executable not only to themselves, but also by the user ID under which the web server is running.

So, if you are getting an error message when running your CGI program, go to the directory in which it is located, and use the Unix “chmod” command to make it executable. Since you want to make it executable to all users on the system—a good idea, given that web servers often run under the “nobody” ID—we can issue the following command:

chmod a+x progname

Thus if my program is called “progname”, we can double-check to make sure that the program is executable by issuing the ls command with the -l option, indicating that it should provide information in the “long listing format”. This means that we should like to see information about the file's permissions, ownership, and modification dates, as well as its name. If we list our program, we should see something like:

-rwxrwxrwx 1 reuven reuven 12779 Sep 5 15:28 progname

The above means that “progname” is owned by the user named “reuven” and the group named “reuven” (not unusual on my personal Linux box!), that it was last modified on September 5th at 3:28 p.m., and that it is 12779 bytes long. But most important is the fact that we see the string “rwx” repeated three times—first for the user (“reuven” in this case), then for the user's group (again, “reuven” in this case), and then for everyone else on the system. Thus the three instances of “rwx” mean that anyone on the system can read, write, and execute progname, which is precisely how we want things to be.

Running from the Command Line

If the program is executable and still gives you problems, your next step should be to run the program from the command line. As I said earlier, we always have to remember that the web server (most likely a different machine than ours) is executing the program on our behalf, which means that we usually see not the program's output, but rather the results of what our browser does with that output! It's generally easiest to debug when we know exactly what the program is doing, and thus it can only help to execute the program on the web server itself.

So if the program is called “progname”, change into its directory (using the cd command) and type:

        ./progname

Why precede the program's name with the ./? Because your shell's PATH variable usually won't—and probably shouldn't—include your system's CGI directory, which means that were you to simply type the program's name, your PATH might not work. Unless you already include ., a.k.a the current directory, in your PATH, which is extremely convenient but also a potential security risk. Suffice it to say that the above recipe always works, while leaving off the ./ won't always do the trick.

When you execute the program, what do you see? Most important of all is that a MIME header be the first thing the program sends to its output. Browsers depend on MIME headers to know what sort of data the program will be sending its way, and without such a header, you will almost always get an error from your browser. Why? Because if the browser expects to receive a MIME header, but instead receives an HTML-formatted response from your program, it will usually not know what to do. Rather than display data incorrectly, your browser simply says (in its own terse way) that it didn't understand what sort of data it should expect. It's up to you to make sure that under all circumstances, anything sent to standard output is preceded by a MIME header.

While there isn't enough space to go into the intricacies of MIME headers this month, suffice it to say that unless your program is going to output something other than HTML, you should always make sure that the first thing it prints is:

Content-type: text/html

Also note that this string should be followed by two newline characters (\n in C and Perl), rather than the one that we are used to placing after text strings. That is, Perl programmers should write:

print "Content-type: text/html\n\n";

while those of you (wisely) using the CGI.pm module in Perl should be able to do the following:

use CGI;         # Bring in the CGI module
my $query = new CGI; # Create an instance of CGI
print $query->header("text/html"); # Output the header

Why two newline characters? Simply put, those two newlines separate your program's headers (i.e., information about the response) from the content that it returns (i.e., the response itself). There is at least one major precedent for this system that you probably use all of the time—e-mail! If you ever look at an e-mail message as transmitted via SMTP (the Internet's Simple Mail Transfer Protocol), you will see that the message's headers and contents are separated by a blank line.

What else should you look for when running your program from the command line? If your program is invoked using the GET method (i.e., if it is invoked as though it were a document, without any arguments beyond perhaps some on the command line), then invoking it from the command line should produce some output, even if the output will be a bit weird because of the lack of any input arguments. However, if your program looks at the environment variable QUERY_STRING as a source of input, then you can easilyset that value from the Unix shell. If you're using bash or some similar sh/ksh variant on your system, then you should be able to type:

export QUERY_STRING="hello+there"
  ./program-name

Note that while you don't have to write characters in QUERY_STRING in the Web's “percent-hex” encoding scheme (in which odd characters are replaced by a percent sign, followed by the hexadecimal number representing their ASCII code), I suggest that you do, if only to test your program in an environment as close as possible to the one in which it will be invoked. Remember that if we want to know why things are going wrong, we need to pretend to be the web server invoking the program—otherwise we might miss out on a subtle bug that has to do with the encoding (or lack thereof).

Testing out programs that use POST is a bit trickier, since the name/value pairs are difficult to emulate when you are invoking the software on your own. But luckily for those of us using Perl, the CGI.pm module allows us to run our programs from the command line without too much trouble. If you invoke a program that uses CGI.pm, you'll see:

  ./program-name
(offline mode: enter name=value pairs on standard
input)

At this point, you can enter all of the names and values of the elements in the HTML form that is supposed to invoke the CGI program, followed by control-D. Of course, you don't necessarily have to enter all of the form elements, particularly if you're testing a large form—after all, debugging is supposed to save you time! But it is certainly convenient to be able to type:

name=Reuven
email=reuven@NetVision.net.il
address=17+Disraeli+Street
equation=2%2B2%3D4
<control-D>

and see the results of my program with that input, rather than having to add lots of print statements to my program—a tried-and-true system, and one which I certainly use, but this is often much easier. Note that in the above examples, my address has been entered with plus signs between the words (since the space character cannot be transmitted as part of a URL), and with the plus sign (+) and equals sign (=) encoded in percent-hex format.

Since I'm already talking about Perl a bit, let me remind all of you Perl programmers to use the -w flag when running your programs! The warnings it produces can be invaluable, especially when coupled with the “diagnostics” package that describes problems in more detail. You should also seriously weigh using the “strict” package, which forces you to declare variables a bit more formally than many Perl programmers are used to doing—but the trade-off is that the Perl compiler is then able to identify potential problems as it is looking at the program. Thus most of my CGI programs have the following in their first few lines:

/usr/local/bin/perl5 -w
use strict;         # Interpret variable and
                    # subroutine names strictly
use diagnostics;    # Display documentation
                    # regarding each warning
                    # and error.
use CGI;            # Bring in the CGI module
______________________

Webinar
One Click, Universal Protection: Implementing Centralized Security Policies on Linux Systems

As Linux continues to play an ever increasing role in corporate data centers and institutions, ensuring the integrity and protection of these systems must be a priority. With 60% of the world's websites and an increasing share of organization's mission-critical workloads running on Linux, failing to stop malware and other advanced threats on Linux can increasingly impact an organization's reputation and bottom line.

Learn More

Sponsored by Bit9

Webinar
Linux Backup and Recovery Webinar

Most companies incorporate backup procedures for critical data, which can be restored quickly if a loss occurs. However, fewer companies are prepared for catastrophic system failures, in which they lose all data, the entire operating system, applications, settings, patches and more, reducing their system(s) to “bare metal.” After all, before data can be restored to a system, there must be a system to restore it to.

In this one hour webinar, learn how to enhance your existing backup strategies for better disaster recovery preparedness using Storix System Backup Administrator (SBAdmin), a highly flexible bare-metal recovery solution for UNIX and Linux systems.

Learn More

Sponsored by Storix