A Beginner's Guide to Compiling Source Code

by Larry Ayers

on December 1, 1996

One of the first things a newcomer to Linux often does is search the Internet for interesting and useful programs to run, quickly discovering that many programs are available only in the form of a source-code tree (a form that can be intimidating if one isn't a programmer). For this reason, the new Linux user needs to become comfortable with the compilation process, as it truly is a necessary skill for anyone running Linux.

I'm not a programmer myself, but I do have a basic knowledge of how source code becomes an executable file. Possibly my non-programming status will enable me to bring up information which might seem “too obvious for words” to the experienced programmer. A good introduction to the subject is chapter six of Running Linux by Matt Welsh and Lar Kaufman (O'Reilly, 1995).

Compilation Software

The GNU programming utilities, including the gcc compiler, the make program, the linker and a slew of related tools (many of which you don't necessarily need to know about) are an integral part of most Linux distributions. The Slackware distribution has a menu-driven installation during which you are given the option of having the GNU programming tools installed. If you elected not to install these packages, you will have to start up the pkgtool utility and have them copied to your hard disk.

There are other free compilers out there, but it is advisable to stick with the GNU tools, as they are well-maintained and of high quality. Most Linux software seems to be written with gcc in mind as well, and the less editing of the supplied Makefiles you have to do the better off you'll be.

Applications written in the popular Tcl/Tk programming languages don't generally use the GNU tools; if they do, the C-language components are subsidiary to the Tcl/Tk components. You need to have the Tcl and Tk libraries and executables installed on your system in order to install the source for this type of application. These applications aren't compiled in the usual sense. Installation consists of copying the application's Tcl and Tk files to directories specified in the makefile. These programs are completely dependent on their ability to access an existing Tcl/Tk installed base of files, one of the most important of which is the Tk “wish” executable.

As recently as a couple of months ago, it was difficult to maintain a current Tcl/Tk installation; development was rapid, binaries weren't always available and the packages could be difficult to compile successfully. Some of the newer applications required the beta libraries to function. The situation has stabilized recently with the release of the non-beta Tcl-7.5 and Tk-4.1 in both binary and source versions. For Tcl and Tk themselves, most users are better off installing the binaries since, in my experience, they can be difficult to compile from source.

Note that even if you have a reasonably current Linux distribution, the Tcl/Tk versions included may very well be out of date. If you want to run the latest versions of such exemplary applications as TkDesk and TkMan it is well worthwhile to upgrade your Tcl/Tk files.

Obtaining Source Code

FTP sites can't really be called user-friendly or inviting to newcomers. The file names are often cryptic, and navigating through seemingly infinite levels of a directory tree can be frustrating, especially if you don't know where the file is located. These sites aren't designed for casual browsing, but the maintainers of the large archive sites (e.g., ftp://sunsite.unc.edu and its mirrors) provide index files, sometimes in HTML format, which list the available files with brief descriptions. Usually a file called NEW exists which lists recent arrivals. The complete index files can be very large, but they are worth downloading so that you can search them for keywords or file names with a text editor that has a good search facility.
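Once an index file is on your disk, grep can do the same keyword search from the command line. The sketch below is minimal and assumes the index has been saved under the name INDEX; that default is hypothetical, since archives name their index files in various ways.

```bash
# search_index: print every line of an index file that mentions a
# keyword, ignoring upper/lower case.
# Usage: search_index KEYWORD [INDEX_FILE]
# The default file name "INDEX" is an assumption; pass the real name.
search_index() {
    grep -i -- "$1" "${2:-INDEX}"
}
```

For example, search_index tkdesk NEW would show any recent arrivals whose entries mention TkDesk.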

A file called filename.tar.gz is usually a source-code directory tree that has been tarred and gzipped. Binary distributions usually have names patterned as filename.BIN.tar.gz or filename.BIN-ELF.tar.gz.
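The naming habit just described can be turned into a quick check before you download or unpack anything. This is only a sketch of a heuristic, and the function name archive_kind is hypothetical; the README inside the archive is still the final word.

```bash
# archive_kind: guess from a file name alone whether an archive holds
# source code or a prebuilt binary, using the common naming pattern.
# Patterns are tried in order, so the .BIN forms must come first.
archive_kind() {
    case "$1" in
        *.BIN.tar.gz|*.BIN-ELF.tar.gz) echo binary ;;
        *.tar.gz)                      echo source ;;
        *)                             echo unknown ;;
    esac
}
```

For example, archive_kind xv-3.10a.BIN.tar.gz prints binary, while archive_kind xv-3.10a.tar.gz prints source.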

Usenet postings in the various Linux newsgroups often contain locations of various packages.

I recommend using NcFtp as an FTP client. This well-written alternative to the standard command-line ftp program has many convenient features, such as remembering every site you visit in a bookmarks file, including the directory you were last in. This feature meshes well with NcFtp's “reget” function, which allows resumption of interrupted file transfers at the point the connection was broken.

Another handy resource is a recent release of a CD-ROM containing a snapshot of one of the major Linux FTP archive sites. Several companies market these CD-ROMs and they are reasonably priced. Linux software changes so quickly that the files on even the most recent of these CD-ROMs will probably be back-level by a version or two, but if you have a sudden desire to compile Xemacs or the Andrew User Interface System, a CD-ROM will save you a lengthy download.

Dealing with *.tgz Files

NcFtp can be easily configured to deposit downloaded files in /usr/local/src, or wherever you like. Once you have your file it must be unzipped and untarred. Setting up an alias in your ~/.bashrc file (if you use bash) simplifies this process. As an example, the line

alias tgz='tar xvzf'

in .bashrc will allow you to expand an archive by typing tgz filename.tar.gz. The great majority of archive files will create a subdirectory, then expand the files and subdirectories of the archive inside of it. Every now and then you will run across one which expands right into the current directory. You can either list the contents of the archive first (tar tzvf filename.tar.gz), or create a directory and move the files into it (if you've already expanded it).
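The alias is handy at the keyboard, but bash expands aliases only in interactive shells by default. A shell function works in scripts too, and it can do the listing step for you. Here is a minimal sketch, assuming gzip-compressed tar archives; the name safe_tgz is hypothetical. It unpacks only when every entry lives under one top-level name, and otherwise refuses and says why.

```bash
# safe_tgz: unpack a .tar.gz only if all of its entries live under a
# single top-level name; otherwise warn instead of scattering files
# into the current directory.
safe_tgz() {
    local archive=$1 tops count
    # Make sure the archive can be read before deciding anything.
    tar tzf "$archive" >/dev/null || return 1
    # Strip a leading "./", keep the part before the first "/",
    # and collect the distinct top-level names.
    tops=$(tar tzf "$archive" | sed -e 's|^\./||' -e 's|/.*||' | sort -u)
    count=$(printf '%s\n' "$tops" | grep -c .)
    if [ "$count" -eq 1 ]; then
        tar xzvf "$archive"
    else
        echo "$archive would put $count entries straight into $PWD;" >&2
        echo "create a directory, cd into it, and unpack it there." >&2
        return 1
    fi
}
```

Like the alias, it can go in ~/.bashrc; typing safe_tgz filename.tar.gz then behaves like tgz for well-behaved archives.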

There are file managers available that can expedite these processes. The Midnight Commander text-mode utility can treat *.tgz files as virtual directories, allowing you to dive into them and inspect the contents (reading the README files, for instance) without actually expanding the archive.
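Even without a file manager, tar can show you a single file from inside an archive: its O option (--to-stdout in GNU tar) writes extracted files to the screen instead of the disk. A small sketch follows; the function name peek_file is hypothetical.

```bash
# peek_file: print a named file (README by default) from inside a
# .tar.gz archive to standard output, writing nothing to disk.
peek_file() {
    local archive=$1 name=${2:-README}
    # List the members, keep those whose last path component is $name,
    # then extract each one to standard output with the O option.
    tar tzf "$archive" | grep -E "(^|/)${name}\$" |
    while IFS= read -r member; do
        echo "==> $member <=="
        tar xzOf "$archive" "$member"
    done
}
```

Piping the result through a pager, as in peek_file filename.tar.gz INSTALL | less, is convenient for longer files.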

The Tcl/Tk file/desktop manager TkDesk has right-mouse-button menus specific to archive files; they allow you to list the contents in an editor window and extract to the current directory or the root directory.

One way or another you'll end up with a directory tree containing the source code. There is a useful Unix convention by which files that should be read before attempting the compilation have names in all-caps, e.g., README or CHANGES. There is usually an important file named INSTALL that should be read closely. Since capitalized file names are displayed at the top of a directory listing, these files are easy to find.
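Whether capitals really come first depends on how ls sorts: in the plain ASCII ("C") locale they do, but some other locales sort without regard to case. A sketch that pulls out the all-caps names explicitly, under that assumption; the function name list_caps is hypothetical.

```bash
# list_caps: show the files in a source directory whose names are all
# capitals (README, INSTALL, CHANGES, ...), the ones to read first.
# The body runs in a subshell, so the cd and locale change stay local.
list_caps() (
    # In the C locale, ls sorts in ASCII order and [A-Z] means only
    # upper-case letters.
    export LC_ALL=C
    cd "${1:-.}" || exit 1
    ls | grep -E '^[A-Z][A-Z0-9_.-]*$'
)
```

Running list_caps inside a freshly unpacked tree gives a short reading list before you touch the Makefile.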

Three Types of Source

Source code packages can be broadly categorized into three types: programs which include a Configure script, programs using imake, and programs with a default makefile.

We'll start with the easiest type, the first mentioned above. Configure scripts are marvelous constructs: shell programs that wander at will throughout your Linux system, checking for the presence of various libraries and header files. The script uses this information to fill in a proto-makefile (usually named Makefile.in), turning it into a Makefile customized to your system. I've found that programs using these scripts compile very easily. You start the script by typing ./configure in the top directory of the source tree; after it does its work, it's usually just a matter of typing make, then make install once make has finished.
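A real configure script is generated by GNU autoconf and is far more thorough, but the principle fits in a few lines. The toy version below is entirely hypothetical: it looks for a C compiler and an X11 header, then substitutes what it found into the @CC@ and @XLIBS@ placeholders of a Makefile.in template, which is the kind of substitution a real configure performs.

```bash
# toy_configure: a hypothetical, much-simplified stand-in for a real
# configure script. It probes the system, then turns the template
# Makefile.in (in the current directory) into a Makefile.
toy_configure() {
    local cc_found='' xlibs='' c
    # Probe 1: look for a C compiler on the PATH.
    for c in gcc cc; do
        if command -v "$c" >/dev/null 2>&1; then cc_found=$c; break; fi
    done
    if [ -z "$cc_found" ]; then
        echo "toy_configure: no C compiler found, assuming cc" >&2
        cc_found=cc
    fi
    echo "checking for C compiler... $cc_found"
    # Probe 2: link with -lX11 only if the X11 header is installed.
    if [ -f /usr/include/X11/Xlib.h ]; then xlibs=-lX11; fi
    echo "checking for X11 headers... ${xlibs:-none}"
    # Fill the results into the proto-makefile.
    sed -e "s|@CC@|$cc_found|g" -e "s|@XLIBS@|$xlibs|g" Makefile.in > Makefile
}
```

With a real package the sequence is simply ./configure, then make, then make install (the last usually as root).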

Many source packages use the imake program, usually via its shell-script interface xmkmf. These packages will have an Imakefile. The compilation begins with the invocation of imake, usually by typing xmkmf. Imake runs the C preprocessor over the Imakefile, combining it with information stored in various templates and macro files (usually located in /usr/X11R6/lib/X11/config) to generate your Makefile. Fortunately, you don't need to understand how this works to use it. The Imakefile is the only file you should need to modify, usually just by setting preferred installation paths and the locations of some vital libraries.

The simplest, but most problematic, type of source package has a default Makefile included. This Makefile will have to be carefully edited in order to ensure that the proper libraries are included. Sometimes, especially if the source was written for a machine very like yours, these packages will compile with minimal editing of the Makefile. But they can also bomb, so spend a little time trying a few things, and know when the time spent isn't worth the dubious results.
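The three types can be told apart just by looking for their characteristic files. The sketch below, with the hypothetical name build_hint, only inspects file names and prints the usual commands; it runs nothing. The order of the tests matters, since imake and configure packages may ship a Makefile as well.

```bash
# build_hint: look at an unpacked source tree and print the usual
# build commands for the three kinds of package described above.
build_hint() {
    local dir=${1:-.} f
    # Type 1: a configure (or Configure) script.
    for f in configure Configure; do
        if [ -f "$dir/$f" ]; then
            echo "./$f && make && make install"
            return
        fi
    done
    if [ -f "$dir/Imakefile" ]; then
        # Type 2: imake, driven through xmkmf.
        echo "xmkmf && make && make install"
    elif [ -f "$dir/Makefile" ] || [ -f "$dir/makefile" ]; then
        # Type 3: a default Makefile that may need editing first.
        echo "edit the Makefile, then: make && make install"
    else
        echo "no build files found; read README and INSTALL"
    fi
}
```

In every case make install usually has to be run as root.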

Luckily, the most useful and popular programs tend to compile easily, because more people use them and submit bug reports. Many free programs are available that have makefiles for a wide variety of operating systems, including OS/2, Windows NT and even DOS.

The Compilation Process

Much can be learned by watching the status messages from your compiler scroll down the screen. Each *.c file in the source directory is first compiled into an object file (*.o). In this process the human-readable ASCII text source file is converted into binary format. This phase is the most time-consuming. If you're still watching near the end of the process, you will see a dramatic flurry of activity as the object files are linked together, and the shared libraries the finished executable needs in order to run are linked as well. Then the process abruptly stops. gcc doesn't tell you that it compiled the executable successfully. It will, however, tell you if there were errors and it can't finish the compile. It's always a kick to do a quick ls after the compile, to verify that there actually is a shiny, new, never-before-executed program waiting to be tried.

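In my experience, most fatal errors involve the library-linking step. A little common sense helps here; make sure you have the library, and that it is somewhere the linker can find it at build time (a standard library directory, or one named with a -L option in the Makefile) and the dynamic loader, ld.so, can find it at run time (a standard directory or one listed in /etc/ld.so.conf; run ldconfig after changing that file). The ldd utility shows which shared libraries a program needs and whether they can be found. Sometimes the problem is a missing symbolic link, linking a lib in a nonstandard location to one of the normal library directories. If the version of the library you have is outdated or wrong, the error messages will usually point to it.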
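Both checks are one-liners worth keeping around. In the sketch below the function names missing_libs and in_ld_cache are hypothetical, and the Xpm path in the closing comment is only an example of a library in a nonstandard place.

```bash
# missing_libs: print the shared libraries a program needs that the
# dynamic loader cannot locate; ldd marks each of them "not found".
missing_libs() {
    ldd "$1" 2>/dev/null | grep 'not found'
}

# in_ld_cache: succeed if a library name (e.g. libXpm) appears in the
# loader's cache, which ldconfig builds from the standard directories
# and the ones listed in /etc/ld.so.conf.
in_ld_cache() {
    { ldconfig -p || /sbin/ldconfig -p; } 2>/dev/null | grep -q -- "$1"
}

# A library in a nonstandard directory can be made visible, as root,
# by linking it into a directory the loader already searches (here
# /usr/local/lib, assuming it is listed in /etc/ld.so.conf), then
# rebuilding the cache:
#   ln -s /opt/xpm/lib/libXpm.so.4 /usr/local/lib/libXpm.so.4
#   ldconfig
```

For a program that fails to start, missing_libs ./program prints lines such as libXpm.so.4 => not found, which names the library to go looking for.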

The Xpm libraries can be a source of compilation problems. There are quite a few versions out there, and some programs are picky about accepting them. If you upgrade to a newer Xpm lib in order to get something to compile, don't be surprised if some of your older X applications stop working. I have yet to figure out a way to have more than one version of Xpm active at the same time. As finicky as Xpm is, it has become a vital part of many X programs. I'm beginning to realize what motivated people to create coherent, upgradeable Linux distributions.

A Few Hints

In several instances I've failed miserably with a source distribution, then weeks or months later downloaded a later version and had it compile cleanly. Perhaps I updated a library the Makefile was looking for, or perhaps the author made a change in the source which fortuitously caused the program to be compatible with my system. In other words, it's worthwhile to try again later when you initially have a problem.

Another situation I've found myself in: after several edits of the Makefile and perhaps a few header files I'm getting more and more compiler errors. Nothing seems to work and I can't seem to make any headway. This is an ideal time to delete the entire directory tree and reinstall it from the archive file. Sometimes a completely fresh start helps.

One compiler flag to watch out for in the Makefile is -g (as in gcc -g). The GNU programs often have this flag, which instructs the compiler to add bulky debugging information to the executable. This is needed only if you plan to use a debugger on the program. I don't even have a debugger installed, so I routinely remove that flag. The strip utility will remove this debugging information (along with the symbol table) from a finished executable, often reducing it to half its original size.
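Removing the flag can be done with sed rather than by hand. A sketch, assuming the flag appears on lines that set CFLAGS (some Makefiles put it in another variable, so check yours); the function name remove_g_flag is hypothetical. It keeps a backup and leaves longer options such as -ggdb alone.

```bash
# remove_g_flag: drop a standalone -g from the CFLAGS lines of a
# Makefile, saving the original as Makefile.orig.
remove_g_flag() {
    local mf=${1:-Makefile}
    cp "$mf" "$mf.orig" || return 1
    # On lines starting with CFLAGS, delete "-g" when it is preceded by
    # "=" or whitespace and followed by whitespace or end of line.
    sed -E '/^[[:space:]]*CFLAGS/s/(^|[=[:space:]])-g([[:space:]]|$)/\1/g' \
        "$mf.orig" > "$mf"
}

# Alternatively, leave the Makefile alone and shrink the finished
# program afterwards (the name "program" is a placeholder):
#   ls -l program; strip program; ls -l program
```

Comparing the Makefile with Makefile.orig using diff shows exactly what changed.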

Virtual consoles are tailor-made for compiling. Once you've set a lengthy compilation in motion, just switch to another console and start something else. I like to shut down the X Window System while compiling, as gcc uses all of the processor cycles it can get. The more resources that are available, the faster your program will compile.

Conclusion

So what do you gain from learning to compile programs?

The range of software available to you is considerably increased.
I believe there is an advantage to using an executable tuned to your system and configuration.
You have the opportunity to specify compiler flags, like -O2, to optimize the code. Sometimes there are compile-time options that can be set or unset in the Makefile.
Functions or subroutines in the program you know you will never need can be left out of the executable.
Source code is often the only form in which successive builds are available in beta-testing scenarios.
Often more complete documentation will be included with source code than with a binary distribution.
It is interesting to get glimpses into the way programs are put together. Often source files are heavily commented, because the programmer might want to explain sections of code to present or future collaborators in the project.

Larry Ayers (layers@vax2.rain.gen.mo.us) lives on a small farm in northern Missouri, where he is currently engaged in building a timber-frame house for his family. He operates a portable band-saw mill, does general woodworking, plays the fiddle and searches for rare prairie plants, as well as growing shiitake mushrooms. He is also struggling with configuring a Usenet news server for his local ISP.
