Compile C Faster on Linux

Many people who love the GNU gcc compiler still think that it is too slow in normal use, or that it uses too much memory.

lcc is a small, fast C compiler now available on Linux. A perfectly good C compiler, gcc, comes with Linux. Why would anyone bother installing a second one? Because the two compilers make different tradeoffs, so they suit different stages of the development cycle. gcc has many targets and users, and it includes an ambitious optimizer. lcc is 75% smaller (more when counting source code), compiles more quickly, and helps prevent some porting bugs.

For those who have always wanted to customize or extend their compiler, our recent book, A Retargetable C Compiler: Design and Implementation, tours lcc's source code in detail and thus offers especially thorough documentation. Pointers to lcc's source code, executables, book, and authors appear at the end of this article.

Speed Tradeoffs

lcc is fast. gcc implements a more ambitious global code optimizer, so it emits better code, particularly with full optimization options, but global optimization takes time and space. lcc implements a few low-cost, high-yield optimizations that collaborate to yield respectable code in a hurry.

For example, lcc compiles itself in 36 seconds on a 90 megahertz Pentium running Linux. gcc takes 68 seconds to compile the same program (the lcc source) with the default compiler options, and 130 seconds with the highest level of optimization. Code quality varied less. gcc's default code took 36 seconds to reprocess this input, just like lcc's code. gcc's best code (that is, with optimization level 3) runs in 30 seconds, about 20% faster. This is only a single data point, and both compilers evolve constantly, so your mileage may vary. Naturally, one can save time by using lcc for development and optimizing with gcc for the final release build.

Porting Code

Indeed, compiling code with two different compilers helps expose portability bugs. If a program is useful and if the source code is available, sooner or later someone will try to port it to another machine, or compile it with another compiler, or both. With a new machine or compiler, glitches are not uncommon. Which of the following solutions will net you less unwanted e-mail? For you to find and erase these blots while the code is fresh in your mind? Or for the porter to get diagnostics much later, about non-standard source code?

lcc follows the ANSI standard faithfully and implements no extension. Indeed, one option directs lcc to warn about a variety of C constructs that are valid but give undefined results, and thus can behave differently on a different machine or with a different compiler. Some programmers use lcc mainly for its strict-ANSI option, which helps them keep their code portable.

Cross Compiling

Like gcc, lcc can be configured as a cross-compiler that runs on one machine and compiles code for another. Cross-compilers can simplify life for programmers with multiple target platforms. lcc takes this notion a step further than most cross-compilers: we can, and typically do, link code generators for several machines into each version of the compiler.

For example, we maintain code generators for the MIPS, SPARC, and X86 architectures. We both work on and generate code for multiple platforms, so it's handy to be able to generate code for any target from any machine. We usually fold all three code generators into all compiler executables. A run-time option tells lcc which target to generate code for. If you don't maintain code for multiple targets, you're free to use an lcc that includes just one code generator, saving roughly 50KB for each code generator omitted.

A Compact Compiler

lcc is small. lcc's Linux executable with one code generator is 232 KB, and its text segment is 192 KB. Both figures for the corresponding phase of gcc (cc1) exceed a megabyte. lcc's small size contributes to its speed, especially on modest systems. A compact program benefits those who wish to modify the compiler. Most developers will use pre-built executables for lcc; they will never examine or even recompile the source code. But the Linux community particularly prizes the availability of source code, partly because it allows users to customize their programs or adapt them for other purposes.

When configured with the Linux PC code generator lcc is 12,000 lines of C source code. gcc's root directory—without the target-specific description files—holds 240,000 lines. Surely, some of this material is not part of the compiler proper, but the separation is not immediately apparent to those who haven't browsed gcc's source recently. The machine-specific module is the part most often changed, because new target machines come along more often than, say, new source languages. The lcc target-specific module for the Linux PC is 1200 lines, and half of that repeats boilerplate declarations or supports the debugger, so the actual code generator is under 600 lines. The target-specific modules for gcc average about 3000 lines. These comparisons illustrate the fact that the two compilers embody different trade-offs and that neither beats the other at everything: gcc can emit better code and offers many options, while lcc is easier to comprehend but is otherwise less ambitious.

gcc and lcc use retargetable code generators driven in part by formal specifications of the target machine, just as a parser can be driven by a formal grammar of its input language. gcc's code generator is based in part on techniques that one of us (Fraser) originated in the late 1970's. lcc uses a different technique that is simpler but somewhat less flexible.