Examining the compilation process. part 2.

In my last article, I discussed, in quite some detail, the process that GCC uses to convert a C source file into an executable program file. These steps included preprocessing the source to remove comments, include other files as required, and string substitution. The resulting file was then compiled into assembly language. The assembly language output was then used to create an object file containing machine language, which was then linked with other standardized libraries to create an executable.

As mentioned in the previous article, that article as well as this one, are based on a software development class I taught a few years ago. Some people are going to find this to be quite a dry subject; others will be delighted to see some of the magic that the compiler performs on our creations in order to make them executable. I happened to fall into the later category and I hope you do to.

And so, last time I had concluded my article with a very light discussion of the linking process. I intend to go a bit deeper into the linking process in this article, as well as some discussion about some of the optimizations that GCC can perform for you.

Before we get too deep into things, let's see a quick example of what the linking process does for us. For this example, we have two files, main.c and funct.c

main.c:
#include <stdio.h>

extern void     funct();

int     main () {
        funct();
}

Yes, this is a pretty simple program. Notice that we've not defined the function, funct(), only declared it as an external function that accepts no parameters and returns no value. We will define this function in the next file, funct.c:

void    funct () {
        puts("Hello World.");
}

Most of you, by now, see where this is headed. This is the proverbial “Hello World” program, only we've broken it up into two separate files, for the sake of instruction. In a real project, you'd use the make program to arrange for all of the files to be compiled, but we're going to do the compilation by hand.

First we compile the main.c into a main.o file:

gcc main.c -c -o main.o

This command tells GCC to compile the source file, but not to run the linker so that we are left with an object file, which we want named main.o.

Compiling funct.c is much the same:

gcc funct.c -c -o funct.o

Now we can call GCC one more time, only this time, we want it to run the linker:

gcc main.o funct.o -o hello

In this example, we supplied the names of a couple “.o” object files, requested that they all be linked, and that the resulting executable be named hello.

Would you be surprised if executing ./hello resulted in “Hello World.”?

I didn't think so. So why would we take the simplest program possible and split it into two separate files? Well, because we can. And what we gain from doing it this way is that if we make a change to only one of the files, we don't have to recompile any of the files that didn't change; we simply re-link the already existing object files to the new object file that we created when we compiled the source file that we changed. This is where the make utility comes in handy as it keeps track of what needs to be recompiled based on what files have been changed since the last compilation.

Essentially, let's say that we had a very large software project. We could write it as one file and simply recompile it as needed. However, this would make it difficult for more than one person to work on the project, as only one of us could work at a given time. Also, it would mean that the compilation process would be quite time consuming since it would have to compile several thousands of lines of C source. But if we split the project into several smaller files, more than one person can work on the project and we only have to compile those files that get changed.

The Linux linker is pretty powerful. The linker is capable of linking object files together, as in the example above. It's also able to create shared libraries that can be loaded into our program at run time. While we won't discuss the creation of shared libraries, we will see a few examples that the system already has.

In my last article, I used a source file called test.c for the sake of discussion:

#include <stdio.h>

// This is a comment.

#define STRING "This is a test"
#define COUNT (5)

int     main () {
        int i;

        for (i=0; i<COUNT; i++) {
                puts(STRING);
        }

        return 1;
}

We can compile this program with:

gcc test.c -o test

We can use the ldd command to get a list of shared libraries that our program depends upon.

ldd test

And we see:

        linux-gate.so.1 =>  (0xffffe000)
        libc.so.6 => /lib/libc.so.6 (0xb7e3c000)
        /lib/ld-linux.so.2 (0xb7f9a000)

The libc.so.6 entry is fairly easy. It's the standard C library that contains such things as puts() and printf(). We can also see which file provides this library, /lib/libc.so.6. The other two are a bit more interesting. The ld-linux.so.2 is a library that finds and load all of the other shared libraries, that a program needs in order to run, such as the libc mentioned earlier. The Linux-gate.so.1 entry is also interesting. This library is actually just a virtual library created by the Linux kernel that lets a program know how to make system calls. Some systems support the sysenter mechanism, while others call system calls via the interrupt mechanism, which is considerably slower. We'll be talking about system calls next.

System calls are a standardized interface for interacting with the operating system. Long story short, how do you allocate memory? How do you output a string to the console? How do you read a file? These functions are provided by system calls. Let's take a closer look.

We can see what function calls a program uses by using the strace command. For example, let's take a look at our test program above with the strace program.

strace ./test

This command results in output similar to what we see below except that I've added line numbers for the sake of convenient reference.

1  execve("./test", ["./test"], [/* 56 vars */]) = 0
2  brk(0)                                  = 0x804b000
3  access("/etc/ld.so.preload", R_OK)      = -1 ENOENT (No such file or directory)
4  open("/etc/ld.so.cache", O_RDONLY)      = 3
5  fstat64(3, {st_mode=S_IFREG|0644, st_size=149783, ...}) = 0
6  mmap2(NULL, 149783, PROT_READ, MAP_PRIVATE, 3, 0) = 0xb7f79000
7  close(3)                                = 0
 8  open("/lib/libc.so.6", O_RDONLY)= 3
 9  read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0\220g\1\0004\0\0\0"..., 512) = 512
10  fstat64(3, {st_mode=S_IFREG|0755, st_size=1265948, ...}) = 0
11  mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7f78000
12  mmap2(NULL, 1271376, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0xb7e41000
13  mmap2(0xb7f72000, 12288, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x131) = 0xb7f72000
14  mmap2(0xb7f75000, 9808, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0xb7f75000
15  close(3)                            = 0
    16  mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7e40000
17  set_thread_area({entry_number:-1 -> 6, base_addr:0xb7e406c0, limit:1048575, seg_32bit:1, contents:0, read_exec_only:0, limit_in_pages:1, seg_not_present:0, useable:1}) = 0
18  mprotect(0xb7f72000, 8192, PROT_READ)   = 0
19  mprotect(0x8049000, 4096, PROT_READ)= 0
20  mprotect(0xb7fb9000, 4096, PROT_READ)   = 0
21  munmap(0xb7f79000, 149783)  = 0
22  fstat64(1, {st_mode=S_IFCHR|0600, st_rdev=makedev(136, 3), ...}) = 0
23  mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7f9d000
24  write(1, "This is a test\n", 15This is a test
25  )= 15
26  write(1, "This is a test\n", 15This is a test
27  )= 15
28  write(1, "This is a test\n", 15This is a test
29  )= 15
30  write(1, "This is a test\n", 15This is a test
31  )= 15
32  write(1, "This is a test\n", 15This is a test

Lines 1 and 2 are simply the calls needed by the shell execute an external command. In ines 3 through 8 we see the system trying to load various shared libraries. Line 8 shows is where the system tries to load libc. Line 9 shows us the results of the first reading of the library file. In lines8-15, we see the system mapping the contents of the libc file into memory. This is how the system loads a library into memory for use by our program; it simply reads the file into memory and gives our program a pointer to the block of memory where the library got loaded. Now our program can call functions in libc as though they were a part of our program.

Line 22 is where the system allocates a tty to send it's output to.

Finally, we see our output getting sent out in lines 24-32. The strace command lets us see what what our program is doing under the hood. It's good for learning about the system, as in this article, as well as helping to find what a misbehaved program is attempting to do. I've had numerous occasions to run strace on a program that had apparently “locked up” only to find that it was blocking on some sort of file read or such. Strace is a sure fire way of locating those types of problems.

Finally, GCC supports various levels of optimization, and I'd like to discuss just what that means.

Let's take a look at another program, test1.c:

#include <stdio.h>

int     main () {
        int i;

        for(i=0;i<4;i++) {
                puts("Hello");
        }

        return 0;
}

When we convert that to assembly language with the gcc -s command, we get:

        .file   "t2.c"
        .section        .rodata
.LC0:
        .string "Hello"
        .text
.globl main
        .type   main, @function
main:
        leal    4(%esp), %ecx
        andl    $-16, %esp
        pushl   -4(%ecx)
        pushl   %ebp
        movl    %esp, %ebp
        pushl   %ecx
        subl    $20, %esp
        movl    $0, -8(%ebp)
        jmp     .L2
.L3:
        movl    $.LC0, (%esp)
        call    puts
        addl    $1, -8(%ebp)
.L2:
        cmpl    $3, -8(%ebp)
        jle     .L3
        movl    $0, %eax
        addl    $20, %esp
        popl    %ecx
        popl    %ebp
        leal    -4(%ecx), %esp
        ret
        .size   main, .-main
        .ident  "GCC: (GNU) 4.2.4 (Gentoo 4.2.4 p1.0)"
        .section        .note.GNU-stack,"",@progbits

We can see the for loop starting at .L3:. It runs until the jle instruction right after .L2. Now let's compile this program into assembly language, but with 03 optimization turned on:

gcc -S -O3 test.c

What we get is:

main:
        leal    4(%esp), %ecx
        andl    $-16, %esp
        pushl   -4(%ecx)
        pushl   %ebp
        movl    %esp, %ebp
        pushl   %ecx
        subl    $4, %esp
        movl    $.LC0, (%esp)
        call    puts
        movl    $.LC0, (%esp)
        call    puts
        movl    $.LC0, (%esp)
        call    puts
        movl    $.LC0, (%esp)
        call    puts
        movl    $.LC0, (%esp)
        call    puts
        addl    $4, %esp
        movl    $1, %eax
        popl    %ecx
        popl    %ebp
        leal    -4(%ecx), %esp
        ret
        .size   main, .-main
        .ident  "GCC: (GNU) 4.2.4 (Gentoo 4.2.4 p1.0)"
        .section        .note.GNU-stack,"",@progbits

Here we can see that the for loop has been completely factored out and that gcc has replaced it with 5 separate calls to the puts system call. The entire for loop is gone! Nice.

GCC is an extremely sophisticated compiler that is even capable of factoring out loop invariants. Consider this code snippet:


for (i=0; i<5; i++) {
	x=23;
	do_something();
}

If you write a quick program to exercise this code snippet, you will see that the assignment to the x variable gets factored to a point outside of the for loop, as long as the value of x isn't used inside the loop. Essentially, GCC, with -O3, rewrites the code into this:

x=23;
for (i=0; i<5; i++) {
	do_something();
}

Very nice.

Bonus points for anyone who can guess what gcc -O3 does to this program:

#include <stdio.h>

int     main () {
        int i;
        int j;

        for(i=0;i<4;i++) {
                j=j+2;
        }

        return 0;
}

Na, I always hated bonus questions, so I'll just give you the answer. GCC factors our program out completely. Since it does nothing, GCC doesn't write anything. Here is the output of that program:

        .file   "t3.c"
        .text
        .p2align 4,,15
.globl main
        .type   main, @function
main:
        leal    4(%esp), %ecx
        andl    $-16, %esp
        pushl   -4(%ecx)
        xorl    %eax, %eax
        pushl   %ebp
        movl    %esp, %ebp
        pushl   %ecx
        popl    %ecx
        popl    %ebp
        leal    -4(%ecx), %esp
        ret
        .size   main, .-main
        .ident  "GCC: (GNU) 4.2.4 (Gentoo 4.2.4 p1.0)"
        .section        .note.GNU-stack,"",@progbits

As you can see, the program starts up and immediately terminates. The for loop is gone, as well as the assignment to the j variable. Very, very nice.

So, GCC is a very sophisticated compiler that is capable of handling very large projects and performing some very sophisticated optimizations on a given source file. I hope that reading this article, and the one before it, has lead you to a greater appreciation of just how intelligent the Linux compiler suite actually is, as well as given you some understanding that you can use to debug your own programs.

Load Disqus comments