Examining the compilation process. part 2.
In my last article, I discussed, in quite some detail, the process that GCC uses to convert a C source file into an executable program file. These steps included preprocessing the source to remove comments, include other files as required, and string substitution. The resulting file was then compiled into assembly language. The assembly language output was then used to create an object file containing machine language, which was then linked with other standardized libraries to create an executable.
As mentioned in the previous article, that article as well as this one, are based on a software development class I taught a few years ago. Some people are going to find this to be quite a dry subject; others will be delighted to see some of the magic that the compiler performs on our creations in order to make them executable. I happened to fall into the later category and I hope you do to.
And so, last time I had concluded my article with a very light discussion of the linking process. I intend to go a bit deeper into the linking process in this article, as well as some discussion about some of the optimizations that GCC can perform for you.
Before we get too deep into things, let's see a quick example of what the linking process does for us. For this example, we have two files, main.c and funct.c
main.c:
#include <stdio.h>
extern void funct();
int main () {
funct();
}
Yes, this is a pretty simple program. Notice that we've not defined the function, funct(), only declared it as an external function that accepts no parameters and returns no value. We will define this function in the next file, funct.c:
void funct () {
puts("Hello World.");
}
Most of you, by now, see where this is headed. This is the proverbial “Hello World” program, only we've broken it up into two separate files, for the sake of instruction. In a real project, you'd use the make program to arrange for all of the files to be compiled, but we're going to do the compilation by hand.
First we compile the main.c into a main.o file:
gcc main.c -c -o main.o
This command tells GCC to compile the source file, but not to run the linker so that we are left with an object file, which we want named main.o.
Compiling funct.c is much the same:
gcc funct.c -c -o funct.o
Now we can call GCC one more time, only this time, we want it to run the linker:
gcc main.o funct.o -o hello
In this example, we supplied the names of a couple “.o” object files, requested that they all be linked, and that the resulting executable be named hello.
Would you be surprised if executing ./hello resulted in “Hello World.”?
I didn't think so. So why would we take the simplest program possible and split it into two separate files? Well, because we can. And what we gain from doing it this way is that if we make a change to only one of the files, we don't have to recompile any of the files that didn't change; we simply re-link the already existing object files to the new object file that we created when we compiled the source file that we changed. This is where the make utility comes in handy as it keeps track of what needs to be recompiled based on what files have been changed since the last compilation.
Essentially, let's say that we had a very large software project. We could write it as one file and simply recompile it as needed. However, this would make it difficult for more than one person to work on the project, as only one of us could work at a given time. Also, it would mean that the compilation process would be quite time consuming since it would have to compile several thousands of lines of C source. But if we split the project into several smaller files, more than one person can work on the project and we only have to compile those files that get changed.
The Linux linker is pretty powerful. The linker is capable of linking object files together, as in the example above. It's also able to create shared libraries that can be loaded into our program at run time. While we won't discuss the creation of shared libraries, we will see a few examples that the system already has.
In my last article, I used a source file called test.c for the sake of discussion:
#include <stdio.h>
// This is a comment.
#define STRING "This is a test"
#define COUNT (5)
int main () {
int i;
for (i=0; i<COUNT; i++) {
puts(STRING);
}
return 1;
}
We can compile this program with:
gcc test.c -o test
We can use the ldd command to get a list of shared libraries that our program depends upon.
ldd test
And we see:
linux-gate.so.1 => (0xffffe000)
libc.so.6 => /lib/libc.so.6 (0xb7e3c000)
/lib/ld-linux.so.2 (0xb7f9a000)
The libc.so.6 entry is fairly easy. It's the standard C library that contains such things as puts() and printf(). We can also see which file provides this library, /lib/libc.so.6. The other two are a bit more interesting. The ld-linux.so.2 is a library that finds and load all of the other shared libraries, that a program needs in order to run, such as the libc mentioned earlier. The Linux-gate.so.1 entry is also interesting. This library is actually just a virtual library created by the Linux kernel that lets a program know how to make system calls. Some systems support the sysenter mechanism, while others call system calls via the interrupt mechanism, which is considerably slower. We'll be talking about system calls next.
System calls are a standardized interface for interacting with the operating system. Long story short, how do you allocate memory? How do you output a string to the console? How do you read a file? These functions are provided by system calls. Let's take a closer look.
We can see what function calls a program uses by using the strace command. For example, let's take a look at our test program above with the strace program.
strace ./test
This command results in output similar to what we see below except that I've added line numbers for the sake of convenient reference.
1 execve("./test", ["./test"], [/* 56 vars */]) = 0
2 brk(0) = 0x804b000
3 access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory)
4 open("/etc/ld.so.cache", O_RDONLY) = 3
5 fstat64(3, {st_mode=S_IFREG|0644, st_size=149783, ...}) = 0
6 mmap2(NULL, 149783, PROT_READ, MAP_PRIVATE, 3, 0) = 0xb7f79000
7 close(3) = 0
8 open("/lib/libc.so.6", O_RDONLY)= 3
9 read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0\220g\1\0004\0\0\0"..., 512) = 512
10 fstat64(3, {st_mode=S_IFREG|0755, st_size=1265948, ...}) = 0
11 mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7f78000
12 mmap2(NULL, 1271376, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0xb7e41000
13 mmap2(0xb7f72000, 12288, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x131) = 0xb7f72000
14 mmap2(0xb7f75000, 9808, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0xb7f75000
15 close(3) = 0
16 mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7e40000
17 set_thread_area({entry_number:-1 -> 6, base_addr:0xb7e406c0, limit:1048575, seg_32bit:1, contents:0, read_exec_only:0, limit_in_pages:1, seg_not_present:0, useable:1}) = 0
18 mprotect(0xb7f72000, 8192, PROT_READ) = 0
19 mprotect(0x8049000, 4096, PROT_READ)= 0
20 mprotect(0xb7fb9000, 4096, PROT_READ) = 0
21 munmap(0xb7f79000, 149783) = 0
22 fstat64(1, {st_mode=S_IFCHR|0600, st_rdev=makedev(136, 3), ...}) = 0
23 mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7f9d000
24 write(1, "This is a test\n", 15This is a test
25 )= 15
26 write(1, "This is a test\n", 15This is a test
27 )= 15
28 write(1, "This is a test\n", 15This is a test
29 )= 15
30 write(1, "This is a test\n", 15This is a test
31 )= 15
32 write(1, "This is a test\n", 15This is a test
Lines 1 and 2 are simply the calls needed by the shell execute an external command. In ines 3 through 8 we see the system trying to load various shared libraries. Line 8 shows is where the system tries to load libc. Line 9 shows us the results of the first reading of the library file. In lines8-15, we see the system mapping the contents of the libc file into memory. This is how the system loads a library into memory for use by our program; it simply reads the file into memory and gives our program a pointer to the block of memory where the library got loaded. Now our program can call functions in libc as though they were a part of our program.
Line 22 is where the system allocates a tty to send it's output to.
Finally, we see our output getting sent out in lines 24-32. The strace command lets us see what what our program is doing under the hood. It's good for learning about the system, as in this article, as well as helping to find what a misbehaved program is attempting to do. I've had numerous occasions to run strace on a program that had apparently “locked up” only to find that it was blocking on some sort of file read or such. Strace is a sure fire way of locating those types of problems.
Finally, GCC supports various levels of optimization, and I'd like to discuss just what that means.
Let's take a look at another program, test1.c:
#include <stdio.h>
int main () {
int i;
for(i=0;i<4;i++) {
puts("Hello");
}
return 0;
}
When we convert that to assembly language with the gcc -s command, we get:
.file "t2.c"
.section .rodata
.LC0:
.string "Hello"
.text
.globl main
.type main, @function
main:
leal 4(%esp), %ecx
andl $-16, %esp
pushl -4(%ecx)
pushl %ebp
movl %esp, %ebp
pushl %ecx
subl $20, %esp
movl $0, -8(%ebp)
jmp .L2
.L3:
movl $.LC0, (%esp)
call puts
addl $1, -8(%ebp)
.L2:
cmpl $3, -8(%ebp)
jle .L3
movl $0, %eax
addl $20, %esp
popl %ecx
popl %ebp
leal -4(%ecx), %esp
ret
.size main, .-main
.ident "GCC: (GNU) 4.2.4 (Gentoo 4.2.4 p1.0)"
.section .note.GNU-stack,"",@progbits
We can see the for loop starting at .L3:. It runs until the jle instruction right after .L2. Now let's compile this program into assembly language, but with 03 optimization turned on:
gcc -S -O3 test.c
What we get is:
main:
leal 4(%esp), %ecx
andl $-16, %esp
pushl -4(%ecx)
pushl %ebp
movl %esp, %ebp
pushl %ecx
subl $4, %esp
movl $.LC0, (%esp)
call puts
movl $.LC0, (%esp)
call puts
movl $.LC0, (%esp)
call puts
movl $.LC0, (%esp)
call puts
movl $.LC0, (%esp)
call puts
addl $4, %esp
movl $1, %eax
popl %ecx
popl %ebp
leal -4(%ecx), %esp
ret
.size main, .-main
.ident "GCC: (GNU) 4.2.4 (Gentoo 4.2.4 p1.0)"
.section .note.GNU-stack,"",@progbits
Here we can see that the for loop has been completely factored out and that gcc has replaced it with 5 separate calls to the puts system call. The entire for loop is gone! Nice.
GCC is an extremely sophisticated compiler that is even capable of factoring out loop invariants. Consider this code snippet:
for (i=0; i<5; i++) {
x=23;
do_something();
}
If you write a quick program to exercise this code snippet, you will see that the assignment to the x variable gets factored to a point outside of the for loop, as long as the value of x isn't used inside the loop. Essentially, GCC, with -O3, rewrites the code into this:
x=23;
for (i=0; i<5; i++) {
do_something();
}
Very nice.
Bonus points for anyone who can guess what gcc -O3 does to this program:
#include <stdio.h>
int main () {
int i;
int j;
for(i=0;i<4;i++) {
j=j+2;
}
return 0;
}
Na, I always hated bonus questions, so I'll just give you the answer. GCC factors our program out completely. Since it does nothing, GCC doesn't write anything. Here is the output of that program:
.file "t3.c"
.text
.p2align 4,,15
.globl main
.type main, @function
main:
leal 4(%esp), %ecx
andl $-16, %esp
pushl -4(%ecx)
xorl %eax, %eax
pushl %ebp
movl %esp, %ebp
pushl %ecx
popl %ecx
popl %ebp
leal -4(%ecx), %esp
ret
.size main, .-main
.ident "GCC: (GNU) 4.2.4 (Gentoo 4.2.4 p1.0)"
.section .note.GNU-stack,"",@progbits
As you can see, the program starts up and immediately terminates. The for loop is gone, as well as the assignment to the j variable. Very, very nice.
So, GCC is a very sophisticated compiler that is capable of handling very large projects and performing some very sophisticated optimizations on a given source file. I hope that reading this article, and the one before it, has lead you to a greater appreciation of just how intelligent the Linux compiler suite actually is, as well as given you some understanding that you can use to debug your own programs.
Mike Diehl is a freelance Computer Nerd specializing in Linux administration, programing, and VoIP. Mike lives in Albuquerque, NM. with his wife and 3 sons. He can be reached at mdiehl@diehlnet.com
Today’s modular x86 servers are compute-centric, designed as a least common denominator to support a wide range of IT workloads. Those generic, virtualized IT workloads have much different resource optimization requirements than hyperscale and cloud applications. They have resulted in a “one size fits all” enterprise IT architecture that is not optimized for a specific set of IT workloads, and especially not emerging hyperscale workloads, such as web applications, big data, and object storage. In this report, you will learn how shifting the focus from traditional compute-centric IT architectures to an innovative disaggregated fabric-based architecture can optimize and scale your data center.
Sponsored by AMD
Built-in forensics, incident response, and security with Red Hat Enterprise Linux 6
Every security policy provides guidance and requirements for ensuring adequate protection of information and data, as well as high-level technical and administrative security requirements for a system in a given environment. Traditionally, providing security for a system focuses on the confidentiality of the information on it. However, protecting the data integrity and system and data availability is just as important. For example, when processing United States intelligence information, there are three attributes that require protection: confidentiality, integrity, and availability.
Learn more about catching the bad guy in this free white paper.
Sponsored by DLT Solutions
| Making Linux and Android Get Along (It's Not as Hard as It Sounds) | May 16, 2013 |
| Drupal Is a Framework: Why Everyone Needs to Understand This | May 15, 2013 |
| Home, My Backup Data Center | May 13, 2013 |
| Non-Linux FOSS: Seashore | May 10, 2013 |
| Trying to Tame the Tablet | May 08, 2013 |
| Dart: a New Web Programming Experience | May 07, 2013 |
- RSS Feeds
- New Products
- Making Linux and Android Get Along (It's Not as Hard as It Sounds)
- Drupal Is a Framework: Why Everyone Needs to Understand This
- Home, My Backup Data Center
- A Topic for Discussion - Open Source Feature-Richness?
- What's the tweeting protocol?
- Dart: a New Web Programming Experience
- Developer Poll
- Trying to Tame the Tablet
- Reply to comment | Linux Journal
20 min 3 sec ago - Reply to comment | Linux Journal
2 hours 52 min ago - Reply to comment | Linux Journal
4 hours 9 min ago - great post
4 hours 44 min ago - Google Docs
5 hours 7 min ago - Reply to comment | Linux Journal
9 hours 55 min ago - Reply to comment | Linux Journal
10 hours 42 min ago - Web Hosting IQ
12 hours 16 min ago - Thanks for taking the time to
13 hours 53 min ago - Linux is good
15 hours 50 min ago
Enter to Win an Adafruit Prototyping Pi Plate Kit for Raspberry Pi

It's Raspberry Pi month at Linux Journal. Each week in May, Adafruit will be giving away a Pi-related prize to a lucky, randomly drawn LJ reader. Winners will be announced weekly.
Fill out the fields below to enter to win this week's prize-- a Prototyping Pi Plate Kit for Raspberry Pi.
Congratulations to our winners so far:
- 5-8-13, Pi Starter Pack: Jack Davis
- 5-15-13, Pi Model B 512MB RAM: Patrick Dunn
- Next winner announced on 5-21-13!
Free Webinar: Linux Backup and Recovery
Most companies incorporate backup procedures for critical data, which can be restored quickly if a loss occurs. However, fewer companies are prepared for catastrophic system failures, in which they lose all data, the entire operating system, applications, settings, patches and more, reducing their system(s) to “bare metal.” After all, before data can be restored to a system, there must be a system to restore it to.
In this one hour webinar, learn how to enhance your existing backup strategies for better disaster recovery preparedness using Storix System Backup Administrator (SBAdmin), a highly flexible bare-metal recovery solution for UNIX and Linux systems.



Comments
I would like to recommend you very useful file search
Hello I have fond a lot of useful and interesting files on http://newfileengine.com/
visit and enjoy it!
thanks!
nice articles.. thank you.. have learnt a lot....
amazing
Thanks for the articles. I learned a lot reading it.
Any other article in this subject is very welcome!! =)
Congratulations
Thanks
Great series, easy to follow.
Wish you had included a similarly, simple discussion of "make".
Thanks
I read part 1 and 2 and I learned a lot. So thank you! I want to learn the Assambly language, can you advise me a good book or website? I have Assembly (Intel 32-bits) at University but it's like:
add eax,eax
And not like %eax, so I guess this is some kind of different Assembly?
Grtz
These are just syntax
These are just syntax flavours for x86 Assembly.
Intel:
add eax, eaxAT&T:
add %eax, %eaxPlease note that the order (destination and origin) is dest-orig in Intel, and orig-dest in AT&T. For the Assembly book, go to "The Art of Assembly Programming", by Randall Hyde.
Re: Thanks
Thank you for the kind words.
Unless you really need it for a particular project, I wouldn't recommend learning Assembly. As you can see from this article, the gcc C compiler is very efficient.
As to the %eax in the instruction that you cited, that's just a different addressing mode. Like I said, my Assembly skills are a bit rusty, though.
Mike Diehl
There's a problem with your
There's a problem with your HTML tags. They're not displaying correctly, at least not in Firefox 3.0.3 on Ubuntu. Change your carets by < and > .