Optimization in GCC
In this article, we explore the optimization levels provided by the GCC compiler toolchain, including the specific optimizations provided in each. We also identify optimizations that require explicit specifications, including some with architecture dependencies. This discussion focuses on the 3.2.2 version of gcc (released February 2003), but it also applies to the current release, 3.3.2.
Let's first look at how GCC categorizes optimizations and how a developer can control which are used and, sometimes more important, which are not. A large variety of optimizations are provided by GCC. Most are categorized into one of three levels, but some are provided at multiple levels. Some optimizations reduce the size of the resulting machine code, while others try to create code that is faster, potentially increasing its size. For completeness, the default optimization level is zero, which provides no optimization at all. This can be explicitly specified with option -O or -O0.
The purpose of the first level of optimization is to produce an optimized image in a short amount of time. These optimizations typically don't require significant amounts of compile time to complete. Level 1 also has two sometimes conflicting goals. These goals are to reduce the size of the compiled code while increasing its performance. The set of optimizations provided in -O1 support these goals, in most cases. These are shown in Table 1 in the column labeled -O1. The first level of optimization is enabled as:
gcc -O1 -o test test.c
Any optimization can be enabled outside of any level simply by specifying its name with the -f prefix, as:
gcc -fdefer-pop -o test test.c
We also could enable level 1 optimization and then disable any particular optimization using the -fno- prefix, like this:
gcc -O1 -fno-defer-pop -o test test.c
This command would enable the first level of optimization and then specifically disable the defer-pop optimization.
The second level of optimization performs all other supported optimizations within the given architecture that do not involve a space-speed trade-off, a balance between the two objectives. For example, loop unrolling and function inlining, which have the effect of increasing code size while also potentially making the code faster, are not performed. The second level is enabled as:
gcc -O2 -o test test.c
Table 1 shows the level -O2 optimizations. The level -O2 optimizations include all of the -O1 optimizations, plus a large number of others.
The special optimization level (-Os or size) enables all -O2 optimizations that do not increase code size; it puts the emphasis on size over speed. This includes all second-level optimizations, except for the alignment optimizations. The alignment optimizations skip space to align functions, loops, jumps and labels to an address that is a multiple of a power of two, in an architecture-dependent manner. Skipping to these boundaries can increase performance as well as the size of the resulting code and data spaces; therefore, these particular optimizations are disabled. The size optimization level is enabled as:
gcc -Os -o test test.c
In gcc 3.2.2, reorder-blocks is enabled at -Os, but in gcc 3.3.2 reorder-blocks is disabled.
The third and highest level enables even more optimizations (Table 1) by putting emphasis on speed over size. This includes optimizations enabled at -O2 and rename-register. The optimization inline-functions also is enabled here, which can increase performance but also can drastically increase the size of the object, depending upon the functions that are inlined. The third level is enabled as:
gcc -O3 -o test test.c
Although -O3 can produce fast code, the increase in the size of the image can have adverse effects on its speed. For example, if the size of the image exceeds the size of the available instruction cache, severe performance penalties can be observed. Therefore, it may be better simply to compile at -O2 to increase the chances that the image fits in the instruction cache.
Realizing the promise of Apache® Hadoop® requires the effective deployment of compute, memory, storage and networking to achieve optimal results. With its flexibility and multitude of options, it is easy to over or under provision the server infrastructure, resulting in poor performance and high TCO. Join us for an in depth, technical discussion with industry experts from leading Hadoop and server companies who will provide insights into the key considerations for designing and deploying an optimal Hadoop cluster.
Sponsored by AMD
Built-in forensics, incident response, and security with Red Hat Enterprise Linux 6
Every security policy provides guidance and requirements for ensuring adequate protection of information and data, as well as high-level technical and administrative security requirements for a system in a given environment. Traditionally, providing security for a system focuses on the confidentiality of the information on it. However, protecting the data integrity and system and data availability is just as important. For example, when processing United States intelligence information, there are three attributes that require protection: confidentiality, integrity, and availability.
Learn more about catching the bad guy in this free white paper.
Sponsored by DLT Solutions
| Dynamic DNS—an Object Lesson in Problem Solving | May 21, 2013 |
| Using Salt Stack and Vagrant for Drupal Development | May 20, 2013 |
| Making Linux and Android Get Along (It's Not as Hard as It Sounds) | May 16, 2013 |
| Drupal Is a Framework: Why Everyone Needs to Understand This | May 15, 2013 |
| Home, My Backup Data Center | May 13, 2013 |
| Non-Linux FOSS: Seashore | May 10, 2013 |
- Dynamic DNS—an Object Lesson in Problem Solving
- Making Linux and Android Get Along (It's Not as Hard as It Sounds)
- Using Salt Stack and Vagrant for Drupal Development
- New Products
- Drupal Is a Framework: Why Everyone Needs to Understand This
- Validate an E-Mail Address with PHP, the Right Way
- A Topic for Discussion - Open Source Feature-Richness?
- Download the Free Red Hat White Paper "Using an Open Source Framework to Catch the Bad Guy"
- The Secret Password Is...
- New Products
- myip
3 hours 6 min ago - Keeping track of IP address
4 hours 57 min ago - Roll your own dynamic dns
10 hours 11 min ago - Please correct the URL for Salt Stack's web site
13 hours 22 min ago - Android is Linux -- why no better inter-operation
15 hours 37 min ago - Connecting Android device to desktop Linux via USB
16 hours 6 min ago - Find new cell phone and tablet pc
17 hours 4 min ago - Epistle
18 hours 33 min ago - Automatically updating Guest Additions
19 hours 42 min ago - I like your topic on android
20 hours 28 min ago
Enter to Win an Adafruit Pi Cobbler Breakout Kit for Raspberry Pi

It's Raspberry Pi month at Linux Journal. Each week in May, Adafruit will be giving away a Pi-related prize to a lucky, randomly drawn LJ reader. Winners will be announced weekly.
Fill out the fields below to enter to win this week's prize-- a Pi Cobbler Breakout Kit for Raspberry Pi.
Congratulations to our winners so far:
- 5-8-13, Pi Starter Pack: Jack Davis
- 5-15-13, Pi Model B 512MB RAM: Patrick Dunn
- 5-21-13, Prototyping Pi Plate Kit: Philip Kirby
- Next winner announced on 5-27-13!
Free Webinar: Hadoop
How to Build an Optimal Hadoop Cluster to Store and Maintain Unlimited Amounts of Data Using Microservers
Realizing the promise of Apache® Hadoop® requires the effective deployment of compute, memory, storage and networking to achieve optimal results. With its flexibility and multitude of options, it is easy to over or under provision the server infrastructure, resulting in poor performance and high TCO. Join us for an in depth, technical discussion with industry experts from leading Hadoop and server companies who will provide insights into the key considerations for designing and deploying an optimal Hadoop cluster.
Some of key questions to be discussed are:
- What is the “typical” Hadoop cluster and what should be installed on the different machine types?
- Why should you consider the typical workload patterns when making your hardware decisions?
- Are all microservers created equal for Hadoop deployments?
- How do I plan for expansion if I require more compute, memory, storage or networking?





Comments
hi, follow the "Listing 3.
hi, follow the "Listing 3. Simple Example of gprof" but when using -O or -O2, the profile is "Flat profile".So how to resoult it?
my step is:
1: gcc -o test_optimization test_optimization.c -pg -march=i386
2: ./test_optimization
3: gprof --no-graph -b ./test_optimization gmon.out
4: the result is:
Flat profile:
Each sample counts as 0.01 seconds.
no time accumulated
% cumulative self self total
time seconds seconds calls Ts/call Ts/call name
0.00 0.00 0.00 1 0.00 0.00 factorial
if add -O2, the result is:
Flat profile:
Each sample counts as 0.01 seconds.
no time accumulated
% cumulative self self total
time seconds seconds calls Ts/call Ts/call name
single optimization flag without level
Any optimization can be enabled outside of any level simply by specifying its name with the -f prefix, as:
gcc -fdefer-pop -o test test.cIn current versions of GCC it is incorrect ( http://gcc.gnu.org/wiki/FAQ#optimization-options ). Single optimization flag without optimization level doesn't work. I don't know what about old versions.
gcc 4.2.3 vs visual c 2005
hello:
I just compiled a code under gcc cygwin and visual c 2005 in a lpatop with dula core intel processor.
The debuggable gcc code was about 2x times than faster than visual c++ debuggable code
however the situation reversed when i used O3 optimization in gcc and "release" optimization in visual c.
now the visual c code is 2x faster than gcc.
i did not expect that large a difference; it is HUGE!!
am i missin gsomehting or anybody else has noticed similar thing?
visual c++ optimizations
apparentely, MSVC uses a few insecure optimizations counting that the developer created a secure code. Probably thats why its debug build is slower.
I've seen lots of situations where gcc code gives a error right away, and promptly showing me and bug and MSVC happily executing a code until it finally stumble upon a non-static field of a class and finally giving a error. For me , this is simple misleading and thats why I prefer gcc
Detailed article, that is great!
You wrote very detailedly!
It is really useful for me right now since I am doing my thesis work on optimization under Linux. Thank your so much!
Someone should write some
Someone should write some "C" code and a few scripts that will enable / disable every compiler option and then print out which options worked best for _your_ particular system.
A benchmark that would specifically test each option (as opposed to using a single benchmark, and huge) could be written.
EG: no point in benchmarking if we should use:
gcc -O2 -O3 code.c -- One disables the other
gcc -fno-gcse SSE2_code.c
Benchmarks need to have a 'large' effect on the option that is being switched.
This could be ran overnight (or on multiple machines, each doing part of the testing) and results provided on a web page somewhere.
Experts could put in thier two cents and a wiki of snipperts could
be fed into a code compilator (not compiler, just a bunch of scripts) that would compilate all the snippets and produce a final program to be compiled on many different machines.
This way we could figure out that if we had such-and-such a system then "how-often" (what % of the time) would we simply be better off
to use a particular option and when is it more likely based on that TYPE of program we are running (wordprocessor vs. MultiMedia app).
EG: If you have a Pentium is is ALWAYS (or should be if gcc is correct) best to use the -march=pentium option - BUT - it is NOT always best to use "-fcrossjumping" (though it _could_ be for certain applications).
The output of all this could simply be a half dozen command line choices for each processor - including a "general purpose 'best'" setting and a "quick compile with great optimization" setting (for intermediate builds).
This is something that a few dozen people need to work on to get the ball rolling and then the rest of us need to pitch in and compile the resulting test scripts to check for errors. With everyone's help we should have the so-called answer(S) to "which compilation options should I use for machine-X when compiling applcation=category Y.
Just a thought ...
Looks like you have a good
Looks like you have a good project to setup now.
Got Table?
Where can I get a readable copy of Table 1? The copy here is too small to read, and can't be enlarged.
try clicking on it
try clicking on it
-O versus -O0
Minor comment -- -O defaults to -O1, not to -O0.