Optimization in GCC

Here's what the O options mean in GCC, why some optimizations aren't optimal after all and how you can make specialized optimization choices for your application.


Comment viewing options

Select your preferred way to display the comments and click "Save settings" to activate your changes.

hi, follow the "Listing 3.

Anonymous's picture

hi, follow the "Listing 3. Simple Example of gprof" but when using -O or -O2, the profile is "Flat profile".So how to resoult it?

my step is:
1: gcc -o test_optimization test_optimization.c -pg -march=i386
2: ./test_optimization
3: gprof --no-graph -b ./test_optimization gmon.out
4: the result is:
Flat profile:
Each sample counts as 0.01 seconds.
no time accumulated
% cumulative self self total
time seconds seconds calls Ts/call Ts/call name
0.00 0.00 0.00 1 0.00 0.00 factorial

if add -O2, the result is:
Flat profile:
Each sample counts as 0.01 seconds.
no time accumulated

% cumulative self self total
time seconds seconds calls Ts/call Ts/call name

single optimization flag without level

Anonymous's picture

Any optimization can be enabled outside of any level simply by specifying its name with the -f prefix, as: gcc -fdefer-pop -o test test.c

In current versions of GCC it is incorrect ( http://gcc.gnu.org/wiki/FAQ#optimization-options ). Single optimization flag without optimization level doesn't work. I don't know what about old versions.

gcc 4.2.3 vs visual c 2005

nanjil's picture

I just compiled a code under gcc cygwin and visual c 2005 in a lpatop with dula core intel processor.

The debuggable gcc code was about 2x times than faster than visual c++ debuggable code

however the situation reversed when i used O3 optimization in gcc and "release" optimization in visual c.

now the visual c code is 2x faster than gcc.

i did not expect that large a difference; it is HUGE!!
am i missin gsomehting or anybody else has noticed similar thing?

visual c++ optimizations

Anonymous's picture

apparentely, MSVC uses a few insecure optimizations counting that the developer created a secure code. Probably thats why its debug build is slower.

I've seen lots of situations where gcc code gives a error right away, and promptly showing me and bug and MSVC happily executing a code until it finally stumble upon a non-static field of a class and finally giving a error. For me , this is simple misleading and thats why I prefer gcc

Detailed article, that is great!

Anonymous's picture

You wrote very detailedly!
It is really useful for me right now since I am doing my thesis work on optimization under Linux. Thank your so much!

Someone should write some

Anonymous's picture

Someone should write some "C" code and a few scripts that will enable / disable every compiler option and then print out which options worked best for _your_ particular system.

A benchmark that would specifically test each option (as opposed to using a single benchmark, and huge) could be written.

EG: no point in benchmarking if we should use:
gcc -O2 -O3 code.c -- One disables the other

gcc -fno-gcse SSE2_code.c

Benchmarks need to have a 'large' effect on the option that is being switched.

This could be ran overnight (or on multiple machines, each doing part of the testing) and results provided on a web page somewhere.

Experts could put in thier two cents and a wiki of snipperts could
be fed into a code compilator (not compiler, just a bunch of scripts) that would compilate all the snippets and produce a final program to be compiled on many different machines.

This way we could figure out that if we had such-and-such a system then "how-often" (what % of the time) would we simply be better off
to use a particular option and when is it more likely based on that TYPE of program we are running (wordprocessor vs. MultiMedia app).

EG: If you have a Pentium is is ALWAYS (or should be if gcc is correct) best to use the -march=pentium option - BUT - it is NOT always best to use "-fcrossjumping" (though it _could_ be for certain applications).

The output of all this could simply be a half dozen command line choices for each processor - including a "general purpose 'best'" setting and a "quick compile with great optimization" setting (for intermediate builds).

This is something that a few dozen people need to work on to get the ball rolling and then the rest of us need to pitch in and compile the resulting test scripts to check for errors. With everyone's help we should have the so-called answer(S) to "which compilation options should I use for machine-X when compiling applcation=category Y.

Just a thought ...

Looks like you have a good

Anonymous's picture

Looks like you have a good project to setup now.

Got Table?

Anonymous's picture

Where can I get a readable copy of Table 1? The copy here is too small to read, and can't be enlarged.

try clicking on it

Anonymous's picture

try clicking on it

-O versus -O0

Anonymous's picture

Minor comment -- -O defaults to -O1, not to -O0.