Optimization in GCC
In this article, we explore the optimization levels provided by the GCC compiler toolchain, including the specific optimizations provided in each. We also identify optimizations that require explicit specifications, including some with architecture dependencies. This discussion focuses on the 3.2.2 version of gcc (released February 2003), but it also applies to the current release, 3.3.2.
Let's first look at how GCC categorizes optimizations and how a developer can control which are used and, sometimes more important, which are not. A large variety of optimizations are provided by GCC. Most are categorized into one of three levels, but some are provided at multiple levels. Some optimizations reduce the size of the resulting machine code, while others try to create code that is faster, potentially increasing its size. For completeness, the default optimization level is zero, which provides no optimization at all. This can be explicitly specified with option -O or -O0.
The purpose of the first level of optimization is to produce an optimized image in a short amount of time. These optimizations typically don't require significant amounts of compile time to complete. Level 1 also has two sometimes conflicting goals. These goals are to reduce the size of the compiled code while increasing its performance. The set of optimizations provided in -O1 support these goals, in most cases. These are shown in Table 1 in the column labeled -O1. The first level of optimization is enabled as:
gcc -O1 -o test test.c
Any optimization can be enabled outside of any level simply by specifying its name with the -f prefix, as:
gcc -fdefer-pop -o test test.c
We also could enable level 1 optimization and then disable any particular optimization using the -fno- prefix, like this:
gcc -O1 -fno-defer-pop -o test test.c
This command would enable the first level of optimization and then specifically disable the defer-pop optimization.
The second level of optimization performs all other supported optimizations within the given architecture that do not involve a space-speed trade-off, a balance between the two objectives. For example, loop unrolling and function inlining, which have the effect of increasing code size while also potentially making the code faster, are not performed. The second level is enabled as:
gcc -O2 -o test test.c
Table 1 shows the level -O2 optimizations. The level -O2 optimizations include all of the -O1 optimizations, plus a large number of others.
The special optimization level (-Os or size) enables all -O2 optimizations that do not increase code size; it puts the emphasis on size over speed. This includes all second-level optimizations, except for the alignment optimizations. The alignment optimizations skip space to align functions, loops, jumps and labels to an address that is a multiple of a power of two, in an architecture-dependent manner. Skipping to these boundaries can increase performance as well as the size of the resulting code and data spaces; therefore, these particular optimizations are disabled. The size optimization level is enabled as:
gcc -Os -o test test.c
In gcc 3.2.2, reorder-blocks is enabled at -Os, but in gcc 3.3.2 reorder-blocks is disabled.
The third and highest level enables even more optimizations (Table 1) by putting emphasis on speed over size. This includes optimizations enabled at -O2 and rename-register. The optimization inline-functions also is enabled here, which can increase performance but also can drastically increase the size of the object, depending upon the functions that are inlined. The third level is enabled as:
gcc -O3 -o test test.c
Although -O3 can produce fast code, the increase in the size of the image can have adverse effects on its speed. For example, if the size of the image exceeds the size of the available instruction cache, severe performance penalties can be observed. Therefore, it may be better simply to compile at -O2 to increase the chances that the image fits in the instruction cache.
|Bitcoin on Amazon! Sort of...||Sep 28, 2016|
|Free Today: September Issue of Linux Journal (Retail value: $5.99)||Sep 27, 2016|
|nginx||Sep 27, 2016|
|Epiq Solutions' Sidekiq M.2||Sep 26, 2016|
|Nativ Disc||Sep 23, 2016|
|Android Browser Security--What You Haven't Been Told||Sep 22, 2016|
- Free Today: September Issue of Linux Journal (Retail value: $5.99)
- Bitcoin on Amazon! Sort of...
- Download "Linux Management with Red Hat Satellite: Measuring Business Impact and ROI"
- Android Browser Security--What You Haven't Been Told
- Nativ Disc
- Epiq Solutions' Sidekiq M.2
- Identity: Our Last Stand
- The Many Paths to a Solution
- Securing the Programmer
Pick up any e-commerce web or mobile app today, and you’ll be holding a mashup of interconnected applications and services from a variety of different providers. For instance, when you connect to Amazon’s e-commerce app, cookies, tags and pixels that are monitored by solutions like Exact Target, BazaarVoice, Bing, Shopzilla, Liveramp and Google Tag Manager track every action you take. You’re presented with special offers and coupons based on your viewing and buying patterns. If you find something you want for your birthday, a third party manages your wish list, which you can share through multiple social- media outlets or email to a friend. When you select something to buy, you find yourself presented with similar items as kind suggestions. And when you finally check out, you’re offered the ability to pay with promo codes, gifts cards, PayPal or a variety of credit cards.Get the Guide