Optimization in GCC
Earlier we used the time command to identify how much time was spent in a given command. This can be useful, but when we're profiling our application, we need more insight into the image. The gprof utility provided by GNU and the GCC compiler meets this need. Full coverage of gprof is outside the scope of this article, but Listing 3 illustrates its use.
Listing 3. Simple Example of gprof
[mtj@camus]$ gcc -o sort sort.c -pg -O2 -march=pentium2 [mtj@camus]$ ./sort [mtj@camus]$ gprof --no-graph -b ./sort gmon.out Flat profile: Each sample counts as 0.01 seconds. % cumulative self self total time seconds seconds calls ms/call ms/call name 100.00 0.79 0.79 1 790.00 790.00 bubbleSort 0.00 0.79 0.00 1 0.00 0.00 init_list [mtj@camus]$
The image is compiled with the -pg option to include profiling instructions in the image. Upon execution of the image, a gmon.out file results that can be used with the gprof utility to produce human-readable profiling data. In this use of gprof, we specify the -b and --no-graph options. For brief output (excludes the verbose field explanations), we specify -b. The --no-graph option disables the emission of the function call-graph; it identifies which functions call which others and the time spent on each.
Reading the example from Listing 3, we can see that bubbleSort was called once and took 790ms. The init_list function also was called, but it took less than 10ms to complete (the resolution of the profile sampling), so its value was zero.
If we're more interested in changes in the size of the object than speed, we can use the size command. For more specific information, we can use the objdump utility. To see a list of the functions in our object, we can search for the .text sections, as in:
objdump -x sort | grep .text
From this short list, we can identify the particular function we're interested in understanding better.
The GCC optimizer is essentially a black box. Options and optimization flags are specified, and the resulting code may or may not improve. When they do improve, what exactly happened within the resulting code? This question can be answered by looking at the resulting code.
To emit target instructions from the compiler, the -S option can be specified, such as:
gcc -c -S test.c
which tells gcc to compile the source only (-c) but also to emit assembly code for the source (-S). The resulting assembly output will be contained in the file test.s.
The disadvantage of the previous approach is you see only assembly code, no aspect of the size of the actual instructions is given. For this, we can use objdump to emit both assembly and native instructions, like so:
gcc -c -g test.c objdump -d test.o
For gcc, we specify compile with only -c, but we also want to include debug information in the object (-g). Using objdump, we specify the -d option to disassemble the instructions in the object. Finally, we can get assembly-interspersed source listings with:
gcc -c -g -Wa,-ahl,-L test.c
This command uses the GNU assembler to emit the listing. The -Wa option is used to pass the -ahl and -L options to the assembler to emit a listing to standard-out that contains the high-level source and assembly. The -L option retains the local symbols in the symbol table.
All applications are different, so there's no magic configuration of optimization and option switches that yield the best result. The simplest way to achieve good performance is to rely on the -O2 optimization level; if you're not interested in portability, specify the target architecture using -march=. For space-constrained applications, the -Os optimization level should be considered first. If you're interested in squeezing the most performance out of your application, your best bet is to try out the different levels and then use the various utilities to check the resulting code. Enabling and/or disabling certain optimizations also may help exploit the optimizer to receive the best performance.
Resources for this article: www.linuxjournal.com/article/7971.
M. Tim Jones (email@example.com) is a senior principal engineer with Emulex Corp. in Longmont, Colorado. In addition to being an embedded firmware engineer, Tim recently finished writing the book BSD Sockets Programming from a Multilanguage Perspective. He has written kernels for communications and research satellites and now develops embedded firmware for networking products.
|Non-Linux FOSS: libnotify, OS X Style||Jun 18, 2013|
|Containers—Not Virtual Machines—Are the Future Cloud||Jun 17, 2013|
|Lock-Free Multi-Producer Multi-Consumer Queue on Ring Buffer||Jun 12, 2013|
|Weechat, Irssi's Little Brother||Jun 11, 2013|
|One Tail Just Isn't Enough||Jun 07, 2013|
|Introduction to MapReduce with Hadoop on Linux||Jun 05, 2013|
- Containers—Not Virtual Machines—Are the Future Cloud
- Non-Linux FOSS: libnotify, OS X Style
- Linux Systems Administrator
- Lock-Free Multi-Producer Multi-Consumer Queue on Ring Buffer
- Validate an E-Mail Address with PHP, the Right Way
- Technical Support Rep
- Senior Perl Developer
- UX Designer
- Introduction to MapReduce with Hadoop on Linux
Free Webinar: Hadoop
How to Build an Optimal Hadoop Cluster to Store and Maintain Unlimited Amounts of Data Using Microservers
Realizing the promise of Apache® Hadoop® requires the effective deployment of compute, memory, storage and networking to achieve optimal results. With its flexibility and multitude of options, it is easy to over or under provision the server infrastructure, resulting in poor performance and high TCO. Join us for an in depth, technical discussion with industry experts from leading Hadoop and server companies who will provide insights into the key considerations for designing and deploying an optimal Hadoop cluster.
Some of key questions to be discussed are:
- What is the “typical” Hadoop cluster and what should be installed on the different machine types?
- Why should you consider the typical workload patterns when making your hardware decisions?
- Are all microservers created equal for Hadoop deployments?
- How do I plan for expansion if I require more compute, memory, storage or networking?