gprof, bprof and Time Profilers
A friend of mine was explaining to me why he thought his program wasn't running fast enough. “Have you profiled it?” I asked. “No, but I'm pretty sure I know where the bottleneck is,” he replied—famous last words. “Well, let's try the profiler,” I said. Profiling quickly revealed that 98% of the CPU was being spent in one subroutine of my friend's program and that 86% of the CPU was being spent in one line, and he was wrong about the location of the bottleneck.
Profilers are invaluable tools that let you know where a program is spending most of its time. This information is extremely valuable because it tells you where your time can best be spent in making your program more efficient and where you are wasting your time. Typical programs are not quite as lopsided as in the above story. The “80-20” rule says that a program will spend 80% of its time in about 20% of the code.
If the total running time of a program is n, then we can break up the running time into pieces:
total-time = a1*n + a2*n + a3*n + ...
where the “a”s represent fractions of the total time that your program spends in a particular segment. The sum of the “a”s must add to one. The 80-20 rule says that one of the “a”s will be quite large. For example, suppose a1 is 8/10 and a2 is 2/10. These numbers correspond to a program which spends 80% of its time doing a1 and 20% for everything else.
total-time = .8*n + .2*nNow suppose we optimize a2 so that it runs twice as fast as before—a significant speedup. The time 0.2*n is now 0.1*n. The total running time is now 0.8n+0.1n = 0.9n, meaning the whole program executes in 90% of the time that it originally did. Suppose we instead concentrate on the other piece. If we halve the running time of the first piece, it becomes 0.4n. 0.4n+0.2n=0.6n, or 60% of the original running time. As you can see, it is worth our while to concentrate on the particular portion of the program that dominates the runtime.
In my friend's case, we were able to optimize that single line a bit. The real optimization came two days later when he told me that he had removed the whole subroutine in question, simply by changing how he thought about his data.
The easiest profiler to use under Linux is the gprof profiler. gprof is a standard part of the GNU development tools. If you have gcc installed, you probably have gprof too. To use gprof, simply recompile your program with gcc using the -pg switch. This option causes gcc to insert a bit of extra code into the beginning of each subroutine in your program. The -pg switch must also be used when you link your program, since another snippet of code must be present to tie the pieces together.
After recompiling, run your program. It will execute slightly slower because of the work needed to profile the code, but it shouldn't be too slow. After the program finishes, there will be a file named gmon.out in the current directory. This file contains the profiling information collected during the program's run. gprof is used to print this in a human readable form.
gprof outputs information in two ways: a flat profile and a call graph. The flat profile tells you how much time the program spent in all of the subroutines, and the call graph tells you which subroutines called which subroutines.
The first part of a flat profile is shown in Listing 1. The “self seconds” column shows how much time was taken up by each subroutine. The number of times each subroutine was invoked is shown in the “calls” column. The “self ms/call” columns gives the average time in milliseconds spent in a given subroutine, while the “total ms/call” includes time spent in subroutines called by that subroutine as well. For reasons explained in the gprof documentation, this last column is actually only a guess and should not be relied upon.
In this example, an important thing to note is that the mcount subroutine is the actual profiling subroutine call inserted by gcc when it compiles code with the -pg switch. The fact that the program spends nearly 20% of its time here indicates that a lot of calling and returning is happening, and that one way to speed up the code would be to eliminate some of the subroutine calls. Which ones? The subroutines rnd and uni are likely candidates, since they are called 142 million times in 900 seconds.
The other profiler in common use on Linux is bprof. The major difference between bprof and gprof is that bprof gives timings on a source line basis while gprof has only subroutine-level resolution, and also includes information like invocation counts. To use bprof, link an object file, bprof.o, into your program. After you've run your program, a file named bmon.out contains the timing information. Run bprof on this data file, and it makes copies of your source files with timing numbers prepended to each line.
Realizing the promise of Apache® Hadoop® requires the effective deployment of compute, memory, storage and networking to achieve optimal results. With its flexibility and multitude of options, it is easy to over or under provision the server infrastructure, resulting in poor performance and high TCO. Join us for an in depth, technical discussion with industry experts from leading Hadoop and server companies who will provide insights into the key considerations for designing and deploying an optimal Hadoop cluster.
Sponsored by AMD
Built-in forensics, incident response, and security with Red Hat Enterprise Linux 6
Every security policy provides guidance and requirements for ensuring adequate protection of information and data, as well as high-level technical and administrative security requirements for a system in a given environment. Traditionally, providing security for a system focuses on the confidentiality of the information on it. However, protecting the data integrity and system and data availability is just as important. For example, when processing United States intelligence information, there are three attributes that require protection: confidentiality, integrity, and availability.
Learn more about catching the bad guy in this free white paper.
Sponsored by DLT Solutions
| Dynamic DNS—an Object Lesson in Problem Solving | May 21, 2013 |
| Using Salt Stack and Vagrant for Drupal Development | May 20, 2013 |
| Making Linux and Android Get Along (It's Not as Hard as It Sounds) | May 16, 2013 |
| Drupal Is a Framework: Why Everyone Needs to Understand This | May 15, 2013 |
| Home, My Backup Data Center | May 13, 2013 |
| Non-Linux FOSS: Seashore | May 10, 2013 |
- RSS Feeds
- Making Linux and Android Get Along (It's Not as Hard as It Sounds)
- Using Salt Stack and Vagrant for Drupal Development
- Dynamic DNS—an Object Lesson in Problem Solving
- New Products
- Validate an E-Mail Address with PHP, the Right Way
- Drupal Is a Framework: Why Everyone Needs to Understand This
- A Topic for Discussion - Open Source Feature-Richness?
- Download the Free Red Hat White Paper "Using an Open Source Framework to Catch the Bad Guy"
- Tech Tip: Really Simple HTTP Server with Python
- Roll your own dynamic dns
20 min 29 sec ago - Please correct the URL for Salt Stack's web site
3 hours 31 min ago - Android is Linux -- why no better inter-operation
5 hours 47 min ago - Connecting Android device to desktop Linux via USB
6 hours 15 min ago - Find new cell phone and tablet pc
7 hours 13 min ago - Epistle
8 hours 42 min ago - Automatically updating Guest Additions
9 hours 51 min ago - I like your topic on android
10 hours 37 min ago - This is the easiest tutorial
17 hours 13 min ago - Ahh, the Koolaid.
22 hours 51 min ago
Enter to Win an Adafruit Pi Cobbler Breakout Kit for Raspberry Pi

It's Raspberry Pi month at Linux Journal. Each week in May, Adafruit will be giving away a Pi-related prize to a lucky, randomly drawn LJ reader. Winners will be announced weekly.
Fill out the fields below to enter to win this week's prize-- a Pi Cobbler Breakout Kit for Raspberry Pi.
Congratulations to our winners so far:
- 5-8-13, Pi Starter Pack: Jack Davis
- 5-15-13, Pi Model B 512MB RAM: Patrick Dunn
- 5-21-13, Prototyping Pi Plate Kit: Philip Kirby
- Next winner announced on 5-27-13!
Free Webinar: Hadoop
How to Build an Optimal Hadoop Cluster to Store and Maintain Unlimited Amounts of Data Using Microservers
Realizing the promise of Apache® Hadoop® requires the effective deployment of compute, memory, storage and networking to achieve optimal results. With its flexibility and multitude of options, it is easy to over or under provision the server infrastructure, resulting in poor performance and high TCO. Join us for an in depth, technical discussion with industry experts from leading Hadoop and server companies who will provide insights into the key considerations for designing and deploying an optimal Hadoop cluster.
Some of key questions to be discussed are:
- What is the “typical” Hadoop cluster and what should be installed on the different machine types?
- Why should you consider the typical workload patterns when making your hardware decisions?
- Are all microservers created equal for Hadoop deployments?
- How do I plan for expansion if I require more compute, memory, storage or networking?




Comments
progress has been made
"The easiest profiler to use under Linux is the gprof profiler"
We've come a long way since 1998. Check out Zoom - http://www.rotateright.com or even oprofile.
> The operation of
> The operation of referencing the array consists of multiplying
> i times a scaling factor (which is implemented as binary left shifts),
> adding this value to the start of array[] and fetching from that location.
> Replacing this code with:
Any CPU since the 68000 has had addressing modes like (reg1*constant + reg2) which makes the array index and pointer cases strictly identical, unless the pointer saves you a register. Is your friend coding on 8-bit cpu or something ?
Flat Profile
Hi,
We are using Flat profile in order to analyse our program. But the calls column for all the functions does not show any value. We are unable to figure out what is the problem. Please tell us a solution as quickly as possible. Thanks in advance.
Thanks and Regards,
V Varun.
error on running gprof
I am using gprof for profiling my code compiled using gcc in Linux.
the code runs fine and generates gmon.out
but when i give gprof it is giving error
"gprof: datalogger(which is the name of my executable):not in a.out format"
what can be the possible reason?
anybody please suggest.
Gprof
Hi,
I am trying to profile my code on Linux using Gprof
However it is showing incomplete data
Also it is showing the cumulative sec and self sec as all 0.00
I tried profileing in HP 11i there gprof is doing name mangling
Thanks
Nidhi