Coverage Measurement and Profiling

by Zach Frey

Maybe you've always wondered what the gcov utility that comes with GCC is used for, or maybe your new project at work has a regulatory or customer requirement that your delivered software be tested to a certain percentage of coverage, and you are looking for how to accomplish that task. In this article, I introduce the general ideas of coverage measurement and of performance profiling, along with the standard GNU tools (gcov and gprof) used in these two techniques.

Coverage measurement is the recording of what paths were executed in code. Coverage can be measured with different degrees of granularity. The coarsest level is function coverage, measuring which functions were called; then comes statement coverage, measuring which lines of code were executed; and finally, branch coverage, measuring which logic conditions in branch statements were satisfied. When someone refers to coverage measurement usually statement or branch coverage is being discussed. gcov is the standard GNU coverage measurement tool, and it requires GCC.

It's a sad fact, but slow software can outstrip Moore's Law and cheap hardware. Even if you have the latest CPU and memory to spare, it seems you always can find software to soak it up. Furthermore, when discussing resource-constrained systems, like PDAs and embedded systems, it's often not possible to solve performance problems simply by throwing more hardware at them.

Profiling allows you to measure which part of your code is taking the most time to run. It gives you a window into the actual runtime behavior of the program and lets you see what the hot spots are for optimization. These spots often are not obvious from simple code inspection or reasoning about how the program ought to behave, so a profiler is a necessary tool here.

Profiling is a superset of coverage measurement, but its intent is different. With coverage measurement, you hope to measure which parts of your code didn't run. With profiling, you hope to measure which parts of your code did run and which consumed the most time. A profiler measures the number of times a function has been called, the number of times any particular line of code or branch of logic has been executed, the call graph of which functions have called and the amount of time spent in each area of your program. Gprof is the standard GNU profiler. It also requires GCC.

Coverage: Why Bother?

Although some industries, such as aerospace, require test coverage measurement, it seems to be an underused technique outside of those areas. This is because coverage measurement is a bit more indirect than other debugging techniques, such as memory leak detection and overwrite detections. Coverage measurement in and of itself is only a measurement; it does not automatically find bugs or improve the quality of your code.

What coverage management does do is provide information on how comprehensive your tests are. If you don't test at all or don't test in some systematic way, there's no point in measuring coverage. Furthermore, if you don't have some sort of standard, automated test suite (DejaGNU, for example), collecting coverage measurement data can be so labor-intensive, ad hoc and error-prone that it might be difficult to interpret your coverage measurements meaningfully.

Even test suites that seem comprehensive can leave large gaps. On my first project ever using coverage measurement, we ran our standard regression suite, which we were very proud of, and found the percentage of lines of code exercised was 63%. If you've never done this exercise before, you may be tempted to think “63%! Your tests must not have been very good.” We actually were quite happy to have a number this high; we knew our code had a large amount of error-handling cases for system faults that our test suite did not have the ability to trigger, such as out-of-memory or out-of-file descriptors. We also had read that the industry average for coverage on a newly instrumented test suite was close to 50%, so we were glad to have done as well as we did.

Let that number sink in: 50%. An average comprehensive test suite exercises only 50% of the code it is supposed to be checking. If you are not doing coverage measurement, you have no idea if you are doing even this well—and you do not have a measurement telling you how much better you could do. It's hard to optimize something that's unmeasured.

Knowing how much of your code was exercised during your test suite is the beginning, though, not the end. Once you know this, you can look at the details of the coverage reports, find what code was not exercised and start adding to your test suite.

Using gcov for Coverage Measurement

In order to use gcov, you must use GCC as your compiler. In addition, you must pass the options -fprofile-arcs -ftest-coverage during compilation. This causes GCC to insert additional code into the created object files to record the information needed by gcov and save it to disk.

The gcov data is written in the same directory as your source files, using a .da extension instead of .c. Nothing is written until you run your program. You can run your program multiple times to accumulate measurements, for example, trying different command-line options or input data.

Per the GCC documentation, the object file contains the absolute pathname of the .da file. So, it is not possible to move your source files after building and continue to collect coverage measurements. Doing so produces the error message arc profiling: Can't open output file filename when attempting to save the coverage data. gcov also shows results for only a single file at a time, and it does not provide a summarization method.

Creating gcov reports is straightforward. I use glib (the base library for Gtk+ and GNOME, see www.gtk.org) as an example. First, download glib from the Gtk+ site or a mirror. I used glib-2.2.1, which was the latest at the time of this writing.

To turn on coverage measurement, set the environment variable CFLAGS to -fprofile-arcs -ftest-coverage and run the standard ./configure; make; make check. This should work for any software that follows the GNU guidelines.

After make check has completed, it is possible to use gcov to create coverage reports. The most basic report we can access is a statement coverage report:


[zfrey@warthog glib]$ gcov garray.c
 68.37% of 215 source lines executed in file garray.c
Creating garray.c.gcov.

Besides printing out summary statistics, gcov creates an annotated source listing. For each executable line, this listing shows either the number of times it has been executed or a flag indicating it never has been executed (Listing 1).

Listing 1. Excerpt of a gcov Annotated Source Listing


      GArray* g_array_sized_new
                  (gboolean zero_terminated,
                   gboolean clear,
                   guint    elt_size,
                   guint    reserved_size)
40    {
40      GRealArray *array;

40      G_LOCK (array_mem_chunk);
40      if (!array_mem_chunk)
14        array_mem_chunk = g_mem_chunk_new
                           ("array mem chunk",
                            sizeof (GRealArray),
                            1024, G_ALLOC_AND_FREE);

40      array = g_chunk_new (GRealArray,
                             array_mem_chunk);

To see the branch coverage, use the -b option:

[zfrey@warthog glib]$ gcov -b garray.c
 68.37% of 215 source lines executed in file garray.c
 59.02% of 122 branches executed in file garray.c
 46.72% of 122 branches taken at least once in file garray.c
 45.35% of 86 calls executed in file garray.c
Creating garray.c.gcov.

The annotated source file now contains the branch data (Listing 2).

Listing 2. Excerpt from a Source File Annotated with Branch Data


        GArray* g_array_sized_new (
                         gboolean zero_terminated,
                         gboolean clear,
                         guint    elt_size,
                         guint    reserved_size)
  40    {
  40      GRealArray *array;

  40      G_LOCK (array_mem_chunk);
branch 0 taken = 40%
branch 1 taken = 58%
call 2 returns = 100%
branch 3 taken = 100%
call 4 returns = 100%
  40      if (!array_mem_chunk)
branch 0 taken = 65%
  14        array_mem_chunk = g_mem_chunk_new (
                            "array mem chunk",
call 0 returns = 100%
                             sizeof (GRealArray),
                             1024, G_ALLOC_AND_FREE);
                             1024, G_ALLOC_AND_FREE);

  40      array = g_chunk_new (GRealArray,
                               array_mem_chunk);
call 0 returns = 100%

To see coverage broken down by function within the source file, use the -f option. This can be combined with -b to get the branch coverage data on a per-function basis as well. Unfortunately, gcov does not provide a way to summarize results over multiple source files.

GGCov: a Gtk+ GUI for gcov

GGCov is a Gtk+-based GUI front end for gcov written by Greg Banks and available at www.alphalink.com.au/~gnb/ggcov. I was able to download the latest version (0.1.1 at the time), compile and install with it no problems. This was done using ./configure; make; make install and allowing /usr/local as the prefix.

To use GGCov, cd to the directory with your source files and launch GGCov. GGCov then looks for all the files in that directory with coverage measurements and creates a summary (Figure 1). This is something you cannot do with GNU gcov by itself.

Figure 1. A GGCov Summary Report

Selecting the Filename button activates a drop-down list of all files in the directory that have coverage information. Selecting a filename returns the statistics for that file.

Figure 2. A GGCov Report by File

Selecting the Function button activates a drop-down list of all the functions from all the files in the directory that have coverage information. Selecting a function name returns the statistics for that function (Figure 3).

Figure 3. A GGCov Report by Function

As if that weren't enough power, the Range button allows the selection of a file and a line number range and creates a summary for only the functions within that range. All views except the Overall summary have a View button, which brings up a new window containing the annotated source display for the selected file.

Overall, I found the basic functionality of GGCov easy to use and helpful. GGCov also contains features for displaying call graphs; unfortunately, on Red Hat 8.0 the glib example caused GGCov to segfault. I contacted Greg Banks, who said the call graph features were the most immature and that he hopes to address these problems in the next few releases.

Using gprof for Profiling

Like gcov, gprof requires that your programs be compiled with particular options so GCC can instrument your code properly. To use gprof, you must use the -pg option. By default, this profiles each function executed. To profile line by line, also use the standard -g debugging option to insert debugging symbols that gprof can use to create line-by-line reports.

When your program is run, information about function calls and time spent is kept in memory. At program exit, this information is written to a file called gmon.out. gprof uses this file to create profiling reports. Unlike coverage monitoring, which saves data in separate files for each source file, one data file is created for the entire program. Instead of saving data in the same directory as the source file, gmon.out is written to the correct working directory at the time the program exits. So beware of chdir() calls.

By default, the profiling data counts the number of times each function is called, along with the function from which it was called. Runtime data is collected by a sampling process, where the program is examined every sample period, and the function currently executing is recorded. gprof interprets this sampling to report how much time was spent on each area of the code.

Because this is a statistical sampling, it is subject to statistical error. See the gprof manual for a detailed discussion of the error calculation. In short, the times reported by gprof may not be exact. Furthermore, if a function is executed for a relatively small amount of time, it may be missed during the sampling and show 0.0 execution time. This is not a problem for performance profiling. The information that a particular function or block of code did not run for enough time to be sampled tells you that you probably don't want to spend much time optimizing it.

Statistical errors are a problem, however, if you also want to use gprof to show statement or branch coverage in addition to performance data. For exact counts of how many times each line of code has been executed, enable basic block counting by using the -a option at compile time. This adds instrumentation to the code to count the execution of each block of code. It also allows gprof to show exactly how many times each line of code was executed, even if zero time is shown by way of sampling.

It is possible to combine multiple gmon.out files in order to build up measurements over many runs of a program. This is done with the gprof -s command, which takes any number of gmon.out files and creates a new gmon.sum with the combined data. It is possible to add profiling data incrementally by specifying gmon.sum as an input file.

I originally intended to reuse my glib example with gprof, but the different behavior of gprof and gcov made this prospect difficult. The glib test programs are run one after another by the make check command, so each test's gmon.out was overwritten by the next test. Also, the test programs actually are shell scripts, which set some reasonably complex environment variables to run the real test programs, which are kept in a hidden subdirectory. Moral of the story: gcov is the right tool for doing coverage testing for glib.

Instead, I created a simple program to calculate the Fibonacci numbers from zero to 42. The Fibonacci numbers are a series of numbers beginning with 0,1 where the next number in the series is found by adding the previous two numbers (so, 0,1,1,2,3,5,8,13,21,...) A good Web site on the Fibonacci numbers and related mathematical concepts can be found at www.mcs.surrey.ac.uk/Personal/R.Knott/Fibonacci/fib.html. The number 42 was chosen, not merely for its literary significance to Douglas Adams fans, but because the Fibonnacci numbers begin overflowing signed 32-bit integer arithmetic at 47, so a lower limit was useful.

The calculation of a Fibonacci number can make a good example of a recursive function. Because each Fibonacci number is the sum of the previous two, the simplest approach is to create a function int fibonacci(int n), which simply calls itself with n - 1 and with n - 2, adds the results and returns the value (Listing 3).

Listing 3. fib.c


#include <stdio.h>

int fibonacci(int n);

int main (int argc, char **argv)
{
   int fib;
   int n;

   for (n = 0; n <= 42; n++) {
      fib = fibonacci(n);
      printf("fibonnaci(%d) = %d\n", n, fib);
   }

   return 0;
}

int fibonacci(int n)
{
   int fib;
   if (n <= 0) {
      fib = 0;
   }
   else if (n == 1) {
      fib = 1;
   }
   else {
      fib = fibonacci(n -1) + fibonacci(n - 2);
   }

   return fib;
}

Here is the output from running this simple Fibonacci program:

[zfrey@warthog prof]$ time ./fib
fibonnaci(0) = 0
fibonnaci(1) = 1
fibonnaci(2) = 1
fibonnaci(3) = 2
fibonnaci(4) = 3
...
fibonnaci(40) = 102334155
fibonnaci(41) = 165580141
fibonnaci(42) = 267914296

real    3m12.580s
user    3m10.566s
sys     0m0.127s
[zfrey@warthog prof]$

Now, we can look at the profiling data:


[zfrey@warthog prof]$ gprof -b ./fib gmon.out
Flat profile:

Each sample counts as 0.00195312 seconds.

 %   cumulative   self              self     total
time   seconds   seconds    calls  ms/call  ms/call  name
95.13     16.34    16.34       43   380.02   380.02  fibonacci
 4.87     17.18     0.84                             main

                        Call graph

index % time    self  children    called     name

[1]    100.0    0.84   16.34                 main [1]
               16.34    0.00      43/43      fibonacci [2]
-----------------------------------------------
                             2269806252      fibonacci [2]
               16.34    0.00      43/43      main [1]
[2]     95.1   16.34    0.00   43+2269806252 fibonacci [2]
                             2269806252      fibonacci [2]
-----------------------------------------------

Index by function name

   [2] fibonacci               [1] main

It's obvious where this program's bottleneck is—the fibonnaci() function itself. That fact should be evident from reading the source. What our profiling tells us that is not obvious from the source is the number of recursion calls that our calculation takes. Although fibonacci() is called directly from main() only 43 times, we are making approximately 2.3 billion calls as part of the recursion.

Fortunately, there's another way. Checking our reference on Fibonacci numbers, we find that there is a formula to calculate the nth Fibonacci number directly from n:

F(n) = round(Phin / sqrt(5))

I am not going to explain the mathematics behind the derivation of this formula, for the simple reason that I don't understand them myself. With this improved algorithm, however, once again I can rebuild, run and profile (Listing 4).

Listing 4. New Version of the Fibonacci Function

#define PHI   1.6180339887498948

int fibonacci(int n)
{
   return (int) rint(pow(PHI, n) / sqrt(5));
}

Let's see what gprof tells us this time:

                        Call graph

granularity: each sample hit covers 4 byte(s) no time propagated

index % time    self  children    called     name
                0.00    0.00      43/43          main [8]
[1]      0.0    0.00    0.00      43         fibonacci [1]
-----------------------------------------------

Index by function name

   [1] fibonacci

Our program is now so much faster that the sampling process failed to register any time spent. Although this is a good result in a way, it's not so good if we want to see if any more performance improvements can be made.

A standard technique from the gprof manual is to run the program many times and summarize the results, like so:


[zfrey@warthog prof]$ mkdir runfib2; cd runfib2
[zfrey@warthog runfib2]$ for i in `seq 1 1000` ; \
do ../fib2 > /dev/null; mv gmon.out gmon.out.$i; done
[zfrey@warthog runfib2]$ gprof -s ../fib2 gmon.out.*

The results, though, still show the time spent as 0.00. Clearly, a different approach is called for. I now create fib3.c, with two changes. First, I enclose the 0–42 loop with an additional 0–1000 loop, to increase our runtime. Second, I add a local variable to the fibonacci() function and break the calculation apart to one operation per line for line-by-line profiling. This gives us enough runtime that gprof has something to report:

Listing 5. fib3.c


#include <stdio.h>
#include <math.h>

int fibonacci(int n);

int main (int argc, char **argv)
{
   int fib;
   int n;
   int i;

   for (i = 0; i < 1000; i++) {
      for (n = 0; n <= 42; n++) {
         fib = fibonacci(n);
         printf("fibonnaci(%d) = %d\n", n, fib);
      }
   }

   return 0;
}

#define PHI   1.6180339887498948

int fibonacci(int n)
{
   double temp;

   temp = pow(PHI,n);
   temp = temp / sqrt(5);
   return (int) rint(temp);
}

Before doing a line-by-line report, though, I create a multirun summary, as before.

Now, let's look at the line-by-line report (trimmed to fit):


[zfrey@warthog runfib3]$ gprof -b -l -p ../fib3 gmon.sum
Flat profile:

Each sample counts as 0.00195312 seconds.
 %   cumulative   self             self     total
time   seconds   seconds   calls  ps/call  ps/call  name
38.25      1.12     1.12                            fibonacci (fib3.c:31)
27.97      1.94     0.82                            fibonacci (fib3.c:30)
14.49      2.36     0.42                            fibonacci (fib3.c:29)
 5.91      2.53     0.17                            main (fib3.c:16)
 4.64      2.67     0.14                            main (fib3.c:15)
 3.14      2.76     0.09                            fibonacci (fib3.c:32)
 1.87      2.82     0.05                            main (fib3.c:14)
 1.54      2.86     0.04                            main (fib3.c:14)
 0.83      2.89     0.02 43000000   567.77   567.77 fibonacci (fib3.c:26)
 0.77      2.91     0.02                            main (fib3.c:21)
 0.53      2.92     0.02                            main (fib3.c:13)
 0.07      2.93     0.00                            main (fib3.c:20)


The highest percentage of time is spent on line 31, followed by line 30. There is nothing we can do about line 31, because the printf() call and the function return are essential operations. However, on line 30, notice that we are calculating the square root of 5 every time. Because this result never changes, the next optimization should be to replace it with a constant. This cuts the execution time from 136 milliseconds to 22 milliseconds.

KProf—A KDE GUI for gprof

KProf provides a KDE-based GUI for viewing gprof results. It is available from kprof.sourceforge.net and can be built with the standard ./configure; make; make install process.

After launching KProf, go to Filerarr;Open to open either an executable for which profiling measurements have been taken or a text file that contains a gprof report. If you open an executable, KProf assumes that the profiling data is contained in the current directory in gmon.out.

Figure 4 shows a view of KProf displaying the results from the fib example. The circular arrow by the Fibonacci results indicate the function is called recursively.

Figure 4. KProf Display

Zach Frey (zach@zfrey.com) works as an application engineer at New Eagle, a division of Mototron, in Ann Arbor, Michigan. He has more experience with software tests than he cares to admit, but he still finds improving software quality to be an interesting topic.

Load Disqus comments