Garbage Collection in C Programs

LISP and Java programmers take garbage collection for granted. With the Boehm-Demers-Weiser library, you easily can use it in C and C++ projects, too.
Performance and Behavior

A single best approach to memory management that is effective for any program does not exist. Given a specific application, the optimal solution must be found by compromising on a number of different factors, including CPU overhead, heap expansion, allocation latency and, last but not least, manageability and robustness of the code. Profiling the program and testing different memory strategies probably is the best solution for dealing with these issues.

Garbage Collection and Compiler Optimizations

One subtle point against GC is it requires extra care if compiler optimizations are switched on. The collector may wrongly assume a certain pointer has disappeared simply because references to it have been optimized out. Thus, the corresponding memory block might be freed even if it still is in use by the program, with the obvious consequences. Hence, it might be tempting to turn compiler optimizations off to be safe, losing part of the performance gain obtained by using GC.

In order to have an idea of the behavior and performance of GC vs. traditional memory management, we have experimented with a test program, gctest, which loops over the creation and destruction of a simple list. Simple as it may seem, the test raises some interesting points. The source code is not really instructive and is too long to be printed, so it is available for download on the FTP site (

gctest can be controlled with a number of options that allow you to experiment with different list lengths and node sizes, as well as to change working parameters—enabling and disabling specific features of the GC library. Before we comment on the results we obtained with this test tool, it is important to point out that they were obtained in an artificial and not excessively realistic environment. So, again, you are invited to test the GC library yourself and evaluate it for your own code. Take the parameters presented here as possible indicators of the library suitability to a particular application.

The measurements we collected are overall execution time, the CPU load; heap expansion, how much memory is requested by the system with respect to how much actually was allocated by the program; and allocation latency average and standard deviation, how long it takes to allocate a single block of memory and how much this time varies across different allocations. The meaning of the first parameter is quite obvious and needs no further explanation. Heap expansion is a measure of how much of the allocated memory is fragmented and what amount of extra memory is requested by the library from the operating system. As we will see, the library may allocate a 1MB block ten times, freeing it after each allocation and requesting a total of 10MB of the system, as if freed memory was not put back on the free list. Although this sometimes is desired behavior, needed to optimize the allocation strategy, it can be annoying on systems with a limited availability of RAM. It can become a further source of CPU overhead if swap space is involved. Finally, allocation latency is important for real-time applications, which need the longest allocation time to be bounded. Typical cases are multimedia playback applications and specialized embedded systems that need to react to external events in a predictable amount of time.

Some Comments on the Results

Our test box A is a Pentium 4 2.53GHz system, with 1GB of RAM running Gentoo Linux (all code is optimized for the CPU architecture) and glibc 2.3, which has an improved memory management algorithm over glibc 2.2. Test box B is a K6-II 400MHz laptop with 128MB of RAM, running Slackware Linux with glibc 2.2.

Our first test consisted of allocating a list of 150,000 nodes, 16 bytes each, 30 times. On each loop, the allocated list was destroyed, that is, free()ed in case of traditional management, unlinked in case of GC. The test commands were:

gctest -tu -s 4 -n 150000 -l 30


gctest -gu -s 4 -n 150000 -l 30

The overall execution time, on box A, was 3.80 seconds with traditional management and 2.43 seconds with GC, an improvement of about 35%. The same test performed on box B showed an even greater improvement, around 40%. This first test shows that, contrary to popular belief, GC actually can be quite faster than malloc/free. Heap expansion is rather limited and amounts to about 2 in both cases. What is even more surprising is that allocation latency is the same—6.7 microseconds—with a slightly larger deviation for the GC case. Also interesting is that by calling GC_gcollect() at each loop (option -G), the overall execution time decreases by 0.1 second. This result is counterintuitive, because we have one more function call in the loop.

Now, let's see what happens if we forget to destroy the list at the end of each loop. In the traditional management case, the test executes faster, 2.58 vs. 3.80 seconds, but the peak heap expansion is 140MB, which is twice the overall allocated memory. In the GC case, the test aborts due to memory exhaustion unless we call for an explicit collection (-G) at the end of each loop. By doing so, we obtain the lowest execution time, 2.32 seconds. This probably is quite far from what we could have imagined a priori—that's why actual experimentation is important for finding the optimal solutions.

The same test also has been performed on box B, but with an unoptimized Slackware distribution and glibc 2.2. It is interesting that although the improvement of GC over malloc/free still was around 40%, the test ran 27% faster under Gentoo.

The second test we made shows some limitations of the GC library. The test conditions actually were quite extreme: we tried to allocate five lists, each having 1,500,000 nodes, with each node being 16 bytes long. Although the malloc/free version ran correctly, the GC version did not complete the test because of memory exhaustion. The problem probably is due to the large number of blocks consecutively allocated.

The third test used larger nodes, 140 bytes each, and a shorter list length, 150,000 nodes. We ran:

gctest -tu -s 128 -n 150000 -l 5