Garbage Collection in C Programs

LISP and Java programmers take garbage collection for granted. With the Boehm-Demers-Weiser library, you easily can use it in C and C++ projects, too.

into the appropriate include files. This code substitutes only the explicit calls contained in your code, leaving startup and library allocations to traditional malloc/free calls.

A different approach is to hook malloc and friends to functions of your own, which in turn would call the GC versions. Listing 1 shows how to do it, and it can be linked directly to an existing program. See my article “Advanced Memory Allocation” [LJ, May 2003] for details on these hooks. With this method, any heap allocation is guaranteed to go through libgc, even if it is not performed directly by your code.

As a third alternative, you can pass --enable-redirect-malloc to configure before compiling the libgc library. Doing so provides the library with wrapper functions that have the same names as the standard glibc malloc family. When linking with your code, the functions in libgc override the standard ones, with a net effect similar to using malloc hooks. In this case, though, the effect is system-wise, as any program linked with libgc is affected by the change.

Do not expect to endow huge programs with GC easily using any of these methods. Some simple tricks are needed in order to exploit GC functions and help the collector algorithm work efficiently. For example, I tried to recompile gawk (version 3.1.1) using GC and obtained an executable ten times slower than the original. With some adjustments, such as setting each pointer to NULL after having freed it, the execution time improved significantly, even if it was still greater than the original time.

Garbage Collection in New Programs

If you are developing a new program and would like to take advantage of automated memory management, all you need to do is use the GC_malloc() family in place of the plain malloc() one and link with libgc. Memory blocks no longer needed simply can be disposed of by setting any referencing pointers to NULL. Alternatively, you can call GC_free() to free the block immediately.

Always remember that your whole heap is scanned periodically by the collector to look for unused blocks. If the heap is large, this operation may take some time, causing the performance of the program to degrade. This behavior is suboptimal, because large blocks of memory often are guaranteed never to contain pointers, including buffers used for file or network I/O and large strings. Typically, pointers are contained only in fixed positions within small data structures, such as list and tree nodes. Were C and C++ strongly typed languages, the collector could have decided whether to scan a memory block, based on the type of pointer. Unfortunately, this is not possible because it is perfectly legal in C to have a char pointer reference a list node.

For optimal performance, the programmer should try to provide some basic runtime type information to the collector. To this end, the BDW library has a set of alternative functions that can be used to allocate memory. GC_malloc_atomic() can be used in place of GC_malloc() to obtain memory blocks that will never contain valid pointers. That is, the collector skips those blocks when looking for live memory references. Furthermore, those blocks do not need to be cleared on allocation. GC_malloc_uncollectable() and GC_malloc_stubborn() also can be used to allocate fixed and rarely changing blocks, respectively. Finally, it is possible to provide some rough type information by using GC_malloc_explicitly_typed() and building block maps with GC_make_descriptor(). See gc_typed.h on the Linux Journal FTP site for more information [available at].

The collector's behavior also can be controlled by the user through a number of function calls and variables. Among the most useful ones are GC_gcollect(), which forces a full garbage collection on the whole heap; GC_enable_incremental(), which enables incremental mode collection; and GC_free_space_divisor, which tunes the trade-off between frequent collections (high values, causing low heap expansion and high CPU overhead) and time efficiency (low values).

Heap status and debug information is available through a number of functions, including GC_get_heap_size(), GC_get_free_bytes(), GC_get_bytes_since_gc(), GC_get_total_bytes() and GC_dump(). Many of these parameters and functions are not documented at all, not even in the source code itself. As always, a good editor is your friend.