Garbage Collection in C Programs
A single best approach to memory management that is effective for any program does not exist. Given a specific application, the optimal solution must be found by compromising on a number of different factors, including CPU overhead, heap expansion, allocation latency and, last but not least, manageability and robustness of the code. Profiling the program and testing different memory strategies probably is the best solution for dealing with these issues.
One subtle point against GC is it requires extra care if compiler optimizations are switched on. The collector may wrongly assume a certain pointer has disappeared simply because references to it have been optimized out. Thus, the corresponding memory block might be freed even if it still is in use by the program, with the obvious consequences. Hence, it might be tempting to turn compiler optimizations off to be safe, losing part of the performance gain obtained by using GC.
In order to have an idea of the behavior and performance of GC vs. traditional memory management, we have experimented with a test program, gctest, which loops over the creation and destruction of a simple list. Simple as it may seem, the test raises some interesting points. The source code is not really instructive and is too long to be printed, so it is available for download on the FTP site (ftp.linuxjournal.com/pub/lj/listings/issue113/6679.tgz).
gctest can be controlled with a number of options that allow you to experiment with different list lengths and node sizes, as well as to change working parameters—enabling and disabling specific features of the GC library. Before we comment on the results we obtained with this test tool, it is important to point out that they were obtained in an artificial and not excessively realistic environment. So, again, you are invited to test the GC library yourself and evaluate it for your own code. Take the parameters presented here as possible indicators of the library suitability to a particular application.
The measurements we collected are overall execution time, the CPU load; heap expansion, how much memory is requested by the system with respect to how much actually was allocated by the program; and allocation latency average and standard deviation, how long it takes to allocate a single block of memory and how much this time varies across different allocations. The meaning of the first parameter is quite obvious and needs no further explanation. Heap expansion is a measure of how much of the allocated memory is fragmented and what amount of extra memory is requested by the library from the operating system. As we will see, the library may allocate a 1MB block ten times, freeing it after each allocation and requesting a total of 10MB of the system, as if freed memory was not put back on the free list. Although this sometimes is desired behavior, needed to optimize the allocation strategy, it can be annoying on systems with a limited availability of RAM. It can become a further source of CPU overhead if swap space is involved. Finally, allocation latency is important for real-time applications, which need the longest allocation time to be bounded. Typical cases are multimedia playback applications and specialized embedded systems that need to react to external events in a predictable amount of time.
Our test box A is a Pentium 4 2.53GHz system, with 1GB of RAM running Gentoo Linux (all code is optimized for the CPU architecture) and glibc 2.3, which has an improved memory management algorithm over glibc 2.2. Test box B is a K6-II 400MHz laptop with 128MB of RAM, running Slackware Linux with glibc 2.2.
Our first test consisted of allocating a list of 150,000 nodes, 16 bytes each, 30 times. On each loop, the allocated list was destroyed, that is, free()ed in case of traditional management, unlinked in case of GC. The test commands were:
gctest -tu -s 4 -n 150000 -l 30
gctest -gu -s 4 -n 150000 -l 30
The overall execution time, on box A, was 3.80 seconds with traditional management and 2.43 seconds with GC, an improvement of about 35%. The same test performed on box B showed an even greater improvement, around 40%. This first test shows that, contrary to popular belief, GC actually can be quite faster than malloc/free. Heap expansion is rather limited and amounts to about 2 in both cases. What is even more surprising is that allocation latency is the same—6.7 microseconds—with a slightly larger deviation for the GC case. Also interesting is that by calling GC_gcollect() at each loop (option -G), the overall execution time decreases by 0.1 second. This result is counterintuitive, because we have one more function call in the loop.
Now, let's see what happens if we forget to destroy the list at the end of each loop. In the traditional management case, the test executes faster, 2.58 vs. 3.80 seconds, but the peak heap expansion is 140MB, which is twice the overall allocated memory. In the GC case, the test aborts due to memory exhaustion unless we call for an explicit collection (-G) at the end of each loop. By doing so, we obtain the lowest execution time, 2.32 seconds. This probably is quite far from what we could have imagined a priori—that's why actual experimentation is important for finding the optimal solutions.
The same test also has been performed on box B, but with an unoptimized Slackware distribution and glibc 2.2. It is interesting that although the improvement of GC over malloc/free still was around 40%, the test ran 27% faster under Gentoo.
The second test we made shows some limitations of the GC library. The test conditions actually were quite extreme: we tried to allocate five lists, each having 1,500,000 nodes, with each node being 16 bytes long. Although the malloc/free version ran correctly, the GC version did not complete the test because of memory exhaustion. The problem probably is due to the large number of blocks consecutively allocated.
The third test used larger nodes, 140 bytes each, and a shorter list length, 150,000 nodes. We ran:
gctest -tu -s 128 -n 150000 -l 5
|Updates from LinuxCon and ContainerCon, Toronto, August 2016||Aug 23, 2016|
|NVMe over Fabrics Support Coming to the Linux 4.8 Kernel||Aug 22, 2016|
|What I Wish I’d Known When I Was an Embedded Linux Newbie||Aug 18, 2016|
|Pandas||Aug 17, 2016|
|Juniper Systems' Geode||Aug 16, 2016|
|Analyzing Data||Aug 15, 2016|
- Updates from LinuxCon and ContainerCon, Toronto, August 2016
- What I Wish I’d Known When I Was an Embedded Linux Newbie
- NVMe over Fabrics Support Coming to the Linux 4.8 Kernel
- Download "Linux Management with Red Hat Satellite: Measuring Business Impact and ROI"
- New Version of GParted
- All about printf
- Tor 0.2.8.6 Is Released
- Returning Values from Bash Functions
- Better Cloud Storage with ownCloud 9.1
With all the industry talk about the benefits of Linux on Power and all the performance advantages offered by its open architecture, you may be considering a move in that direction. If you are thinking about analytics, big data and cloud computing, you would be right to evaluate Power. The idea of using commodity x86 hardware and replacing it every three years is an outdated cost model. It doesn’t consider the total cost of ownership, and it doesn’t consider the advantage of real processing power, high-availability and multithreading like a demon.
This ebook takes a look at some of the practical applications of the Linux on Power platform and ways you might bring all the performance power of this open architecture to bear for your organization. There are no smoke and mirrors here—just hard, cold, empirical evidence provided by independent sources. I also consider some innovative ways Linux on Power will be used in the future.Get the Guide