Every programmer has the potential to be a great programmer. Equally, every programmer has the potential to introduce catastrophic errors, particularly when working under the pressure of internet time. Unfortunately, even today, code does not live solely in internet time: it must be maintained and improved for months or years after it is first written.
This is particularly true of open-source projects in which no single individual has assumed maintenance responsibility for the code. Errors can fester in legacy code for years before becoming apparent, simply because no one documented an assumption about user behavior or checked a memory pointer. When we inspect legacy code for customers, they are often horrified at the number of defects still present in their applications, including defects they would regard as stop-ship.
The problems caused by coding errors are particularly acute in embedded software. The target environment is hostile to testers: resources such as memory and I/O are limited, and code often resides in read-only memory. The choice of debugging aids is limited, too. Not every operating system is supported, and the tools frequently impose a memory overhead the target cannot afford, which also interferes with testing real-time behavior. Nor does the multitude of toolchains help: different projects use different chips, and development environments offer inconsistent tool support.
The goal of this article is to briefly analyze seven important causes of fatal errors in C and C++ programs and to offer tips on avoiding them. These errors can be tied directly to a large share of the downtime in C and C++ production systems. Research tells us that more than 70% of the effort spent on most C and C++ applications goes into maintenance, and we all have better things to do than maintain broken code. If every programmer reading this article avoids just one of these fatal errors in future code, we could see a significant improvement in the defect density of C and C++ applications.
The seven causes of fatal errors in C and C++ covered in this article are: memory leaks, NULL pointer dereferences, bad deallocations, out-of-bounds array accesses, uninitialized variables, missing copy constructors and assignment operators in C++, and order of initialization in C++.
A memory leak is a loss of available memory. It occurs when dynamically allocated data is no longer needed by the program but the memory it occupies is never deallocated. It can also occur when a new address is assigned to a pointer variable that holds the only reference to an already allocated area, so that the area can no longer be freed.
Each time a memory leak occurs, the application drains the available memory pool. Even in virtual-memory systems, the gradual growth in application size can degrade the performance of the entire system. Eventually, the leak can exhaust memory altogether, and the resulting fatal out-of-memory condition may strike an application unrelated to the one that caused the leak in the first place.
Tip: make sure you match allocation calls with deallocation calls when doing unit testing, particularly where a function that allocates dynamic data has an early return. Before assigning to a pointer, also make sure that the memory it currently points to has been freed or is still reachable through some other pointer.
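Both leak patterns can be illustrated with a short C++ sketch. The names `parse_record`, `Holder` and `replace_buffer` are hypothetical, and the "validity" rule is invented purely to create an early return. The fix shown is the one the tip describes, pairing every allocation with a deallocation on every exit path; in modern C++ a smart pointer such as `std::unique_ptr` would do the same pairing automatically.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper: copies a record into a scratch buffer and reports
// whether it is "valid" (here, simply: non-empty with a non-zero first byte).
bool parse_record(const char* src, std::size_t len) {
    if (len == 0) {
        return false;                 // nothing allocated yet, nothing to free
    }
    assert(src != nullptr);           // documented assumption: caller passes a buffer
    char* buf = new char[len];        // allocation ...
    std::memcpy(buf, src, len);
    if (buf[0] == '\0') {
        delete[] buf;                 // ... matched on the early-return path
        return false;                 // (leaving out this delete[] is the classic leak)
    }
    // ... further processing of buf would go here ...
    delete[] buf;                     // ... and matched on the normal path
    return true;
}

// Hypothetical owner of a single heap buffer.
struct Holder {
    char* data = nullptr;
};

// Reassigning an owning pointer: release the old block first, otherwise the
// only reference to it is overwritten and the memory leaks.
void replace_buffer(Holder& h, std::size_t new_len) {
    char* fresh = new char[new_len]();   // value-initialized (all zeros)
    delete[] h.data;                     // free the block we are about to lose
    h.data = fresh;                      // now it is safe to overwrite the pointer
}
```

Running such unit tests under a leak checker (for example Valgrind or LeakSanitizer) is one way to confirm that allocations and deallocations really do match.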
A NULL pointer refers to a specific, invalid memory address. A NULL pointer dereference follows a NULL pointer to that location and attempts to access data at the invalid address. Such a dereference usually causes a program exception, but on systems without memory management hardware it may not be detected at all, and the resulting unpredictable behavior can be very difficult to trace.
Tip: assume that even the most unlikely event will occur, and add appropriate error-checking and recovery code, usually in the form of if/else statements around the code that accesses the suspect variables. If you have the time, document every assumption you make right in the code, so that future maintainers face a shorter learning curve.
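A minimal sketch of that advice: a lookup that may legitimately return a NULL pointer, and a caller that guards the dereference with if/else and a recovery path instead of assuming success. `Config`, `find_config` and `config_value_or` are hypothetical names, and the assumptions are written down as comments and a debug-build `assert`, as the tip suggests.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical configuration entry.
struct Config {
    const char* name;
    int value;
};

// Hypothetical lookup: returns nullptr when `key` is not found.
// ASSUMPTIONS (documented for maintainers):
//   - `table` holds `count` entries and may itself be nullptr when count == 0
//   - `key` may be nullptr; a nullptr key never matches
//   - entries with a nullptr name are skipped
const Config* find_config(const Config* table, int count, const char* key) {
    assert(count >= 0);                  // checked in debug builds
    if (table == nullptr || key == nullptr) {
        return nullptr;
    }
    for (int i = 0; i < count; ++i) {
        if (table[i].name != nullptr && std::strcmp(table[i].name, key) == 0) {
            return &table[i];
        }
    }
    return nullptr;
}

// The caller never follows the returned pointer without checking it first.
int config_value_or(const Config* table, int count, const char* key, int fallback) {
    const Config* c = find_config(table, count, key);
    if (c != nullptr) {
        return c->value;
    } else {
        return fallback;                 // recovery path for the "unlikely" case
    }
}
```

Returning a caller-supplied fallback is only one possible recovery strategy; logging the failure or propagating an error code may suit a given system better.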
Bad deallocation refers to releasing memory with an inappropriate release operation, or to deallocating memory that was never dynamically allocated. The impact of this error depends on the compiler, the runtime library and the specific operation, and can range from no visible effect at all to unexpected results, a memory leak, memory corruption or a program exception.
Tip: the key to avoiding bad deallocations is to match allocations and deallocations carefully and to make sure each deallocation suits the way the memory was allocated: free for malloc, delete for new, and delete[] for new[]. A secondary technique is to monitor memory usage carefully during unit testing.
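The sketch below puts the three matching pairs side by side in one hypothetical function, `sum_and_release_demo`, whose arithmetic exists only so the allocated memory is actually used. The commented-out line shows the other kind of bad deallocation: releasing memory that was never heap-allocated.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <string>

// Each allocation form has exactly one matching release form:
//   std::malloc / std::calloc  ->  std::free
//   new T                      ->  delete
//   new T[n]                   ->  delete[]
// Mixing them, releasing a pointer twice, or releasing memory that was never
// heap-allocated is undefined behaviour.
int sum_and_release_demo(std::size_t n) {
    assert(n > 0);                                     // documented assumption
    int* a = static_cast<int*>(std::calloc(n, sizeof(int)));   // C allocation
    std::string* s = new std::string("x");            // single object
    int* b = new int[n]();                             // array of n zeroed ints
    assert(a != nullptr);                              // assumption: calloc succeeded

    int total = 0;
    for (std::size_t i = 0; i < n; ++i) {
        a[i] = static_cast<int>(i);
        b[i] = 1;
        total += a[i] + b[i];
    }
    total += static_cast<int>(s->size());

    std::free(a);      // NOT delete: a came from calloc
    delete s;          // NOT delete[] and NOT free: s came from new
    delete[] b;        // NOT delete: b came from new[]
    // int local = 0; std::free(&local);   // WRONG: never heap-allocated
    return total;
}
```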
An out-of-bounds array access occurs when an array index expression falls outside the lower and upper bounds of the array. This type of error can corrupt data or lead to a program exception.
Tip: because these errors frequently stem from unfounded assumptions, document your assumptions here too. State explicitly what values an index expression can take on entering a loop and on leaving it, and be sure of those values before using the index either inside or outside the loop.
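Here is one way to write those entry, in-loop and exit assumptions directly into the code. `kSamples` and `average` are hypothetical; the comment on the loop condition marks the classic `<=` off-by-one, and the comment after the loop notes that the index is no longer a valid subscript once the loop ends.

```cpp
#include <cassert>
#include <cstddef>

constexpr std::size_t kSamples = 8;   // hypothetical buffer size

// Returns the average of the first `n` samples.
// ASSUMPTIONS (documented, checked in debug builds):
//   on entry:  0 < n <= kSamples
//   in loop:   0 <= i < n, so samples[i] is always in bounds
//   on exit:   i == n, which is NOT a valid index when n == kSamples
double average(const double (&samples)[kSamples], std::size_t n) {
    assert(n > 0 && n <= kSamples);
    double sum = 0.0;
    std::size_t i = 0;
    for (; i < n; ++i) {      // '<', not '<=': i == n would be one past the end
        sum += samples[i];
    }
    // Here i == n; reading samples[i] now would be out of bounds if n == kSamples.
    return sum / static_cast<double>(n);
}
```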
An uninitialized variable is one that has not been explicitly initialized before use. Using uninitialized variables can cause unpredictable results in an application and, in the worst case, a program exception.
Tip: the best way to avoid this type of error is to initialize every new local variable as part of its declaration.
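A small illustration of initializing at the point of declaration. The functions are hypothetical; the comments describe what would go wrong if the initializers were left off.

```cpp
#include <cassert>
#include <cstddef>

// Returns the index of the first negative value, or n if there is none.
// `index` is initialized where it is declared. Writing "std::size_t index;"
// instead would leave it indeterminate, and the function would return
// garbage whenever no negative value is found.
std::size_t first_negative(const int* values, std::size_t n) {
    assert(values != nullptr || n == 0);   // documented assumption
    std::size_t index = n;                 // default result: "not found"
    for (std::size_t i = 0; i < n; ++i) {
        if (values[i] < 0) {
            index = i;
            break;
        }
    }
    return index;
}

// Without "= 0", total would start from whatever value happened to be there.
int sum(const int* values, std::size_t n) {
    assert(values != nullptr || n == 0);
    int total = 0;
    for (std::size_t i = 0; i < n; ++i) {
        total += values[i];
    }
    return total;
}
```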
If a class that manages dynamic data does not define its own copy constructor and assignment operator, C++ generates default versions that simply copy each member, including pointers. The copy and the original then point to the same memory. Assignment overwrites the target's pointer without freeing what it pointed to, causing a memory leak, and when one object goes out of scope and releases the shared memory, the other is left holding a dangling pointer. In the worst case, the result is a program exception.
Tip: if your class has pointers to data it owns, write your own versions of the copy constructor and assignment operator (along with a matching destructor). Inside those methods, copy the pointed-to data structures so that every object has its own copy.
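A sketch of such a class, assuming a hypothetical `IntBuffer` that owns a heap array. The copy constructor performs the deep copy the tip describes. The assignment operator uses the copy-and-swap idiom, one common way (not the only one) to make assignment release the old array and stay safe under self-assignment.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>

// Hypothetical class that owns a heap-allocated array of ints.
class IntBuffer {
public:
    explicit IntBuffer(std::size_t count)
        : size_(count), data_(new int[count]()) {}

    // Copy constructor: allocate a fresh array and copy the elements, so the
    // new object does not share data_ with `other`.
    IntBuffer(const IntBuffer& other)
        : size_(other.size_), data_(new int[other.size_]) {
        std::copy(other.data_, other.data_ + other.size_, data_);
    }

    // Copy assignment (copy-and-swap): make the copy first, then swap it in.
    // The old array ends up in tmp and is released by tmp's destructor.
    IntBuffer& operator=(const IntBuffer& other) {
        IntBuffer tmp(other);
        std::swap(size_, tmp.size_);
        std::swap(data_, tmp.data_);
        return *this;
    }

    ~IntBuffer() { delete[] data_; }

    int& operator[](std::size_t i) {
        assert(i < size_);               // documented bounds assumption
        return data_[i];
    }
    std::size_t size() const { return size_; }

private:
    std::size_t size_;
    int* data_;
};
```

With the compiler-generated versions, `b` and `a` below would share one array, and the second destructor would delete it twice.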
Class members are initialized in the order in which they are declared in the class. The order in which they appear in a constructor's member initializer list is irrelevant. If you assume any order other than the declaration order, for example by initializing one member from another that is declared after it, unpredictable results can occur.
Tip: write each constructor's initializer list in the same order as the member declarations, and never initialize a member from one that is declared after it.
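In the hypothetical `Label` class below, one member is computed from another. The commented-out version compiles but reads an unconstructed string, because the declaration order, not the initializer-list order, decides who is initialized first. GCC and Clang can flag such mismatches with their `-Wreorder` warning.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// BUG sketch (kept as a comment): length_ is declared first, so it is
// initialized first, even though the initializer list names text_ first.
// length_ would then call size() on a string that does not exist yet.
//
//   class Label {
//       std::size_t length_;   // declared first -> initialized first
//       std::string text_;
//   public:
//       explicit Label(const char* s) : text_(s), length_(text_.size()) {}
//   };

// Fixed: the declaration order matches the dependency, and the initializer
// list repeats that same order.
class Label {
public:
    explicit Label(const char* s) : text_(s), length_(text_.size()) {}
    std::size_t length() const { return length_; }
    const std::string& text() const { return text_; }

private:
    std::string text_;      // declared first, so initialized first
    std::size_t length_;    // may safely depend on text_
};
```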
Reasoning also advises programmers to avoid the following seven practices, which over the years have been shown to produce error-prone, hard-to-understand code:
Don't use tricks.
Don't use globals or statics unless absolutely necessary.
Don't use magic numbers.
Don't overly nest control structures.
Don't put anything unnecessary inside a loop.
Don't use unstructured constructs.
Don't do copy-and-paste programming.
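Two items from this list, magic numbers and unnecessary work inside a loop, fit in one short sketch. `kMaxFieldLength`, the printable-range constants and `valid_field` are hypothetical. The limit is named once instead of appearing as a bare number, and the string length is computed once before the loop rather than on every pass through the loop condition.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Named constants instead of magic numbers (values are hypothetical).
constexpr std::size_t kMaxFieldLength = 32;
constexpr unsigned char kFirstPrintable = 0x20;   // ' '
constexpr unsigned char kLastPrintable = 0x7E;    // '~'

// Returns true if `field` is non-empty, no longer than kMaxFieldLength,
// and contains only printable ASCII characters.
bool valid_field(const char* field) {
    assert(field != nullptr);                 // documented assumption
    // Hoisted out of the loop: "i < std::strlen(field)" as the loop condition
    // would rescan the whole string on every iteration.
    const std::size_t len = std::strlen(field);
    if (len == 0 || len > kMaxFieldLength) {
        return false;
    }
    for (std::size_t i = 0; i < len; ++i) {
        const unsigned char c = static_cast<unsigned char>(field[i]);
        if (c < kFirstPrintable || c > kLastPrintable) {
            return false;
        }
    }
    return true;
}
```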
Of course, the seven classes of defects documented above are by no means the only errors developers make that can impact the operation of software in the field. But a common thread runs through the above examples that applies to almost every error. Even when you're under tremendous pressure to meet a completion deadline, make sure you take the time to read over your code and document your assumptions. Remember the Mars probe that went missing? Intel's Pentium chip problems? Or the more recent Nasdaq and NYSE exchange shutdowns? All were caused by software errors. You could be saving your company a lot more than money.
Jasper Kamperman, PhD, is a leading developer of the technology powering Reasoning's InstantQA(sm) service. He has a master's degree in Physics from the University of Utrecht and a PhD in Computer Science from the University of Amsterdam.