Linux Programming Hints

In this column, I'll explore the GNU C Library. The Free Software Foundation (FSF) has written an excellent reference manual, available in an electronic form that can be printed or read on-line, but I think that an introduction will help some people get started.
Error Reporting

argv is often checked within main() to find out what name was used to invoke the program. However, for error reporting mechanisms to work, a variable pointing to argv[0] has to either be global within at least some part of your program or be passed around a lot from function to function and used as an argument to your error handling functions—both of which can get rather messy.

The GNU C library provides two variables, which are automatically initialized before main() is called, which solve this problem. char *program_invocation_name contains an exact copy of the name found in argv[0], and char *program_invocation_short_name contains a copy with all the leading directory names stripped off. So if program_invocation_name contains /usr/bin/foo, program_invocation_short_name contains foo.

With these two variables, error handling functions become a lot simpler and more generic. It is possible to make clean error handling functions without these pre-provided variables, but it requires that you initialize your error handling functions, probably from main(), during program initialization. If you assume that the GNU C library is available, you can simply access these variables directly, cutting down on the possibility of programmer error.

Memory Allocation

The GNU C library contains built-in heap consistency checking, meaning that it can check to see if a program has violated some of the rules for accessing dynamically allocated memory. By calling the mcheck() function before any memory allocation functions are called, you can ask that some consistency checks be occasionally made and an error function be called if there are any inconsistencies.

You can also define functions that are called directly before malloc(), realloc(), and free() are called, to check for errors. mcheck() is implemented by using these hooks, but it is still possible for you to use the hooks even if you are using mcheck() because the functions are “chained”—you just need to follow the rules and the example given in the reference manual to get this to work correctly.

An mstats() function is provided, which gets memory allocation statistics including:

  • The total number of bytes being managed by malloc() (etc.), including memory that has been allocated from the operating system but not allocated to your program by malloc().

  • The number of bytes actually allocated to your program.

  • The number of “chunks” that have been allocated from the operating system, but which are not in use.

  • The number of “chunks” that are actually in use.

  • The number of free bytes which have been allocated by malloc() from the operating system, but which are not currently allocated to your program.

A dynamic stack allocation facility called obstacks is available, and this can be more efficient for some things than malloc. Obstacks have some limitations, but they are implemented as macros and are very quick for small, repeated allocations. They also have a lower space overhead for each small block than malloc() does.

Obstacks are built on malloc() in much the same way that malloc() is built on the system call brk().

A relocating allocator is also provided. This is a memory allocator which provides blocks of memory which may be moved around at any time behind the scenes, and which are therefore referenced through a “handle” which is updated whenever the memory is moved.

It can be a little more work to program with relocating memory because you have to work with, for example, a char ** instead of a char *, but if your program regularly allocates and de-allocates memory in a more-or-less random way, the relocating allocator can provide significant memory savings.

Input and Output

Because there are no really good functions in the standard C library for reading lines, the GNU C library provides some extra functions which are not completely compatible but which work much better. getline() can safely read a string as long as memory can hold. getdlim() is a generalized version of getline(), which gets text until some delimiting character is reached again, without arbitrary limits on how long the line can be. In these functions memory is allocated from within the function, instead of the function requiring you to pass it memory. You are required to free this memory when you are done with it.

Safe formatted string I/O is provided by snprintf(), asprintf(), and obstack_printf() the first of these is a version of sprintf() which knows how long a string it has to write into; and the other two dynamically allocate whatever space they need, like getline() and getdlim().

The GNU C library provides functions for customizing printf(). You can define a %q format for the standard printf(), for example, and make it do whatever you want. If you would like to be able to easily print out structures in your application, simply make printf() conversions for them, and pass pointers to structures into printf(). If %q is your generic structure-printing conversion, and struct foo has been designated as structure number 1, you could make it possible to write: printf("%1q\n", &foo); and have the contents of foo printed out for you.

scanf() is compatibly extended so that it can optionally allocate string storage itself, so, for instance, you don't have to have a maximum string size.

It is also possible to do standard I/O on memory, using functions like fmemopen() to get a FILE * which references memory instead of a file. Now all your standard I/O functions can be used to write into memory. It is even possible to define your own types of streams, so you could, for example, write a set of procedures which allow you to use fprintf() to “print” to something via SYSV IPC messages.