Linux Programming Hints
In this column, I'll explore the GNU C Library. The Free Software Foundation (FSF) has written an excellent reference manual, available in an electronic form that can be printed or read on-line, but I think that an introduction will help some people get started.
The GNU C Library is more than a re-implementation of the Standard C Library; while it has all the features of the Standard C Library, it has far more interesting and useful features as well. Unfortunately, it is not necessarily a good idea to use all those features in your programs.
One method that the FSF has used to avoid copyright infringement lawsuits from unhappy commercial vendors has been to remove restrictions and arbitrary limits from the GNU versions of programs. For example, where the standard version of a program might be limited to handling lines less than 4096 characters long, the GNU version is likely to handle lines of any length that memory can hold.
They have followed the same philosophy in their version of the C Library: why not make improvements, so long as the library is still compatible? So where most standard C libraries contain a printf() which causes a segmentation violation when something like printf("%s", NULL) is called, the GNU C library prints (null). This is not a feature used to print (null), but a debugging aid which allows the programmer to find and correct buggy code more easily, without having to inspect core files caused by segmentation violations.
While maintaining POSIX compatibility, the FSF has significantly extended the C library, making it far more useful in the process. Unfortunately, when you use these extensions, your program becomes less portable to other platforms. For a program that relies on them to be generally useful, the GNU C library has to be ported to every platform where the program might be useful.
On the other hand, writing good software that requires the GNU C library may encourage the further spread of the GNU C library. It may also make your programs work better, since the better the library the program is built on, the better the program may be; and some of the higher-level functions may allow you to write simpler, more maintainable code. You can spend less effort getting around library limitations. Buggy libraries can waste a lot of a programmer's time, as veteran programmers know. Since the GNU C library has a reputation as a good implementation of the Standard C library, with useful extensions, you may be doing all your fellow programmers a favor by encouraging the spread of the GNU C library.
Another reason to encourage the spread of the GNU C Library is the very fact that it is free software. It can be a tremendous help to be able to read library source when you don't understand what a library function call is doing.
The Linux C library is based almost completely on the GNU C library and will probably be merged with the GNU C library eventually. This does not imply that writing programs under Linux requires or encourages writing non-portable programs. The -ansi switch for GCC enforces fairly strict ANSI compliance(1), and by default hides the declarations of all the GNU extensions in the header files, so that you can be sure your program is completely portable. Section 1.3.4, Feature Test Macros, in the GNU C Library Reference Manual, explains how to choose which features you want included while using the GNU C Library.
If you write programs based on books like W. Richard Stevens' Advanced Programming in the Unix Environment, Kernighan and Ritchie's The C Programming Language, Donald Lewine's POSIX Programmer's Guide and other such standard references, your code should be portable to many operating systems as well as to Linux. However, with Linux, you have the choice of using GNU-specific library routines, and of promoting the use of the GNU C library on other platforms as well.
For the rest of the column, we will leave such philosophical ramblings behind and assume that you have chosen to use the GNU C library in all its glory, above and beyond the ANSI standard, and that you want an introduction to its extensions so that you know what features are there to be used. I will go through the reference manual, pointing out and briefly explaining many of the useful enhancements of the GNU C library. This is not a coherent discussion of the GNU C library, but a list of extensions that people intending to use the GNU C library for serious programming should know about. This way they can decide whether or not to use the features, rather than being condemned by ignorance to ignore them.
If you find these functions worth using, please look them up in the GNU C Library Reference Manual. Don't try to use them just from my descriptions here; these descriptions are just to catch your interest. Follow the references instead.
argv[0] is often checked within main() to find out what name was used to invoke the program. However, for error reporting mechanisms to work, a variable pointing to argv[0] has to either be global within at least some part of your program or be passed around a lot from function to function and used as an argument to your error handling functions, both of which can get rather messy.
The GNU C library provides two variables, automatically initialized before main() is called, which solve this problem. char *program_invocation_name contains an exact copy of the name found in argv[0], and char *program_invocation_short_name contains a copy with all the leading directory names stripped off. So if program_invocation_name contains /usr/bin/foo, program_invocation_short_name contains foo.
With these two variables, error handling functions become a lot simpler and more generic. It is possible to make clean error handling functions without these pre-provided variables, but it requires that you initialize your error handling functions, probably from main(), during program initialization. If you assume that the GNU C library is available, you can simply access these variables directly, cutting down on the possibility of programmer error.
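As a rough sketch of what such a generic error helper might look like, the following formats a message prefixed with the program's short name. The function names format_error() and report_error() are made up for this illustration; only program_invocation_short_name comes from the library, and it is assumed here that defining _GNU_SOURCE before including <errno.h> makes it visible.

```c
#define _GNU_SOURCE   /* assumed to expose the GNU-only variables in <errno.h> */
#include <errno.h>
#include <stdio.h>

/* Hypothetical helper: format "progname: message" into buf.
   Returns the length the full text needs, as snprintf() does. */
int format_error(char *buf, size_t size, const char *message)
{
    return snprintf(buf, size, "%s: %s",
                    program_invocation_short_name, message);
}

/* Hypothetical helper: print the same text to stderr.
   Nothing has to be initialized from main() first. */
void report_error(const char *message)
{
    fprintf(stderr, "%s: %s\n", program_invocation_short_name, message);
}
```

Any function in the program can call report_error("cannot open file") without argv[0] ever being passed to it.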
The GNU C library contains built-in heap consistency checking, meaning that it can check to see if a program has violated some of the rules for accessing dynamically allocated memory. By calling the mcheck() function before any memory allocation functions are called, you can ask that some consistency checks be occasionally made and an error function be called if there are any inconsistencies.
You can also define functions that are called directly before malloc(), realloc(), and free() are called, to check for errors. mcheck() is implemented by using these hooks, but it is still possible for you to use the hooks even if you are using mcheck() because the functions are “chained”—you just need to follow the rules and the example given in the reference manual to get this to work correctly.
An mstats() function is provided, which gets memory allocation statistics including:
The total number of bytes being managed by malloc() (etc.), including memory that has been allocated from the operating system but not allocated to your program by malloc().
The number of bytes actually allocated to your program.
The number of “chunks” that have been allocated from the operating system, but which are not in use.
The number of “chunks” that are actually in use.
The number of free bytes which have been allocated by malloc() from the operating system, but which are not currently allocated to your program.
A dynamic stack allocation facility called obstacks is available, and this can be more efficient for some things than malloc. Obstacks have some limitations, but they are implemented as macros and are very quick for small, repeated allocations. They also have a lower space overhead for each small block than malloc() does.
Obstacks are built on malloc() in much the same way that malloc() is built on the system call brk().
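To give a feel for the interface, here is a small sketch that builds a comma-separated string as one growing object on an obstack. The helper join_words() is invented for this example; the obstack calls themselves are the ones described in the reference manual, and the sketch assumes chunks are obtained with plain malloc() and free().

```c
#include <obstack.h>
#include <stdlib.h>
#include <string.h>

/* Obstacks obtain their chunks through these two macros, which the
   program must define before it uses obstack_init(). */
#define obstack_chunk_alloc malloc
#define obstack_chunk_free free

/* Hypothetical helper: join the words with commas as a single growing
   object on the obstack and return the finished string. */
char *join_words(struct obstack *pool, const char *const *words, size_t count)
{
    size_t i;

    for (i = 0; i < count; i++) {
        if (i > 0)
            obstack_1grow(pool, ',');                    /* append one byte */
        obstack_grow(pool, words[i], strlen(words[i]));  /* append a block */
    }
    obstack_1grow(pool, '\0');    /* terminate the string */
    return obstack_finish(pool);  /* fix the object's final address */
}
```

The caller initializes the obstack once with obstack_init(&pool), calls join_words() as often as it likes, and releases every string at once with obstack_free(&pool, NULL).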
A relocating allocator is also provided. This is a memory allocator which provides blocks of memory which may be moved around at any time behind the scenes, and which are therefore referenced through a “handle” which is updated whenever the memory is moved.
It can be a little more work to program with relocating memory because you have to work with, for example, a char ** instead of a char *, but if your program regularly allocates and de-allocates memory in a more-or-less random way, the relocating allocator can provide significant memory savings.
Because there are no really good functions in the standard C library for reading lines, the GNU C library provides some extra functions which are not completely compatible with the standard ones but which work much better. getline() can safely read a line as long as memory can hold. getdelim() is a generalized version of getline() which reads text until some delimiting character is reached, again without arbitrary limits on how long the text can be. These functions allocate memory themselves, instead of requiring you to pass it in. You are required to free this memory when you are done with it.
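The calling convention is easiest to see in a short example. The two helpers below, longest_line() and count_fields(), are made up for illustration; they simply show the pattern of handing getline() and getdelim() a NULL buffer, letting the library grow it, and freeing it once at the end.

```c
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

/* Hypothetical helper: length of the longest line in the stream,
   not counting the trailing newline. */
size_t longest_line(FILE *stream)
{
    char *line = NULL;     /* getline() allocates and grows this buffer */
    size_t capacity = 0;   /* buffer size, maintained by getline() */
    size_t longest = 0;
    ssize_t length;

    while ((length = getline(&line, &capacity, stream)) != -1) {
        if (length > 0 && line[length - 1] == '\n')
            length--;
        if ((size_t)length > longest)
            longest = (size_t)length;
    }
    free(line);            /* the caller owns the buffer */
    return longest;
}

/* Hypothetical helper: count the pieces of text separated by delimiter. */
size_t count_fields(FILE *stream, int delimiter)
{
    char *field = NULL;
    size_t capacity = 0;
    size_t fields = 0;

    while (getdelim(&field, &capacity, delimiter, stream) != -1)
        fields++;
    free(field);
    return fields;
}
```

Neither function has to guess a maximum line length, which is the whole point of the interface.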
Safe formatted string I/O is provided by snprintf(), asprintf(), and obstack_printf(). The first of these is a version of sprintf() which knows how large a buffer it has to write into; the other two dynamically allocate whatever space they need, like getline() and getdelim().
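A minimal sketch of the two styles follows. The helper names format_fixed() and format_dynamic() are hypothetical, and the truncation check assumes a library in which snprintf() returns the length the complete output would have needed.

```c
#define _GNU_SOURCE   /* assumed to be needed for the asprintf() declaration */
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: format into a caller-supplied buffer.
   Returns 0 if everything fit, -1 if the text was truncated or failed. */
int format_fixed(char *buf, size_t size, const char *name, int count)
{
    int needed = snprintf(buf, size, "%s: %d", name, count);

    return (needed >= 0 && (size_t)needed < size) ? 0 : -1;
}

/* Hypothetical helper: let the library allocate exactly enough space.
   Returns a string the caller must free(), or NULL on failure. */
char *format_dynamic(const char *name, int count)
{
    char *text = NULL;

    if (asprintf(&text, "%s: %d", name, count) < 0)
        return NULL;
    return text;
}
```

With snprintf() the buffer is never overrun, but the caller has to decide what to do about truncation; with asprintf() there is no truncation, but the caller takes on the job of freeing the result.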
The GNU C library provides functions for customizing printf(). You can define a %q format for the standard printf(), for example, and make it do whatever you want. If you would like to be able to easily print out structures in your application, simply make printf() conversions for them, and pass pointers to structures into printf(). If %q is your generic structure-printing conversion, and struct foo has been designated as structure number 1, you could make it possible to write: printf("%1q\n", &foo); and have the contents of foo printed out for you.
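A simplified sketch of the idea, under several assumptions: it uses a %P conversion rather than the numbered %q scheme described above, a made-up struct point, and the register_printf_specifier() interface from <printf.h>, which may differ from what your version of the library and manual provide. Treat it as an illustration and check the reference manual before relying on it.

```c
#define _GNU_SOURCE
#include <printf.h>
#include <stdio.h>

/* Hypothetical application structure, used only for this illustration. */
struct point {
    int x;
    int y;
};

/* Handler: printf() calls this when it meets the registered conversion.
   args[0] points at the argument value, here a pointer to a struct point. */
static int print_point(FILE *stream, const struct printf_info *info,
                       const void *const *args)
{
    const struct point *p = *(const struct point *const *)args[0];

    (void)info;   /* width, precision and flags are ignored in this sketch */
    return fprintf(stream, "(%d, %d)", p->x, p->y);
}

/* Arginfo: tells printf() that the conversion consumes one pointer. */
static int point_arginfo(const struct printf_info *info, size_t n,
                         int *argtypes, int *size)
{
    (void)info;
    if (n > 0) {
        argtypes[0] = PA_POINTER;
        size[0] = sizeof(void *);
    }
    return 1;
}

/* Register %P for struct point; returns 0 on success, -1 on failure. */
int register_point_conversion(void)
{
    return register_printf_specifier('P', print_point, point_arginfo);
}
```

After register_point_conversion() succeeds, the printf() family accepts %P with a pointer to a struct point. The compiler's format checking does not know about the new conversion, so expect warnings if the format is a string literal.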
scanf() is compatibly extended so that it can optionally allocate string storage itself, so, for instance, you don't have to have a maximum string size.
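A short sketch of how this looks in practice. It assumes the allocation request is spelled with the m modifier (as in %ms), which is what current versions of the library accept; older versions of the manual describe a different spelling, so check yours. The helper first_word() is invented for the example.

```c
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: return a freshly allocated copy of the first
   whitespace-delimited word in text, or NULL if there is none.
   The caller must free() the result. */
char *first_word(const char *text)
{
    char *word = NULL;

    /* The m modifier asks sscanf() to allocate the string itself, so
       no maximum word length has to be chosen in advance. */
    if (sscanf(text, "%ms", &word) != 1)
        return NULL;
    return word;
}
```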
It is also possible to do standard I/O on memory, using functions like fmemopen() to get a FILE * which references memory instead of a file. Now all your standard I/O functions can be used to write into memory. It is even possible to define your own types of streams, so you could, for example, write a set of procedures which allow you to use fprintf() to “print” to something via SYSV IPC messages.
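For instance, a function that already knows how to "print" a report with fprintf() can be pointed at a block of memory instead of a file. The helper write_report() below is a made-up example of that pattern using fmemopen().

```c
#define _GNU_SOURCE
#include <stdio.h>

/* Hypothetical helper: use ordinary stdio calls to write a short report
   into caller-supplied memory. Returns 0 on success, -1 on failure. */
int write_report(char *buffer, size_t size, int errors, int warnings)
{
    FILE *stream = fmemopen(buffer, size, "w");

    if (stream == NULL)
        return -1;
    fprintf(stream, "errors=%d ", errors);
    fprintf(stream, "warnings=%d", warnings);
    /* Closing flushes the stream; when there is room, a terminating
       null byte is written after the data. */
    return fclose(stream) == 0 ? 0 : -1;
}
```

The same two fprintf() calls would work unchanged on stdout or on a file opened with fopen(), which is what makes memory streams convenient.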
The GNU C Library Reference Manual is an amazingly large and comprehensive work. While it's not perfect and is still being written, it contains a lot of information. I do not know if it is being published on paper, but it's available via ftp from all GNU mirror sites and can easily be printed or formatted for on-line reading from within emacs or the standalone info reader.
I'll take some space here to plug, as usual, some of the books that I have found most helpful, books which I think that my readers should not be without.
When you are programming for modern variants of Unix, you ought not to be without W. Richard Stevens' Advanced Programming in the Unix Environment, which has most of the information you need to write real applications under most variants of Unix. Both the principles and the details are covered. ISBN: 0-201-56317-7
For learning how to write POSIX compatible programs which can run on more than just Unix platforms (rather the opposite of this month's column, I'll admit), I recommend Donald Lewine's POSIX Programmer's Guide. It's hard to go wrong if you follow this book. ISBN: 0-937175-73-0
I'm open to suggestions on what programming hints people would like to see. Please send email to firstname.lastname@example.org or send paper mail to Programming Tips, Linux Journal, P.O. Box 84867, Seattle, WA 98145-1867, and I'll consider your suggestions. If you have any books which you really like and which you would like to see me recommend in this column, please recommend them to me. I'd appreciate a detailed description of any book which you find indispensable as a Unix programmer.
(1) American National Standards Institute: American National Standard X3.159-1989, “ANSI C”.