Implementing a deltree Command in Linux
Ever needed to excise a large software package from your file space, only to discover it dispersed over a directory tree containing over a hundred files and one tenth as many subdirectories? The command rm -rf will clear everything away nicely. However, in order to learn more about walking a Linux directory tree, let's look at implementing rm -r as a home-brew (DOS-like) deltree command. Doing so in Linux is not difficult, and owning the utility makes it easier to remove unused or unwanted software packages from your own file space.
To select C library resources to complete this task, we first determine which resources are generally available in UNIX and then which of these resources Linux implements. Two UNIX functions contained in the header file ftw.h walk a directory tree:
int ftw ( const char * path, int (*funcptr ) ( const char *, const struct stat*, int ), int depth )
int nftw ( const char * path, int (*funcptr) ( const char *, const struct stat*, int, struct FTW* ), int depth, int flag )
Linux does not implement the second function, so we turn to the first; ftw walks a directory tree from top to bottom. For each directory entry, ftw calls the function pointed to by funcptr with the name of the entry, a pointer to a stat structure containing inode information and a flag set to convey information about the directory entry in question:
FTW_F: a file
FTW_D: a directory
FTW_DNR: a non-readable directory
FTW_NS: stat failed and inode information is not available.
In DelEntry(), the C library function
int remove ( const char * path )
in stdio.h does the actual work. This function returns 0 if successful, -1 if unsuccessful and sets the global variable errno as necessary to handle a number of different error conditions, which the Linux man pages explain in detail.
There is, however, a catch. In UNIX, remove may generally be used to delete either files or empty directories. In Linux, remove only processes files and would therefore empty the directory tree but leave the tree itself standing. Two UNIX C library functions may provide solutions:
int rmdirp ( char * d, char * d1 )
int rmdir ( const char * path )
Linux does not implement the first UNIX function rmdirp, so we focus on the second. The function rmdir removes only empty directories and returns 0 on success, -1 otherwise with errno set. To accomplish our task, we must walk the tree twice: once from the top down to the directory at the bottom, deleting files as we go, and the second time from the bottom back to the top, removing empty directories in reverse order. The perfect tool to achieve this result is a container class: a stack of pointers to directory path names.
When ftw calls DelEntry, it supplies a flag indicating whether it found a file, a directory or cannot cope, and we can use this flag to fill StrStack with path names inside DelEntry as ftw walks the tree. The question is where to put the stack. The header file ftw.h specifies the signatures of the ftw function and the function pointed to by funcptr, and neither signature includes a stack, so we cannot pass the stack in by reference as a parameter to DelEntry. The simplest solution is to create StrStack as an external variable in the implementation file funcs.cpp which holds the function definitions for main. As an external variable, StrStack will be equally accessible to DelEntry and to DelDirectories, provided it is defined in the implementation file above these functions.
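The two-pass design can be sketched as follows. This is an illustration under stated assumptions rather than the article's actual listing: std::stack of std::string stands in for StrStack, the names g_dirStack and DelTree are hypothetical, the bodies of DelEntry and DelDirectories are guesses at what such functions might do, and 16 is an arbitrary value for ftw's third argument.

```cpp
#include <ftw.h>
#include <sys/stat.h>
#include <unistd.h>
#include <cstdio>
#include <stack>
#include <string>

// Stand-in for StrStack. It is defined at file scope, above the functions
// that use it, because ftw's callback signature has no slot for a stack.
static std::stack<std::string> g_dirStack;

// Pass 1 (called by ftw): delete files at once, remember directories.
int DelEntry(const char* path, const struct stat*, int flag)
{
    switch (flag) {
    case FTW_F:
        if (std::remove(path) != 0)
            std::perror(path);
        break;
    case FTW_D:
        g_dirStack.push(path);   // the stack stores its own copy of the name
        break;
    default:                     // FTW_DNR, FTW_NS and anything else
        std::fprintf(stderr, "skipping %s\n", path);
        break;
    }
    return 0;                    // keep walking
}

// Pass 2: pop directories, deepest first, and remove each one.
// Returns the number of directories that could not be removed.
int DelDirectories()
{
    int failures = 0;
    while (!g_dirStack.empty()) {
        if (rmdir(g_dirStack.top().c_str()) != 0) {
            std::perror(g_dirStack.top().c_str());
            ++failures;
        }
        g_dirStack.pop();
    }
    return failures;
}

// Hypothetical driver: 0 on success, -1 if the walk itself failed,
// otherwise the number of directories left standing.
int DelTree(const char* root)
{
    if (ftw(root, DelEntry, 16) != 0)
        return -1;
    return DelDirectories();
}
```

Because ftw reports a directory before it reports that directory's contents, every subdirectory is pushed after its parent and therefore popped before it, so each rmdir call should find its directory already empty.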
Several aspects of StrStack require explanation. StrStack differs from the average stack in that each node contains pointers to two different, dynamically allocated structures: a pointer to the next node and a pointer to a character string of varying length. Two allocations are necessary to create a node, and two separate deallocations are necessary to destroy a node. Making StrStack responsible for both allocations makes the code more reliable and more robust, and guards against memory leaks. In addition, if the caller were responsible for allocating and deallocating the memory containing the character arrays, then outside code could literally pull data out from under StrStack, leaving dangling pointers.
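The ownership rule can be illustrated with a stripped-down stack. This is a sketch only: the member names push, pop, top, empty, str_ and top_ are assumptions and may not match the real StrStack interface, and copying is simply disabled here to keep the example short.

```cpp
#include <cstring>

// Sketch of a stack that owns both its nodes and the strings they hold.
class StrStack {
public:
    StrStack() : top_(nullptr) {}
    ~StrStack() { while (!empty()) pop(); }

    // Copying is left out of this sketch.
    StrStack(const StrStack&) = delete;
    StrStack& operator=(const StrStack&) = delete;

    // Stores a private copy of s, so the caller may free or reuse its buffer.
    void push(const char* s)
    {
        Node* n = new Node;                        // allocation 1: the node
        n->str_ = new char[std::strlen(s) + 1];    // allocation 2: the string
        std::strcpy(n->str_, s);
        n->next_ = top_;
        top_ = n;
    }

    // Returns the newest string, or a null pointer when the stack is empty.
    const char* top() const { return top_ ? top_->str_ : nullptr; }

    bool empty() const { return top_ == nullptr; }

    void pop()
    {
        if (!top_)
            return;
        Node* n = top_;
        top_ = n->next_;
        delete [] n->str_;                         // deallocation 1: the string
        delete n;                                  // deallocation 2: the node
    }

private:
    struct Node {
        char* str_;
        Node* next_;
    };
    Node* top_;
};
```

Since push copies the characters, a caller that later overwrites or frees its own buffer does not disturb what the stack holds.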
In some places, implicit recursion accomplishes allocation and deallocation, and it may not be obvious at first glance how the process works. Let's examine the copy constructor for StrStack shown in Listing 2.
The class copy constructor is designed to create a copy of a StrStack object in case a function ever passes a stack in or out by value. The code in the function is obvious except for one line:
next_ = new strNode ( *srcnode.next_ ); // indirect recursion
This line is an example of indirect recursion, and it duplicates all the nodes in the StrStack node sequence. How does it work? The operand of new is strNode ( *srcnode.next_ ), which is another call to the node copy constructor with the next node in the sequence as its argument. As long as each node contains a pointer to another node, the copy constructor keeps calling itself until it encounters a NULL in the next_ field of the last node in the sequence. With that, the recursion ceases and begins to unwind, completing the copy backwards from the tail of the node sequence to the head. Note that the copy constructor deals, as promised, with two different dynamic allocations: allocating memory for the node, and then for the character array which holds the path name. In the node destructor, the line delete next_ likewise triggers a sequence of implicit recursive calls in which the destructor calls itself until it reaches the final NULL at the end of the list. At that point, the recursion unwinds, and nodes are deleted from the tail of the list back to the head.
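For readers who want to trace the recursion, here is a self-contained sketch of a node with a recursive copy constructor and destructor. Only strNode, srcnode and next_ come from the text above; the field name path_, the two-argument constructor and the helper ChainLength are hypothetical, and the layout of the real node class may differ.

```cpp
#include <cstring>

struct strNode {
    char*    path_;   // hypothetical name for the owned character array
    strNode* next_;   // next node in the sequence, or nullptr at the tail

    // Builds a node holding its own copy of path, linked in front of next.
    strNode(const char* path, strNode* next)
        : path_(new char[std::strlen(path) + 1]), next_(next)
    {
        std::strcpy(path_, path);
    }

    // Copying one node copies its string, then copies the rest of the chain.
    strNode(const strNode& srcnode)
        : path_(new char[std::strlen(srcnode.path_) + 1]), next_(nullptr)
    {
        std::strcpy(path_, srcnode.path_);
        if (srcnode.next_)
            next_ = new strNode(*srcnode.next_);   // recursion via new
    }

    // Destroying one node destroys every node after it as well.
    ~strNode()
    {
        delete [] path_;
        delete next_;   // runs the next node's destructor; no-op on nullptr
    }

    strNode& operator=(const strNode&) = delete;   // not needed for the sketch
};

// Hypothetical helper: counts the nodes reachable from n.
int ChainLength(const strNode* n)
{
    int count = 0;
    for (; n != nullptr; n = n->next_)
        ++count;
    return count;
}
```

One consequence of this design is that the depth of the recursion equals the length of the list, which is reasonable for a directory tree of modest size but could strain the call stack for a very long sequence.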