Implementing a deltree Command in Linux

by Graydon L. Ekdahl

Ever needed to excise a large software package from your file space, only to discover it dispersed over a directory tree containing over a hundred files and one tenth as many subdirectories? The command rm -rf will clear everything away nicely. However, to learn more about walking a Linux directory tree, let's implement rm -r as a home-brew (DOS-like) deltree command. The job is not difficult, and owning the utility makes it easier to remove unused or unwanted software packages from your own file space.

Linux C Library Resources: Files and Directories

To select C library resources to complete this task, we first determine which resources are generally available in UNIX and then which of these resources Linux implements. Two UNIX functions contained in the header file ftw.h walk a directory tree:

int ftw ( const char * path,
        int (*funcptr )
        ( const char *, const struct stat*, int ),
        int depth )

and

int nftw (const char * path,
        int (*funcptr)
        (const char *, const struct stat*, int, struct FTW*),
        int depth,
        int flag )
The Linux C library this article was written against does not implement the second function (later GNU C libraries do provide nftw), so we turn to the first. ftw walks a directory tree from the top down, reporting each directory before the entries it contains. For each directory entry, ftw calls the function pointed to by funcptr with the entry's path name, a pointer to a stat structure containing its inode information, and a flag describing the entry:
  • FTW_F: a file

  • FTW_D: a directory

  • FTW_DNR: a non-readable directory

  • FTW_NS: stat failed and inode information is not available.
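To make the flag values concrete, here is a minimal C++ sketch (not taken from the article's listings) that walks a tree with ftw and tallies the entries by flag. The names CountEntry and CountTree and the g_ counters are hypothetical, and the value 16 for the third argument is an arbitrary choice.

```cpp
#include <ftw.h>       // ftw(), FTW_F, FTW_D, FTW_DNR, FTW_NS
#include <sys/stat.h>  // struct stat, mkdir()
#include <stdlib.h>    // mkdtemp(), handy for building a scratch tree to try this on
#include <cstdio>      // std::printf, std::fopen
#include <cassert>
#include <string>

// ftw() gives the callback no user-data pointer, so this sketch keeps its
// tallies in file-scope variables.
static int g_files = 0;  // FTW_F entries
static int g_dirs = 0;   // FTW_D entries
static int g_other = 0;  // FTW_DNR, FTW_NS, or any implementation-specific flag

// Callback with the signature ftw() expects: path name, stat data, type flag.
static int CountEntry(const char *path, const struct stat *sb, int flag)
{
    (void)sb;  // the stat data is not needed here
    switch (flag) {
    case FTW_F:   ++g_files; break;  // a non-directory entry
    case FTW_D:   ++g_dirs;  break;  // a directory, reported before its contents
    case FTW_DNR: std::printf("unreadable: %s\n", path); ++g_other; break;
    case FTW_NS:  std::printf("stat failed: %s\n", path); ++g_other; break;
    default:      ++g_other; break;
    }
    return 0;  // a nonzero value would make ftw() stop and return that value
}

// Walks the tree rooted at 'root'; returns ftw()'s result (0 on success).
int CountTree(const std::string &root)
{
    g_files = g_dirs = g_other = 0;
    // The third argument (depth in the declaration above; glibc calls it
    // nopenfd) caps how many directories ftw() keeps open at once.
    return ftw(root.c_str(), CountEntry, 16);
}
```

On a directory holding one file and one subdirectory that itself holds one file, CountTree reports two FTW_D entries (the starting directory counts) and two FTW_F entries.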

What does ftw return to the caller? If it completes a walk through the whole tree, it returns 0. If the function pointed to by funcptr returns a nonzero value, ftw stops the walk and returns that value; if ftw itself fails, it returns -1 and sets the global variable errno appropriately. Using ftw, it is a simple matter to write a function that deletes directory entries and to pass a pointer to that function to ftw (see Listing 1).
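Listing 1 is not reproduced here, so the following is only a minimal sketch of the idea, using a hypothetical DelFileEntry() in place of the article's DelEntry(). It calls remove only for FTW_F entries, because ftw reports each directory before its contents, so a directory is never empty at the moment the callback sees it. Running it empties a tree of files but leaves the directories standing.

```cpp
#include <ftw.h>       // ftw(), FTW_F
#include <sys/stat.h>  // struct stat, mkdir(), stat()
#include <stdlib.h>    // mkdtemp(), handy for building a scratch tree to try this on
#include <cstdio>      // std::remove, std::perror
#include <cassert>
#include <string>

// Hypothetical stand-in for DelEntry(): delete every non-directory entry
// that ftw() reports and leave directories alone.
static int DelFileEntry(const char *path, const struct stat *sb, int flag)
{
    (void)sb;
    if (flag == FTW_F && std::remove(path) != 0)
        std::perror(path);  // report the failure but keep walking
    return 0;               // 0 tells ftw() to continue
}

// Removes the files under 'root'; the directory skeleton remains.
int DeleteFiles(const std::string &root)
{
    return ftw(root.c_str(), DelFileEntry, 16);
}
```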

In DelEntry(), the C library function

int remove ( const char * path )

in stdio.h does the actual work. It returns 0 on success; on failure it returns -1 and sets the global variable errno to indicate which of a number of different error conditions occurred, as the Linux man pages explain in detail.
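The sketch below shows the usual pattern for checking remove's result: test the return value, then capture errno immediately, before another library call can overwrite it. RemoveOrErrno is a hypothetical helper name, not part of the C library.

```cpp
#include <cstdio>   // std::remove, std::fprintf
#include <cerrno>   // errno, ENOENT
#include <cstring>  // std::strerror
#include <cassert>

// Returns 0 if 'path' was removed, otherwise the errno value remove() set.
int RemoveOrErrno(const char *path)
{
    if (std::remove(path) == 0)
        return 0;
    int err = errno;  // capture it before any other call can change errno
    std::fprintf(stderr, "remove %s: %s\n", path, std::strerror(err));
    return err;
}
```

Removing a path whose parent directory does not exist, for example, yields ENOENT.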

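There is, however, a catch. In UNIX, remove may generally be used to delete either files or empty directories. In the Linux C library used here, remove processes only files, so DelEntry would empty the directory tree but leave the tree itself standing. (Even where remove does accept directories, ftw reports each directory before its contents, so the directory is not yet empty when DelEntry sees it.) Two UNIX C library functions may provide solutions: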

int rmdirp ( char * d, char *d1 )
int rmdir ( const char * path )

Linux does not implement the first UNIX function, rmdirp, so we focus on the second. The function rmdir removes only empty directories; it returns 0 on success and -1 otherwise, with errno set. To accomplish our task, we must therefore walk the tree twice: once from the top down, deleting files as we go, and a second time from the bottom back up to the top, removing the now-empty directories in reverse order. Because ftw reports each directory before its contents, recording directory path names as ftw encounters them and then retrieving them last-in, first-out guarantees that every subdirectory is removed before its parent. The perfect tool to achieve this result is a container class: a stack of pointers to directory path names.
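To see rmdir's restriction in action, this small sketch (the helper TryRmdir is hypothetical) tries to remove a directory and returns the errno value on failure. On a directory that still holds a file, rmdir fails, typically with ENOTEMPTY (POSIX also permits EEXIST); once the file is gone, the same call succeeds.

```cpp
#include <sys/stat.h>  // mkdir()
#include <unistd.h>    // rmdir()
#include <stdlib.h>    // mkdtemp(), handy for a scratch directory
#include <cerrno>      // errno, ENOTEMPTY, EEXIST
#include <cstdio>      // std::fopen, std::remove
#include <cassert>
#include <string>

// Tries to remove 'dir'; returns 0 on success or the errno value on failure.
// rmdir() refuses any directory that still contains entries.
int TryRmdir(const std::string &dir)
{
    return rmdir(dir.c_str()) == 0 ? 0 : errno;
}
```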

StrStack: A Stack of Pointers to char Arrays

When ftw calls DelEntry, it supplies a flag indicating whether it found a file, a directory, or an entry it cannot read or stat, and we can use this flag to fill StrStack with directory path names inside DelEntry as ftw walks the tree. The question is where to put the stack. The header file ftw.h specifies the signatures of the ftw function and of the function pointed to by funcptr, and neither signature includes a stack, so we cannot pass the stack in by reference as a parameter to DelEntry. The simplest solution is to define the StrStack object as an external variable in funcs.cpp, the implementation file that holds the function definitions for main. As an external variable, the stack is equally accessible to DelEntry and to DelDirectories, provided it is defined in the implementation file above these functions.
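Putting the pieces together, here is one way the two passes might look. This is a sketch under stated assumptions, not the author's code: a file-scope std::vector<std::string> stands in for StrStack, and the signatures of DelEntry, DelDirectories and the hypothetical driver DelTree are guesses that reuse the article's names.

```cpp
#include <ftw.h>       // ftw(), FTW_F, FTW_D
#include <sys/stat.h>  // struct stat, mkdir(), stat()
#include <unistd.h>    // rmdir()
#include <stdlib.h>    // mkdtemp(), handy for building a scratch tree to try this on
#include <cstdio>      // std::remove, std::perror, std::fprintf
#include <cassert>
#include <string>
#include <vector>

// External (file-scope) stack of directory names; ftw()'s callback signature
// has no slot for it, so both functions below reach it directly.
static std::vector<std::string> g_dirStack;

// Pass 1: delete files, remember directories. ftw() reports a directory
// before its contents, so parents are pushed before their children.
static int DelEntry(const char *path, const struct stat *sb, int flag)
{
    (void)sb;
    switch (flag) {
    case FTW_F:
        if (std::remove(path) != 0)
            std::perror(path);
        break;
    case FTW_D:
        g_dirStack.push_back(path);
        break;
    default:  // FTW_DNR, FTW_NS: report and keep going
        std::fprintf(stderr, "cannot process %s\n", path);
        break;
    }
    return 0;
}

// Pass 2: pop directories, so every child is removed before its parent.
static int DelDirectories()
{
    int failures = 0;
    while (!g_dirStack.empty()) {
        const std::string &dir = g_dirStack.back();
        if (rmdir(dir.c_str()) != 0) {
            std::perror(dir.c_str());
            ++failures;
        }
        g_dirStack.pop_back();  // 'dir' is not used after this point
    }
    return failures;
}

// Removes the whole tree rooted at 'root'; returns 0 on full success.
int DelTree(const std::string &root)
{
    g_dirStack.clear();
    if (ftw(root.c_str(), DelEntry, 16) != 0)
        return -1;
    return DelDirectories() == 0 ? 0 : -1;
}
```

The sketch keeps going after a failed remove so that every problem entry is reported; any directory left non-empty as a result then makes its rmdir fail, and DelTree returns -1.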

Several aspects of StrStack require explanation. StrStack differs from the average stack in that each node contains pointers to two different dynamically allocated structures: a pointer to the next node and a pointer to a character string of varying length. Two allocations are necessary to create a node, and two separate deallocations are necessary to destroy one. Making StrStack responsible for both allocations and both deallocations keeps ownership of the memory in one place, which makes the code more robust and guards against memory leaks. In addition, if the caller were responsible for allocating and deallocating the character arrays, code outside the class could free a path name while the stack still pointed to it, leaving a dangling pointer.
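A minimal sketch of this ownership rule, assuming a simplified class named OwningStrStack (hypothetical, not the article's StrStack): push makes both allocations itself, pop releases both, and copying is disabled to keep the example short.

```cpp
#include <cstring>  // std::strlen, std::strcpy, std::strcmp
#include <cassert>

// Stack that owns private copies of the strings pushed onto it.
class OwningStrStack {
public:
    OwningStrStack() = default;
    ~OwningStrStack() { while (!empty()) pop(); }

    // Copying is disabled in this sketch.
    OwningStrStack(const OwningStrStack &) = delete;
    OwningStrStack &operator=(const OwningStrStack &) = delete;

    void push(const char *s)
    {
        char *copy = new char[std::strlen(s) + 1];  // allocation 1: the text
        std::strcpy(copy, s);
        try {
            head_ = new Node{copy, head_};          // allocation 2: the node
        } catch (...) {
            delete[] copy;                          // don't leak the text
            throw;
        }
    }

    // Precondition: !empty(). The returned pointer is still owned by the stack.
    const char *top() const { assert(head_); return head_->path; }

    void pop()
    {
        assert(head_);
        Node *old = head_;
        head_ = old->next;
        delete[] old->path;  // deallocation 1
        delete old;          // deallocation 2
    }

    bool empty() const { return head_ == nullptr; }

private:
    struct Node {
        char *path;
        Node *next;
    };
    Node *head_ = nullptr;
};
```

Because push copies its argument, the caller may overwrite or free its own buffer afterward without affecting the stack.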

In some places, implicit recursion accomplishes allocation and deallocation, and it may not be obvious at first glance how the process works. Let's examine the copy constructor for StrStack shown in Listing 2.

The class copy constructor is designed to create a copy of a StrStack object in case a function ever passes a stack in or out by value. The code in the function is obvious except for one line:

next_ = new strNode ( *srcnode.next_ );   // indirect recursion

This line is an example of indirect recursion, and it duplicates all the nodes in the StrStack node sequence. How does it work? The operand of new, strNode ( *srcnode.next_ ), is another call to the node copy constructor, this time with the next node in the sequence as its argument. As long as each node contains a pointer to another node, the copy constructor calls itself repeatedly until it encounters a NULL in the next_ field of the last node in the sequence. At that point the recursion stops and begins to unwind, so construction of the copy completes from the tail of the node sequence back to the head. Note that the copy constructor deals, as promised, with two different dynamic allocations: memory for the node and memory for the character array that holds the path name.

In the node destructor, the line delete next_ triggers a similar chain of implicit recursion: the destructor calls itself until it reaches the final NULL at the end of the list, where delete does nothing. The recursion then unwinds, and nodes are deleted from the tail of the list back to the head.
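The recursion is easier to follow in code. This sketch reuses the names strNode, srcnode and next_ from the text, but everything else (the path_ member, the two-argument constructor, the exception handling) is an assumption rather than a copy of Listing 2.

```cpp
#include <cstring>  // std::strlen, std::strcpy, std::strcmp
#include <cassert>

struct strNode {
    char *path_;     // owned copy of the path name
    strNode *next_;  // owned pointer to the rest of the list, or nullptr

    // Takes ownership of 'next' and copies 'path'.
    strNode(const char *path, strNode *next)
        : path_(new char[std::strlen(path) + 1]), next_(next)
    {
        std::strcpy(path_, path);
    }

    // Copies this node's string, then recursively copies the rest of the list.
    strNode(const strNode &srcnode)
        : path_(new char[std::strlen(srcnode.path_) + 1]), next_(nullptr)
    {
        std::strcpy(path_, srcnode.path_);
        if (srcnode.next_) {
            try {
                next_ = new strNode(*srcnode.next_);  // recursion through new
            } catch (...) {
                delete[] path_;  // the destructor never runs for a half-built node
                throw;
            }
        }
    }

    // delete next_ runs the next node's destructor first, so nodes are
    // released from the tail back to the head; delete on nullptr ends it.
    ~strNode()
    {
        delete next_;
        delete[] path_;
    }

    strNode &operator=(const strNode &) = delete;  // not needed for this sketch
};
```

Both the copy and the destruction recurse once per node, so the recursion depth equals the list length. That is harmless for a stack of directory names but would matter for very long lists.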

More Utilities

When C library calls do most of the grunt work, writing a simple utility like deltree is neither difficult nor time consuming, and the program's framework may offer unanticipated opportunities to address additional problems. This code can be adapted to perform some of the same functions as find: use it as a skeleton for a findfile command that scans a directory tree for file names matching a command-line argument. Just replace the file and directory deletion subroutines with a function that compares each file name with the target name and prints the path name of each match.
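As a sketch of that findfile idea, the hypothetical MatchEntry callback below compares the last component of each path with a target name and prints every match. FindFile and the g_ variables are illustrative names, and matching is exact and case-sensitive.

```cpp
#include <ftw.h>       // ftw()
#include <sys/stat.h>  // struct stat, mkdir()
#include <stdlib.h>    // mkdtemp(), handy for building a scratch tree to try this on
#include <cstdio>      // std::printf, std::fopen
#include <cstring>     // std::strrchr
#include <cassert>
#include <string>
#include <vector>

static std::string g_target;                // name being searched for
static std::vector<std::string> g_matches;  // full paths of the matches

// Replaces the deletion callbacks: compare the entry's base name with the target.
static int MatchEntry(const char *path, const struct stat *sb, int flag)
{
    (void)sb;
    (void)flag;  // files and directories are both checked in this sketch
    const char *slash = std::strrchr(path, '/');
    const char *base = slash ? slash + 1 : path;
    if (g_target == base) {
        std::printf("%s\n", path);
        g_matches.push_back(path);
    }
    return 0;
}

// Prints and records every path under 'root' whose last component is 'name'.
int FindFile(const std::string &root, const std::string &name)
{
    g_target = name;
    g_matches.clear();
    return ftw(root.c_str(), MatchEntry, 16);
}
```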

Do you port code from the Borland or Watcom PC DOS or OS/2 environments and move entire directory trees into Linux space at once? If so, then you have probably discovered that unwanted files migrate along with the necessary ones: files with suffixes such as .map, .sym, .dsk, .swp, .prj, .exe, and the like. With modification, deltree can also provide a framework for a cleandir utility that removes the chaff from the toolbox directory tree. To make the necessary changes, replace StrStack with a StrList class which contains a list of target file name suffixes. Instead of removing all files, the utility checks the suffix of each file in the directory tree against the list of target suffixes and deletes selectively. Once you have the hang of walking a Linux directory tree, creating plug-in functions to perform other tasks is a simple matter, and it is easy to generate a group of utilities which address a broader spectrum of directory tree maintenance issues.
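The heart of such a cleandir utility is a suffix test. In this sketch, HasTargetSuffix is a hypothetical helper, and a std::vector<std::string> stands in for the proposed StrList class.

```cpp
#include <cassert>
#include <string>
#include <vector>

// True if 'name' ends with one of the target suffixes. Matching is
// case-sensitive; DOS-era names such as FOO.EXE would need a
// case-insensitive comparison.
bool HasTargetSuffix(const std::string &name,
                     const std::vector<std::string> &suffixes)
{
    for (const std::string &suf : suffixes) {
        if (name.size() >= suf.size() &&
            name.compare(name.size() - suf.size(), suf.size(), suf) == 0)
            return true;
    }
    return false;
}
```

An ftw callback for cleandir would then call remove(path) only when the flag is FTW_F and HasTargetSuffix(path, suffixes) returns true.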

All code needed to implement this command is available by anonymous download in the file ftp://ftp.linuxjournal.com/pub/lj/listings/issue52/2439.tgz.

Graydon Ekdahl (gekdahl@ibm.net) is president of Econometrics, Inc., located in Chapel Hill, N.C. Graydon enjoys creating database applications and is interested in data structures, algorithms, C++ and Java.
