Implementing a deltree Command in Linux

Removing a software package is made easy by using Dr. Ekdahl's deltree command.

Ever needed to excise a large software package from your file space, only to discover it dispersed over a directory tree containing over a hundred files and one tenth as many subdirectories? The command rm -rf will clear everything away nicely. However, in order to learn more about walking a Linux directory tree, let's look at implementing rm -r as a home-brew (DOS-like) deltree command. Implementing it in Linux is not difficult, and owning the utility will make it easier to remove unused or unwanted software packages from your own file space.

Linux C Library Resources: Files and Directories

To select C library resources to complete this task, we first determine which resources are generally available in UNIX and then which of these resources Linux implements. Two UNIX functions contained in the header file ftw.h walk a directory tree:

int ftw ( const char * path,
          int (*funcptr) ( const char *, const struct stat *, int ),
          int depth )

and

int nftw ( const char * path,
           int (*funcptr) ( const char *, const struct stat *, int, struct FTW * ),
           int depth,
           int flag )

Linux does not implement the second function, so we turn to the first; ftw walks a directory tree from top to bottom. For each directory entry, ftw calls the function pointed to by funcptr with the name of the entry, a pointer to a stat structure containing inode information and a flag set to convey information about the directory entry in question:
  • FTW_F: a file

  • FTW_D: a directory

  • FTW_DNR: a non-readable directory

  • FTW_NS: stat failed and inode information is not available.

What does ftw return to the caller? If it completes a successful walk through the tree, it returns 0. If the walk itself fails, it returns -1 and sets the global error variable errno appropriately. If the function pointed to by funcptr returns a nonzero value, ftw stops the walk and returns that value. Using ftw, it is a simple matter to create a function designed to delete directory items and to pass a pointer to that function to ftw (see Listing 1).
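To make the callback mechanism concrete before anything is deleted, here is a minimal sketch that only counts what ftw reports. It is an illustration rather than the article's Listing 1: the names CountEntry and CountTree, the three counters and the value 16 passed as the last argument are all assumptions made for this example.

```cpp
#include <cassert>
#include <cstdio>
#include <ftw.h>       // ftw and the FTW_* flags
#include <sys/stat.h>  // struct stat

// ftw has no user-data parameter, so the tallies live at file scope.
static int g_files = 0;
static int g_dirs = 0;
static int g_unreadable = 0;

// Callback with the signature ftw expects. Returning 0 lets the walk go on.
static int CountEntry(const char* path, const struct stat* /*info*/, int flag)
{
    switch (flag) {
    case FTW_F:
        ++g_files;
        break;
    case FTW_D:
        ++g_dirs;
        break;
    case FTW_DNR:
    case FTW_NS:
        ++g_unreadable;
        std::fprintf(stderr, "cannot examine %s\n", path);
        break;
    default:
        break;  // some systems define further flags; they are ignored here
    }
    return 0;
}

// Walk the tree rooted at root and hand back whatever ftw returns.
int CountTree(const char* root)
{
    assert(root != nullptr);
    g_files = g_dirs = g_unreadable = 0;
    return ftw(root, CountEntry, 16);  // 16 is an arbitrary choice for this sketch
}
```

Calling CountTree(".") and then printing g_files and g_dirs gives a quick picture of how large a tree is before committing to removing it. The starting directory itself is reported with FTW_D, so it is included in the directory count.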

In DelEntry(), the C library function

int remove ( const char * path )

in stdio.h does the actual work. This function returns 0 if successful and -1 if unsuccessful, in which case it sets the global variable errno to indicate one of a number of different error conditions, which the Linux man pages explain in detail.
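The return convention of remove can be wrapped in a few lines. The sketch below assumes a hypothetical helper named RemoveEntry, which is not part of the article's listings. It reports the reason for a failure through strerror and leaves errno intact for the caller.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdio>   // std::remove, std::fprintf
#include <cstring>  // std::strerror

// Hypothetical helper: delete one directory entry and say why if that fails.
// Returns true on success, false otherwise.
bool RemoveEntry(const char* path)
{
    assert(path != nullptr);
    if (std::remove(path) == 0)
        return true;

    const int saved = errno;  // keep the reason; later calls may overwrite errno
    std::fprintf(stderr, "cannot remove %s: %s\n", path, std::strerror(saved));
    errno = saved;            // leave it intact for the caller to inspect
    return false;
}
```

A second call on a path that has already been removed returns false with errno set to ENOENT, which is a convenient way to see the error reporting at work.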

There is, however, a catch. In UNIX, remove may generally be used to delete either files or empty directories. In Linux, remove only processes files and would therefore empty the directory tree but leave the tree itself standing. Two UNIX C library functions may provide solutions:

int rmdirp ( char * d, char *d1 )
int rmdir ( const char * path )

Linux does not implement the first UNIX function rmdirp, so we focus on the second. The function rmdir removes only empty directories and returns 0 on success, -1 otherwise with errno set. To accomplish our task, we must walk the tree twice: once from the top down to the directory at the bottom, deleting files as we go, and the second time from the bottom back to the top, removing empty directories in reverse order. The perfect tool to achieve this result is a container class: a stack of pointers to directory path names.
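The ordering constraint is easy to see in isolation. In the following sketch, a std::vector of path names stands in for the stack, and the function names RemoveDirsBottomUp and DemoBottomUp are assumptions made for illustration. The demo builds three nested, empty directories, confirms that rmdir refuses the top one while it still has a child, and then removes all three in reverse order.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <sys/stat.h>  // mkdir
#include <unistd.h>    // rmdir
#include <vector>

// Remove directories in the reverse of the order in which they were recorded.
// Returns how many were removed; stops at the first failure.
std::size_t RemoveDirsBottomUp(const std::vector<std::string>& topDown)
{
    std::size_t removed = 0;
    for (auto it = topDown.rbegin(); it != topDown.rend(); ++it) {
        if (rmdir(it->c_str()) != 0)
            break;  // a parent cannot go while one of its children remains
        ++removed;
    }
    return removed;
}

// Build base, base/a and base/a/b, then take them down again.
bool DemoBottomUp(const std::string& base)
{
    assert(!base.empty());
    const std::vector<std::string> dirs = { base, base + "/a", base + "/a/b" };
    for (const std::string& d : dirs) {
        if (mkdir(d.c_str(), 0700) != 0)
            return false;
    }
    // Top-down removal is refused because base is not empty.
    const bool topDownRefused = (rmdir(base.c_str()) == -1);
    return topDownRefused && RemoveDirsBottomUp(dirs) == dirs.size();
}
```

Because the paths are recorded parent first, walking the container backwards always reaches a child before its parent, which is exactly the last-in, first-out behaviour a stack provides.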

StrStack: A Stack of Pointers to char Arrays

When ftw calls DelEntry, it supplies a flag indicating whether it found a file, a directory or cannot cope, and we can use this flag to fill StrStack with path names inside DelEntry as ftw walks the tree. The question is where to put the stack. The header file ftw.h specifies the signatures of the ftw function and the function pointed to by funcptr, and neither signature includes a stack, so we cannot pass the stack in by reference as a parameter to DelEntry. The simplest solution is to create StrStack as an external variable in the implementation file funcs.cpp which holds the function definitions for main. As an external variable, StrStack will be equally accessible to DelEntry and to DelDirectories, provided it is defined in the implementation file above these functions.
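The arrangement described above might look roughly like this. It is a sketch under stated assumptions, not the article's listing: std::stack<std::string> stands in for StrStack, the names g_dirStack and DelTree are invented for the example, and the bodies of DelEntry and DelDirectories are one plausible reading of the text. The sketch gives up at the first error and makes no attempt to guard against symbolic links that point to directories.

```cpp
#include <cassert>
#include <cstdio>      // std::remove
#include <ftw.h>       // ftw and the FTW_* flags
#include <stack>
#include <string>
#include <sys/stat.h>  // struct stat
#include <unistd.h>    // rmdir

// Stand-in for StrStack. It sits at file scope, above the functions that use
// it, because ftw cannot pass extra arguments through to its callback.
static std::stack<std::string> g_dirStack;

// First pass, called by ftw for every entry: delete files, remember directories.
static int DelEntry(const char* path, const struct stat* /*info*/, int flag)
{
    switch (flag) {
    case FTW_F:
        return std::remove(path) == 0 ? 0 : -1;  // nonzero stops the walk
    case FTW_D:
        g_dirStack.push(path);  // the stack keeps its own copy of the name
        return 0;
    default:
        return -1;  // FTW_DNR, FTW_NS or anything unexpected: give up
    }
}

// Second pass: pop the directories so that the deepest ones go first.
static int DelDirectories()
{
    int failures = 0;
    while (!g_dirStack.empty()) {
        if (rmdir(g_dirStack.top().c_str()) != 0)
            ++failures;
        g_dirStack.pop();
    }
    return failures == 0 ? 0 : -1;
}

// Hypothetical driver: returns 0 if the whole tree was removed, -1 otherwise.
int DelTree(const char* root)
{
    assert(root != nullptr);
    while (!g_dirStack.empty())
        g_dirStack.pop();  // start from a clean stack
    if (ftw(root, DelEntry, 16) != 0)  // 16 is an arbitrary choice
        return -1;
    return DelDirectories();
}
```

A main function would then reduce to checking its arguments and calling DelTree(argv[1]). Since ftw reports a directory before the entries inside it, every parent is pushed before its children and is therefore popped after them.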

Several aspects of StrStack require explanation. StrStack differs from the average stack in that each node contains pointers to two different, dynamically allocated structures: a pointer to the next node and a pointer to a character string of varying length. Two allocations are necessary to create a node, and two separate deallocations are necessary to destroy one. Making StrStack responsible for both allocations makes the code more reliable and more robust, and guards against memory leaks. In addition, if the caller were responsible for allocating and deallocating the memory containing the character arrays, then external code could literally pull data out from under StrStack, leaving dangling pointers.
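The ownership rule can be sketched in a cut-down class. This is not the article's StrStack: the nested Node type, its member names and the push, top, pop and empty operations are assumptions, and copying is simply disabled here to keep the example short. What it does show is the pairing of two allocations in push with two deallocations in pop.

```cpp
#include <cassert>
#include <cstring>

class StrStack {
public:
    StrStack() : head_(nullptr) {}
    ~StrStack() { while (!empty()) pop(); }

    StrStack(const StrStack&) = delete;             // copying is left out of
    StrStack& operator=(const StrStack&) = delete;  // this sketch

    bool empty() const { return head_ == nullptr; }

    // Allocation 1: a private copy of the text. Allocation 2: the node.
    void push(const char* text)
    {
        assert(text != nullptr);
        char* copy = new char[std::strlen(text) + 1];
        std::strcpy(copy, text);
        head_ = new Node{copy, head_};
    }

    // The most recently pushed path name; valid until the next pop.
    const char* top() const
    {
        assert(!empty());
        return head_->text;
    }

    // The two matching deallocations: the text first, then the node.
    void pop()
    {
        assert(!empty());
        Node* old = head_;
        head_ = old->next;
        delete[] old->text;
        delete old;
    }

private:
    struct Node {
        char* text;
        Node* next;
    };
    Node* head_;
};
```

Because push copies the characters, the caller is free to reuse or overwrite its own buffer straight away. Nothing outside the class ever holds a pointer that the class will later delete.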

In some places, implicit recursion accomplishes allocation and deallocation, and it may not be obvious at first glance how the process works. Let's examine the copy constructor for StrStack shown in Listing 2.

The class copy constructor is designed to create a copy of a StrStack object in case a function ever passes a stack in or out by value. The code in the function is obvious except for one line:

next_ = new strNode ( *srcnode.next_ );
                // indirect recursion

This line is an example of indirect recursion, and it duplicates all the nodes in the StrStack node sequence. How does it work? The argument to new is strNode ( *srcnode.next_ ), which is another call to the node copy constructor with the next node in sequence as argument. As long as each node contains a pointer to another node, the copy constructor repeatedly calls itself recursively until it encounters a NULL in the next_ field of the last node in the sequence. With that, the recursion ceases and begins to unwind, constructing a copy backwards from the tail of the node sequence to the head. Note that the copy constructor deals, as promised, with two different dynamic allocations: allocating memory for the node, and then for the character array which holds the path name.

In the node destructor, the line delete next_ again triggers a sequence of implicit recursions which result in the destructor calling itself until the final NULL at the end of the list is encountered. At that point, the recursion unwinds, and nodes are deleted from the tail of the list back to the head.
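A node type with this behaviour could be sketched as follows. The names strNode, srcnode and next_ come from the text; the member text_, the two-argument constructor and the deleted assignment operator are assumptions added so that the example stands on its own. Exception safety is ignored for brevity.

```cpp
#include <cassert>
#include <cstring>

struct strNode {
    char*    text_;  // this node's own copy of a path name (name assumed)
    strNode* next_;  // next node in the sequence, or nullptr at the tail

    // Create a node that owns a private copy of text.
    strNode(const char* text, strNode* next) : text_(nullptr), next_(next)
    {
        assert(text != nullptr);
        text_ = new char[std::strlen(text) + 1];
        std::strcpy(text_, text);
    }

    // Copy constructor: duplicates this node and, through the recursive
    // call hidden in new, every node after it.
    strNode(const strNode& srcnode) : text_(nullptr), next_(nullptr)
    {
        text_ = new char[std::strlen(srcnode.text_) + 1];
        std::strcpy(text_, srcnode.text_);
        if (srcnode.next_ != nullptr)
            next_ = new strNode(*srcnode.next_);  // indirect recursion
    }

    strNode& operator=(const strNode&) = delete;  // not needed for this sketch

    // Destructor: delete next_ runs the next node's destructor first, so the
    // chain is torn down through the same kind of implicit recursion.
    ~strNode()
    {
        delete[] text_;
        delete next_;  // deleting a null pointer is harmless
    }
};
```

Copying the head node with new strNode(*head) therefore yields a complete, independent list, and a single delete on either head releases every node and every string behind it. The recursion depth equals the list length, which is acceptable for a directory stack of modest size but would be worth reconsidering for very long lists.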
