Verifying Filesystem Integrity with CVS

Even the most conscientious of paranoid penguins needs a vacation, so we have a guest columnist for Mick Bauer this month.
Paranoid Penguin

Verifying Filesystem Integrity with CVS

Even the most conscientious of paranoid penguins needs a vacation sometime, so we have a guest columnist for Mick Bauer this month.

by Michael Rash

The single most effective way to catch host-based intrusions is by detecting changes in the filesystem. Toward this end, Tripwire is the best known piece of software that automates the process of verifying filesystem integrity on a machine over time and alerting the administrator if any unauthorized changes are detected. The types of changes that Tripwire can detect include changes in file permissions and type, inode number, link count, uid and gid, size, access timestamp, modification timestamp, inode change timestamp and one-way hash signature (generated by any of ten different signature functions, including md5 and Snefru). In this article we explore the use of the Concurrent Versions System (CVS), together with some Perl glue, as a filesystem integrity checker in a manner similar to Tripwire. Although Tripwire is the best known filesystem integrity checker, it should be noted that Tripwire recently became a proprietary product to which there are many free and/or open-source alternatives, such as AIDE and FCheck (see Resources).


The basic process of deploying Tripwire on a system can be summed up by the following sequence of steps: 1) install the operating system, and immediately run Tripwire from the console before exposing the machine to any type of network connection; 2) store the resulting Tripwire database of filesystem information on read-only media separate from the system; and 3) periodically run Tripwire against the original database so a comparison can be made between the current filesystem information and the original database in order to look for any anomalies.


CVS is a tool used primarily by software developers to help organize and provide a version structure to a set of source code. One of its most useful features is the ability to keep track of all changes in a piece of source code (or ordinary text) over time, starting from when the code initially was checked into the CVS repository. When a developer wishes to make changes, the code is checked out of the repository, modifications are made and the resulting code is committed back into the repository. CVS automatically increments the version number and keeps track of the changes. After modifications have been committed to the repository, it is possible to see exactly what was changed in the new version relative to the previous (or any other) version by using the cvs diff command.

Now that we have a basic understanding of the steps used to administer Tripwire for a system, we will show how CVS can be leveraged in much the same way, with one important enhancement: difference tracking for plain-text configuration files. If one of your machines is cracked, and a user is added to the /etc/passwd file, Tripwire can report the fact that any number of file attributes have changed, such as the mtime or one-way hash signature, but it cannot tell you exactly which user was added.

A significant problem for intrusion detection in general is that the number of false positives generated tends to be very high, and host-based intrusion-detection systems (HBIDS) are not exempt to this phenomenon. Depending on the number of systems involved, and hence the resulting complexity, false positives can be generated so frequently that the data generated by HBIDS may become overwhelming for administrators to handle. Thus, in the example of a user being added to the /etc/passwd file, if the HBIDS could report exactly which user was added, it would help to determine whether the addition was part of a legitimate system administration action or something else. This could save hours of time since once it has been determined that a system has been compromised, the only way to guarantee returning the system to its normal state is to re-install the entire operating system from scratch.

Collecting the Data

In keeping with the Tripwire methodology of storing the initial filesystem snapshot on media separate from the system being monitored, we need a way to collect data from remote systems and archive it within a local CVS repository. To accomplish this, we set up a dedicated CVS “collector box” within our network. All filesystem monitoring functions will be executed by a single user from this machine. Monitoring functions include the collecting of raw data, synchronizing the data with the CVS repository and sending alerts if unauthorized changes are found. We will utilize the ssh protocol as the network communication vehicle to collect data from the remote machines. To make things easier we put our HBIDS user's RSA key into root's authorized_keys file on each of the target systems. This allows the HBIDS user to execute commands as root on any target system without having to supply a password via ssh. Now that we have a general idea for the architecture of the collector box, we can begin collecting the data.

We need to collect two classes of data from remote systems: ASCII configuration files and the output of commands. Collecting the output of commands is generally a broader category than collecting files because we probably are not going to want to replicate all remote filesystems in their entirety. Commands that produce important monitoring output include md5sum (generates md5 signatures for files) and find / -type f -perm +6000 (finds all files that have the uid and gid bits set in their permissions). In Listings 1 and 2, we illustrate Perl code that collects data from remote systems and monitors changes to this data over time.

Listing 1.

In Listing 1, we have a piece of Perl code that makes use of the Net::SSH::Perl module to collect three sets of data from the host whose address is, md5 hash signatures of a few important system binaries, a tar archive of a few OS configuration files and a listing of all files that have the uid and/or gid permission bits set. Lines 7 and 8 define the IP address of the target machine as well as the remote user that will log in to. Recall that we have the local user's preshared key on the remote machine, so we will not have to supply a password to log in. Lines 10-13 define a small sample list of system binaries for which md5 hash signatures will be calculated, and similarly, lines 15-18 define a list of files that will be archived locally from the remote system. Lines 20-24 build a small hash to link a local filename to the command that will be executed on the remote system, and the output of each command will be stored locally within this file. Lines 27-33 comprise the meat of the collection code and call the run_command() subroutine (lines 36-49) for each command in the %Cmds hash. Each execution of run_command() will create a new Net::SSH::Perl object that will be used to open an ssh session to the remote host, log in and execute the command that was passed to the subroutine.

Listing 2.

In Listing 2, we illustrate a piece of Perl code that is responsible for generating e-mail alerts if any of the files collected by change. This is accomplished first by checking the current module out of the CVS repository (line 12), executing (line 14) to install fresh copies of the remote command output and files within the local directory structure, and then committing each (possibly modified) file back into the repository (line 20). By checking the return value of the cvs commit command for each file (line 20), we can determine if changes have been made to the file, as cvs automatically increments the file's version number and keeps track of exactly what has changed. If a change is detected in a particular file, calculates the previous version number (lines 27-36), executes the cvs diff command against the previous revision to get the changes (lines 39-40) and e-mails the contents of the change (lines 47-48) to the e-mail address defined in line 7.