Finding Files and More

All about the find command.
More than printing!

Now that you know how to locate just about any file, what can you do with them besides print their names?

$ find . -fprint foo

sends a list of the files in the current directory to a file “foo”. If the file does not exist it is created. If it does, its contents are replace.

Find also offers the -printf action. This allows output to be formatted.

$ find . -printf 'Name: %f Owner: %u %s bytes\n'

produces a table of files with their name, owner, and size in bytes.

The -printf action has many predefined fields that cover all of the information available for a file. See Table 2 for an incomplete list of options. Find also has a -fprintf switch which will send the output to a file, like -fprint.

Table 2. printf Options

Escape Sequences\a - Alarm Bell\b - Backspace\f - Form Feed\n - Newline (not provided automatically)\c - Carriage return- Horizontal tab\v - Vertical tab\\ - A literal backslash\c - Stop printing and flush output

Formatting Sequences%b - File size in 512 byte blocks%k - File size in 1k blocks%s - File size in bytes%a - Access time in standard format%A - Formatted access time (see man page for options)%c - Status time in standard format%C - Formatted status time (same a %A)%F - Type of filesystem%p - File name%f - File name with path removed%P - File name with find argument removed (file instead of ./file)%u - User name%g - Group Name

(See the man page for complete listing)

A third option for output is -ls. This option produces a listing of files that is the equivalent of the output from ls -idls. The -fls option will send this to a file.

Of course, simply producing formatted lists of files is not the limit to find's usefulness. Find also allows us to execute commands on them with -exec and -ok. -exec executes a command for each file that matches.

Our earlier example demonstrates a common use for the -exec option: deleting old and unused files.

$ find /tmp/* -atime +10 -exec rm -f {} \;

After the -exec switch itself, we specify the command, any options (such as the -f), and then {}, which represents the matched files. The command line must then be terminated with ; (the \ is to prevent shell expansion).

$ find . -type f -exec grep -l linux {} \;

would execute the command grep -l linux on all regular files in and under the current directory.

The -ok switch operates the same way, but will prompt the user for confirmation before executing the command on each file.

$ find . -ok tar rvf backup {} \;

This command will descend through the current directory and below, asking the user which files should be added to the tar archive “backup”.

This leads us into some practical uses for find.

Sometimes it's necessary to duplicate a directory or directory structure. For this purpose many users utilize the cp command with the -r option. However, this command does not always create an exact copy!

Create a directory with a file and a link in it.

$ mkdir test
$ touch test/bar
$ ln -s /vmlinuz /test/foo
$ ls -l test
-rwx--x--x eric staff 0 Sep  9 bar
lrwxrwxrwx eric staff 8 Sep  9 foo -> /vmlinuz

Now copy it with cp -r

$ cp -r test test1
$ ls -l test1
-rwx--x--x eric staff      0 Sep  9 11:18 bar
-rw-rw-r-- eric staff 318436 Sep  9 11:18 foo

The cp command followed the soft link and copied the kernel into the new directory!

Let's try a different approach:

$ rm -r test1
$ cd test
$ find -depth -print | cpio -pdmv ../test1
$ ls -l ../test1
-rwx--x--x eric staff 0 Sep  9 bar
lrwxrwxrwx eric staff 8 Sep  9 foo -> /vmlinuz

This method uses cpio to copy files to the new directory. Find produces the file list by descending the directory structure. Even though our example was only one directory deep, we know that find can descend an entire directory structure. We also know that we can also control which directories it descends and which files it outputs.

In the above command I added the -depth option. It insures that directory names are output before the files in them. This allows cpio to create the directories before trying to copy files into them.

The cpio command is another multipurpose tool in the Unix toolbox. It can create archives in a variety of formats and also extract from them. It also handles the output of find's -print option perfectly. Combined, these tools could form a simple backup system. (Please note: I am presenting this purely as an example. Systems that support many users or that have irreplaceable data on them should use more extensive and robust backup systems.)

$ find . -depth -print \
  | cpio -ov --format=crc > /dev/fd0

find reads the contents of the current directory, and the filenames are piped to cpio, which copies the files to the floppy in the System V R4 archive format with CRC checksums. (This format is preferred to the default since it is platform independent, supports larger hard disks, and provides at least simple error checking.)

When cpio reaches the end of each floppy it prompts us with:

Found end of tape.  To continue, type device/file
name when ready.

In order to continue, type:

/dev/fd0 RETURN

Of course, if you are lucky enough to have a tape drive or other storage system, you may not have to do this, though cpio can also span tapes if the archive does not fit on one.

This system does have at least one drawback: if the data to be stored will not fit on one unit, the backup cannot be fully automated.

The first backup of my home directory spanned ten floppies. I reviewed the contents and noticed two subdirectories that probably were not worth backing up, so I altered find's arguments:

$ find . \
  \( -path ./.netscape-cache -o -path ./lg \)\
  -prune -o -print | \
  cpio -ov --format=crc > /dev/fd0

This introduces some more find options. The \( and the \) are parentheses with \ to prevent shell expansion. Find allows parentheses to logically group expressions. This was necessary since I have two expressions in the command

\( -path ./.netscape-cache -o -path ./lg \)

Inside the parentheses we have two -path statements separated by -o. This is a find “or” statement.

\( -path ./.netscape-cache -o -path ./lg \) -prune

Find's -prune option causes find to not enter a directory. Therefore, we can translate the above to “If the path is ./.netscape-cache or ./lg do not descend into the directory.”

After this clause we see another -o statement. If the file does not meet the criteria for pruning, it is printed instead.

So, my entire home directory with the exception of my Netscape cache and lg directory is now backed up.

This is fine for an initial backup. But what about next week when I want to backup my directory, but I've only really touched a few files?

$ find . \
  \( -path ./.netscape-cache -o -path ./lg \) \
  -prune -o \( -mtime -7 \) -print | \
  cpio -ov --format=crc > /dev/fd0

This adds one more clause: “If the file is not under the netscape cache or the lg directory, check if it has been modified in the past 7 days. If it has, then print the name.” The name is then sent to cpio to archive.

Obviously these command lines can get very complicated. It's usually a good idea to test them by piping the output through more before using cpio.

In addition to -o find also has an “and” operator, -and, and a negation operator -not. When multiple match criteria are specified, -and is implied.

$ find -mtime -5 -type f -print

prints files that have been modified during the last five days and are regular files.

$ find -mtime -5 -not -type f -print

prints things that have been modified during the last five days that are not regular files: directories, soft links, etc.

But wait, disaster has struck! Your (sister, son, daughter, little brother, mom, spouse, whoever) has deleted a very important file! Time to use that backup.

$ cpio -t < /dev/fd0

produces a table of contents from the archive. As it does during backup operations, cpio prompts for the next disk while it reads the table of contents.

$ cpio -i core < /dev/fd0

The -i switch tells cpio to extract the named file. The absence of a file name cause cpio to restore the entire archive.

System maintenance tasks can also be simplified with find. Our second example demonstrated using find to clean out older files.

$ find /home -name core -o -name foo \
  -exec rm -f {} \; 2> /dev/null

This command cleans out any core dumps or files named “foo” from home directories. (Although some files named “foo” can be very important!)

$ find /var/adm/messages -size +32k \
  -exec Mail -s "{}" root < /var/adm/messages \;
  -exec cp /dev/null {} \;

This is another example from the crontab on my Caldera/Red Hat system. It uses the implicit “and” function to mail the system messages file to root and then empty it.

Find also has an important security application. Two of the file modes that I did not cover earlier are SUID and SGID. These modes provide a user with the rights of the owner or group of a program when the program is executed.

An example of this is the passwd program. This program allows users to change their password. In order to do this the /etc/passwd (or /etc/shadow) file must be modified, which is a function only root should be able to perform. Since the passwd program belongs to root and has the SUID mode set, it can modify the necessary file. When passwd completes the user's rights return to normal. The passwd program is responsible for making sure the user can't do anything wrong while acting as root.

$ ls -l /usr/bin/npasswd
-r-s--x--x 1 root /usr/bin/npasswd

(/usr/bin/passwd is linked to /usr/bin/npasswd on my system.) The s in the execute field for owner signifies SUID. A SGID program would have s in the execute field for group.

This mechanism has obvious security implications. A user (or invader) who has compromised a system could install a program (such as a shell) with this mode set and then do whatever they wish whenever they want by running that program.

In octal notation SUID is expressed as 4000 and SGID is 2000, so

$ find / -perm 4000 -print

produces a list of SUID files on a system.

$ find / -type f \( -perm 2000 -o -perm 4000 \) \

produces a list of regular files that have SGID or SUID mode set.

This list could be saved to a file (with -fprint) and compared each day with the output from the previous day.

This article does not cover every option for find. This was also only a cursory explanation of filesystems and access modes. Hopefully, I was able to provide you with enough information to make using Linux a little easier and a lot more rewarding.


White Paper
Linux Management with Red Hat Satellite: Measuring Business Impact and ROI

Linux has become a key foundation for supporting today's rapidly growing IT environments. Linux is being used to deploy business applications and databases, trading on its reputation as a low-cost operating environment. For many IT organizations, Linux is a mainstay for deploying Web servers and has evolved from handling basic file, print, and utility workloads to running mission-critical applications and databases, physically, virtually, and in the cloud. As Linux grows in importance in terms of value to the business, managing Linux environments to high standards of service quality — availability, security, and performance — becomes an essential requirement for business success.

Learn More

Sponsored by Red Hat

White Paper
Private PaaS for the Agile Enterprise

If you already use virtualized infrastructure, you are well on your way to leveraging the power of the cloud. Virtualization offers the promise of limitless resources, but how do you manage that scalability when your DevOps team doesn’t scale? In today’s hypercompetitive markets, fast results can make a difference between leading the pack vs. obsolescence. Organizations need more benefits from cloud computing than just raw resources. They need agility, flexibility, convenience, ROI, and control.

Stackato private Platform-as-a-Service technology from ActiveState extends your private cloud infrastructure by creating a private PaaS to provide on-demand availability, flexibility, control, and ultimately, faster time-to-market for your enterprise.

Learn More

Sponsored by ActiveState