Finding Files and More

 in
All about the find command.

Not long after getting their first Linux system, new users usually need to locate a file somewhere on their system. So they learn the following command from a friend, or maybe a book or magazine:

$ find / -name filename -print

Now while this command does work perfectly fine, the syntax does seem awkward to people unfamiliar with the find command. Why should we have to specify print? [Note: On Linux systems, and other systems that use GNU find, we don't. But standard Unix find insists on it, so you might as well get used to it if you use Unix as well as Linux.]

For that matter, why should we have to specify name? Why not just find filename? It's this seemingly cryptic structure that makes find one of the most under used commands in the Unix toolbox.

A look at the find man page (on any system, not just Linux) completes the confusing picture. For someone not familiar with Unix, find's “operators” and “expressions” make it an awfully complicated program just for locating files.

If all you want to do is locate a file, there is a better way to do that:

locate filename

This will work on a properly set-up Linux system with GNU find. Why have a complicated command like find when we already have a simple command like locate? Because find is good for much more than just finding files. (Good Linux distributions some with update properly set up. If yours isn't, you can run updatedb as root to update the database it uses, or simply use find as shown above).

The Caldera/Redhat system that I use at home has several entries in the crontab that run this command:

find /tmp/* -atime +10 -exec rm -f {} \;

This command deletes any files in /tmp that haven't been accessed in the past ten days. The fact that find only deletes files that haven't been accessed in the past ten days rather than files that were created that long ago is a subtle, but very important point. Find gives us access to the very valuable set of information stored about files and directories in Unix filesystems.

Like most Unix filesystems, the second extended filesystem (“ext2”) that is used on most Linux systems stores a more extensive set of data about files than just their name, size and last-change-date the way systems such as DOS do. It also stores an owner and group, access mode, the dates that the file was last modified and accessed, the date that the file last changed status, and the type. (Don't worry, we'll explain these as we go).

With the exception of the names, all this information is stored for each file and directory in a structure called an inode. In Unix filesystems, directories are simply files that contain a list of filenames with inode numbers.

Table 1 has a list of inode entry fields and how they are “translated” for the different filesystem types supported by Linux. While this table may not mean much to you yet, it should be self-explanatory by the time you finish reading this article.

The Command Line

Let's analyze the find command line:

find starting-point options criteria action
  • starting-point One or more directories from which to start searching. The default is the current directory.

  • options Modify the methods used for searching in several ways.

  • criteria Specify which files are chosen, and which are ignored. All files found are chosen by default.

  • action What to do with the files that are chosen. GNU find has a default action of -print, but standard Unix find has no default action, and will abort and complain unless an action is explicitly provided.

The Starting Point

The starting-point parameter has two effects on find's actions. The most obvious is that it specifies in which directory (or directories; there can be more than one starting point) to start looking for files. The other effect is on how the chosen filenames are treated, as this example shows:

$ cd /usr/X11/man
$ find man5 -print
man5
man5/XF86Config.5x
man5/pbm.5
man5/pgm.5
man5/pnm.5
man5/ppm.5
$ find /usr/X11/man/man5 -print
/usr/X11/man/man5
/usr/X11/man/man5/XF86Config.5x
/usr/X11/man/man5/pbm.5
/usr/X11/man/man5/pgm.5
/usr/X11/man/man5/pnm.5
/usr/X11/man/man5/ppm.5

When a user is simply looking for a file, this difference in behavior does not matter very much. But when you want to use the output from find to drive another program, it can be very important, depending on the program being driven.

In addition to the starting point, we have control over some other aspects of find's behavior, such as how it should handle soft links, how to evaluate file timestamps and how deep to follow directory structures. These are controlled by options.

The -follow option tells find to follow soft (or symbolic) links to the actual file. A soft link is a file that “points” to another file. To demonstrate this option, create (as a normal user, not as root) a soft link with ln in your home directory that points to file that belongs to root.

$ cd
$ ln -s /vmlinuz ./kernel

Now use ls to produce a long listing for the file.

$ ls -l kernel
lrwxrwxrwx ... kernel -> /vmlinuz

The first column of the mode, l, tells us it is a soft link. We also are told what file the link “points” to.

Now let's demonstrate the effect of find's -follow option by searching through the directory for files belonging to root, using it. (uid 0 is root; we'll cover the -uid option in more detail later.)

$ find . -uid 0 -print
nothing is printed
$ find . -follow -uid 0 -print
./kernel

You created the link to the kernel, so you own the link, called ./kernel. But the file /vmlinuz is owned by root.

The -daystart option modifies the behavior of find when it comes to evaluating time. When -daystart is specified, find will measure days from the beginning of the day instead of from 24 hours ago. (We will cover the parameters related to time later.)

Frequently a user will need to find a file that he or she knows is somewhere on local hard disk, and not on a mounted cdrom or network volume. An easy way to keep find from straying to these other disks is with the -xdev option.

$ find / -name document -print

will cause find to search for the file “document” in every directory under /, which can be very slow with a CDROM or network filesystem mounted.

$ find / -xdev -name document -print

will instead cause find to limit its search to the device that / is mounted on. (An alias for -xdev is -mount) Of course, if you have more than one local filesystem, you will need to execute a different search for it. Perhaps

$ find / /usr -xdev -name document -print

if you have two partitions, one for / and one for /usr. Alternately, you can say

$ find / -fstype ext2 -name document -print

if all your local partitions are ext2 filesystems.

Another way to save time on searches is to use the options related to directory depth.

$ find /usr -maxdepth 4 -name document -print

will limit find's search for document to directories four level deep or less “under” /usr.

Another option related to directory depth is -depth, which causes the directories to be selected before any files in them. We'll see later why this is useful.

The -noleaf option is used for searching filesystems that aren't Unix-like. Table 1 tells for which filesystems specifying -noleaf may speed up your search.

We already had an example of finding a file by name. Other mechanisms for matching filenames are -path, which matches by directory name, -iname, which is similar to -name but case insensitive, and -ipath, which is also case insensitive.

______________________

White Paper
Linux Management with Red Hat Satellite: Measuring Business Impact and ROI

Linux has become a key foundation for supporting today's rapidly growing IT environments. Linux is being used to deploy business applications and databases, trading on its reputation as a low-cost operating environment. For many IT organizations, Linux is a mainstay for deploying Web servers and has evolved from handling basic file, print, and utility workloads to running mission-critical applications and databases, physically, virtually, and in the cloud. As Linux grows in importance in terms of value to the business, managing Linux environments to high standards of service quality — availability, security, and performance — becomes an essential requirement for business success.

Learn More

Sponsored by Red Hat

White Paper
Private PaaS for the Agile Enterprise

If you already use virtualized infrastructure, you are well on your way to leveraging the power of the cloud. Virtualization offers the promise of limitless resources, but how do you manage that scalability when your DevOps team doesn’t scale? In today’s hypercompetitive markets, fast results can make a difference between leading the pack vs. obsolescence. Organizations need more benefits from cloud computing than just raw resources. They need agility, flexibility, convenience, ROI, and control.

Stackato private Platform-as-a-Service technology from ActiveState extends your private cloud infrastructure by creating a private PaaS to provide on-demand availability, flexibility, control, and ultimately, faster time-to-market for your enterprise.

Learn More

Sponsored by ActiveState