cpio

In this month's column, Eric moves beyond find to cover duplicating files and directory trees using the versatile cpio command. cpio uses space on tape more efficiently than tar and is an excellent alternative for creating archives on platforms that do not have the GNU utilities available. Read on for a thorough discussion of cpio and its three modes of operation: Pass-through, Create and Extract.
Create Mode

Create mode creates archive files. (This is also referred to as “copy-out” mode.) cpio accepts a list of file names, just as it does in pass-through mode. But instead of creating duplicate files in another area, it creates an archive and sends it to standard output.

Since it is sent to standard output, the archive can be redirected to any device or file such as a tape, diskette, or standard file.

$ find /export/home -depth \
    | cpio --create > /dev/fd0

This creates an archive of the /export/home directory tree on the floppy drive at /dev/fd0. Of course, the /export/home area probably won't fit on one floppy, but cpio prompts for another device or file name when each floppy is filled, so it can be replaced, and the user can type the device name again. (Note that find's -depth switch is still recommended to prevent possible problems when the archive is extracted.)

When it comes to creating archives, cpio has many options. One of the most important is the format of the archive.

bin (default): the binary format encodes files in a non-portable method. Therefore, it is not suited for exchanging files between Linux on a PC and Linux on other architectures such as Alpha or PowerPC.

odc: old (POSIX.1) portable format. This is portable across platforms, but is not suited for file systems with more than 65536 inodes, which means most of today's larger hard disks.

newc: new portable format. This is portable across platforms, and has no inherent limit on number of inodes.

crc: new portable format, with a checksum added.

tar: compatible with tar, but only supports file names up to 100 characters.

ustar: new tar format. Supports up to 255 character file names.

hpbin: non-portable format used by HP-UX.

hpodc: “portable” format used by HP-UX. Stores device files differently.

The archive format is specified with the --format switch.

Out of all the formats, the crc format is probably the best, since it is portable and has an extra degree of error checking via the checksum.

A better method for creating an archive would be:

$ find /export/home -depth | cpio --create \
  --message="Insert next disk and type /dev/fd0 " \
  --format=crc > /dev/fd0

This uses the crc format for the archive and prompts the user with “Insert next disk and type /dev/fd0” as each floppy is filled. The --message option, which works in both create and extract mode, replaces the default message.

There are many other options available for the creation of archives, which I will cover later.

Even though GNU tar does have many of the advantages of cpio, the ability to use find to specify the files to be backed up provides much more flexibility than shell wildcards. [You can do this with tar, too, but you have to send the output of find into a file and use that file as an “include file” for tar—ED]

Extract Mode

Extract mode (also referred to as “copy-in” mode) extracts files from archives. This mode is inconsistent with the other two, since file names are specified on the command line, instead of via a list on standard input.

$ cpio --extract < /dev/fd0

This command restores all of the files from the archive in /dev/fd0, since no file names were specified. If the archive spans more than one volume, cpio will prompt for each volume the same way it does when archives are created. The --message option can be used to override the default message, as in create mode.

cpio automatically recognizes archive formats during extraction, so it is not necessary to specify them on the command line.

The path passed to cpio by find is stored in the archive. Therefore it is important to pay attention to how find is used.

$ find . -depth | cpio --create > /tmp/archive

This creates an archive that extracts into the present working directory.

$ find /export/home -depth | cpio --create \
    > /tmp/archive

This creates an archive that will try to extract to /export/home, regardless of the circumstances. If the -d option is specified the directory is created if it does not already exist. (If /export/home does not exist and -d is omitted, the extraction will fail.)

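Anything specified on the command line that is not an option is treated as a file name pattern. Patterns are shell-style globbing patterns matched against the whole path stored in the archive, so wildcards are needed to match part of a name, and the patterns should be quoted to keep the shell from expanding them.

$ cpio --extract "*back*" < /dev/fd0

This will extract files in the archive that have back in their name. No other files will be restored. Multiple patterns can also be specified.

$ cpio --extract "*back*" "*save*" < /dev/fd0

This will extract files with “back” or “save” in their names.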

Patterns can also be read from a file, one per line, instead of being given on the command line. The file is specified with the --pattern-file=filename option. This makes restores more flexible, since frequently used patterns can be kept in a file and the exact path of each file does not have to be known.

The --nonmatching option inverts the selection: only files that do not match any of the given patterns are extracted.

It may help to see the contents of the archive before extracting anything from it.

$ cpio --list < /dev/fd0

The --list option lists the contents of the archive. The option --numeric-uid-gid forces the list to show user and group IDs numerically, instead of trying to resolve the names with the passwd and group files.

Instead of using standard input and output, the archive can be written to (or read from) a file named with the --file option.

$ find /export/home -depth | cpio --create \
  --file=/vol/archive

This option works for both creating and extracting archives. To use a remote tape drive, specify the user name and hostname before the file name. (The user must have access to the remote host without a password. This can be done by using the file .rhosts.)

$ find /export/home -depth | cpio --create \
  --file=eric@bajor:/dev/rmt0

This command will work if eric has no password (not recommended) or if the host that the command is run on is listed in the .rhosts file in eric's home directory.

One of the key advantages of the --file option is that archives stored as disk files (not on tape or floppy) and created with it can be appended to with the --append option.

When restoring an archive it is sometimes desirable to keep the original file modification times rather than stamping the restored files with the time of extraction:

$ cpio --extract --preserve-modification-times \
  --file=/vol/archive

The --preserve-modification-times option works in extract mode in addition to pass-through mode.

In addition to preserving modification times, the access times for archived or copied files can be preserved so that the cpio operation does not affect the original files:

$ find . -depth | cpio --pass-through \
 --make-directories --preserve-modification-times \
 --reset-access-time /vol/copy

This will copy the current directory to /vol/copy while copying the modification times on the old files to the new and also leaving the access times on the original files untouched.

By default, when operating in copy-in (extract) or pass-through mode, cpio will not overwrite an existing file that is newer than the copy being written; it leaves the existing file in place and reports that the file was not created. The --unconditional option overrides that behavior and replaces existing files without asking:

$ cpio --extract --unconditional "*back*" "*save*" \
  < /dev/fd0

The --dereference option copies the file pointed to by a symbolic link, instead of the link itself, in archive creation and pass-through mode.

The --rename option will prompt the user to interactively rename each file. This only works in extract mode.

When acting as a system administrator, it is sometimes useful to restore an archive or duplicate a directory and change the user or group id of the target in the process.

$ cpio --extract --owner=eric.staff < /dev/fd0

This will restore the archive on /dev/fd0 and set the owner of all the extracted files to eric and the group to staff. Only root may use this option. If the group is left out, it will not be changed unless the . is included, in which case the group will be set to the user's login group.

Another option related to file ownership is --no-preserve-owner. This is the default behavior for non-root users. Files will belong to the user copying or extracting them, instead of the original user. For root the default is to preserve ownership.

There are also advanced options related to transferring data between big-endian and little-endian architectures and for controlling I/O buffer sizes to optimize performance.

______________________

Comments

Useful article

Anonymous

Just discovered this article while trying to diagnose some cpio problems... no surprises, it turned out to be the non-writable directories issue discussed...

Meaning of -depth option backwards, use -print0 and -0 options!

John Keith Hohm

The -depth option to find ensures that directory names are output after the names of the files in them, not before. In combination with the --make-directories (or just -d) and --preserve-modification-times (or just -m) options to cpio, this results in cpio preserving the original modification time of both files and directories.

This works because cpio will create a directory automatically while writing the files inside it; only after it is done writing all the directory contents does it visit the directory itself to set its attributes, which includes resetting the modification time.

You are missing a couple of other important options, though: the -print0 option to find and the --null (or just -0) option to cpio cause find and cpio to write and read the list of filenames terminated by a null character instead of a newline. Since most Linux filesystems allow names to contain newlines, this is important to properly archive such files and avoids doing something very bad with a file named like:

byebye<newline>/etc/passwd

So the full recommended command is:

$ find . -depth -print0 | cpio -pdm0v dest_dir
