cpio
Create mode creates archive files. (This is also referred to as “copy-out” mode.) cpio accepts a list of file names, just as it does in pass-through mode. But instead of creating duplicate files in another area, it creates an archive and sends it to standard output.
Since it is sent to standard output, the archive can be redirected to any device or file such as a tape, diskette, or standard file.
$ find /export/home -depth \
| cpio --create > /dev/fd0
This creates an archive of the /export/home directory tree on the floppy drive at /dev/fd0. Of course, /export/home probably won't fit on one floppy. When a floppy is full, cpio prompts for another device or file name, so the user can insert a new floppy and type the device name again. (Note that find's -depth switch is still recommended. It lists the files in each directory before the directory itself, which prevents possible problems, such as directories without write permission, when the archive is extracted.)
When it comes to creating archives, cpio has many options. One of the most important is the format of the archive.
- bin (default): the binary format encodes files in a non-portable way. Therefore, it is not suited for exchanging files between Linux on a PC and Linux on other architectures such as Alpha or PowerPC.
- odc: old (POSIX.1) portable format. This is portable across platforms, but is not suited for file systems with more than 65536 inodes, which includes many of today's larger hard disks.
- newc: new portable format. This is portable across platforms and does not have the 65536-inode limitation of odc.
- crc: new portable format, with a checksum added.
- tar: compatible with tar, but only supports file names up to 100 characters.
- ustar: new tar format. Supports file names up to 255 characters.
- hpbin: non-portable format used by HP-UX.
- hpodc: “portable” format used by HP-UX. Stores device files differently.
The archive format is specified with the --format switch.
Out of all the formats, the crc format is probably the best, since it is portable and has an extra degree of error checking via the checksum.
A better method for creating an archive would be:
$ find /export/home -depth | cpio --create \
--message="Insert next disk and type /dev/fd0 " \
--format=crc > /dev/fd0
This uses the crc format for the archive and prompts the user with “Insert next disk and type /dev/fd0” as each floppy is filled. The --message option, which works in both create and extract mode, replaces the default message.
There are many other options available for the creation of archives, which I will cover later.
Even though GNU tar does have many of the advantages of cpio, the ability to use find to specify the files to be backed up provides much more flexibility than shell wildcards. [You can do this with tar, too, but you have to send the output of find into a file and use that file as an “include file” for tar—ED]
Extract mode (also referred to as “copy-in” mode) extracts files from archives. This mode is inconsistent with the other two: the archive is read from standard input, and the files to extract are specified as patterns on the command line instead of as a list on standard input.
$ cpio --extract < /dev/fd0
This command restores all of the files from the archive in /dev/fd0, since no file names were specified. If the archive spans more than one volume, cpio will prompt for each volume the same way it does when archives are created. The --message option can be used to override the default message, as in create mode.
cpio automatically recognizes archive formats during extraction, so it is not necessary to specify them on the command line.
The path passed to cpio by find is stored in the archive. Therefore it is important to pay attention to how find is used.
$ find . -depth | cpio --create > /tmp/archive
This creates an archive with relative file names, so it extracts into whatever directory is current when cpio is run.
$ find /export/home -depth | cpio --create \
> /tmp/archive
This creates an archive that will try to extract to /export/home, no matter which directory cpio is run from. If the -d (--make-directories) option is specified, the directory is created if it does not already exist. (If /export/home does not exist and -d is omitted, the extraction will fail.)
Anything specified on the command line that is not an option is treated as a filename pattern.
$ cpio --extract "*back*" < /dev/fd0
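This will extract files in the archive that have back anywhere in their name. The patterns are shell-style wildcards matched against the whole stored path name, so without the asterisks only a file named exactly back would match; unlike in the shell, * also matches /. No other files will be restored. Multiple patterns can also be specified.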
$ cpio --extract "*back*" "*save*" < /dev/fd0
This will extract files with “back” or “save” in their names.
In addition to providing patterns on the command line, they can be provided as lines in a file, one pattern per line. The file is specified with the --pattern-file=filename option. This provides a lot of flexibility in restoring files: because a pattern such as *back* can match anywhere in the path, the actual path does not have to be known, and frequently restored patterns can be kept in a file.
The --nonmatching option reverses the sense of the patterns: only files that do not match any of them are extracted.
It may help to see the contents of the archive before extracting anything from it.
$ cpio --list < /dev/fd0
The --list option lists the contents of the archive. In a verbose listing (--verbose), the --numeric-uid-gid option shows user and group IDs numerically, instead of trying to resolve the names with the passwd and group files.
Instead of standard input and output, the archive can be sent to (or read from) a file with the --file option.
$ find /export/home -depth | cpio --create \
--file=/vol/archive
This option works for both creating and extracting archives. To use a remote tape drive, prefix the file name with the user name and host name, in the form user@host:filename. (The user must be able to access the remote host without a password. This can be done with the .rhosts file.)
$ find /export/home -depth | cpio --create \
--file=eric@bajor:/dev/rmt0
This command will work if eric has no password (not recommended) or if the host that the command is run on is listed in the .rhosts file in eric's home directory.
One of the key advantages of the --file option is that disk files created with it (archives not on tape or floppy) can later be appended to with the --append option.
When restoring an archive, it is sometimes desirable to keep the files' original modification times instead of setting them to the time of extraction:
$ cpio --extract --preserve-modification-times \
--file=/vol/archive
The --preserve-modification-times option works in extract mode as well as in pass-through mode.
In addition to preserving modification times, the access times of archived or copied files can be preserved, so that the cpio operation does not affect the original files:
$ find . -depth | cpio --pass-through \
--make-directories --preserve-modification-times \
--reset-access-time /vol/copy
This copies the current directory tree to /vol/copy, gives each new file the modification time of the original, and leaves the access times of the original files untouched.
By default, when operating in copy-in (extract) or pass-through mode, cpio will not replace an existing file that is as new as or newer than the one being copied; it prints a warning and skips that file. The --unconditional option overrides that behavior and replaces the existing files:
$ cpio --extract --unconditional "*back*" "*save*" \
< /dev/fd0
The --dereference option copies the file pointed to by a symbolic link, instead of the link itself, in archive creation and pass-through mode.
The --rename option prompts the user to rename each file interactively. This only works in extract mode.
When acting as a system administrator, it is sometimes useful to restore an archive or duplicate a directory and change the user or group id of the target in the process.
$ cpio --extract --owner=eric.staff < /dev/fd0
This will restore the archive on /dev/fd0 and set the owner of all the extracted files to eric and the group to staff. Only root may use this option. If the group is left out, it is not changed, unless the . separator is still included (as in --owner=eric.), in which case the group is set to that user's login group.
Another option related to file ownership is --no-preserve-owner, which makes files belong to the user copying or extracting them instead of the original owner. This is the default behavior for non-root users; for root, the default is to preserve ownership.
There are also advanced options related to transferring data between big-endian and little-endian architectures and for controlling I/O buffer sizes to optimize performance.
Comments
Useful article
Just discovered this article while trying to diagnose some cpio problems... no surprises, it turned out to be the non-writable directories issue discussed...
Meaning of -depth option backwards, use -print0 and -0 options!
The -depth option to find ensures that directory names are output after the names of the files in them, not before. In combination with the --make-directories (or just -d) and --preserve-modification-times (or just -m) options to cpio, this results in cpio preserving the original modification time of both files and directories.
This works because cpio will create a directory automatically while writing the files inside it; only after it is done writing all the directory contents does it visit the directory itself to set its attributes, which includes resetting the modification time.
You are missing a couple of other important options, though: the -print0 option to find and the --null (or just -0) option to cpio make find write, and cpio read, the list of file names terminated by a null character instead of a newline. Since Linux filesystems allow names to contain newlines (only / and the null character are forbidden), this is important to properly archive such files and avoids doing something very bad with a file named like:
byebye<newline>/etc/passwd
So the full recommended command is:
$ find . -depth -print0 | cpio -pdm0v dest_dir