Make Peace with pax
pax is one of the lesser known utilities in a typical Linux installation. That's too bad, because pax has a very good feature set, and its command-line options are easy to understand and remember. pax is an archiver, like tar(1), but it's also a better version of cp(1) in some ways, not least because you can use pax with SSH to copy sets of files over a network. Once you learn pax, you may wonder how you lived without it all these years.
pax has four modes: list, read, write and copy. Reading and writing are controlled by the -r and -w options, respectively. In combination, -rw, pax acts a little bit like cp -R. If neither is used, pax lists the contents of the archive, which may be a file, device or a pipe.
By default, pax operates as a filter: it reads from standard input and writes to standard output, a feature that turns out to be very useful. But usually these days, the target is an archive file, the familiar tarball. Let's start by creating one:
$ cd /tmp
$ mkdir paxample
$ touch paxample/foo
$ pax -wf paxample.tar paxample
The -w option means write, that is, create an archive. The -f option provides the name of a file to which to write the archive. If desired, pax can gzip or bzip2 the file at the same time:
$ pax -wzf paxample.tar.gz paxample
Like most tar implementations, pax, by default, uses the POSIX ustar file format. Because pax was born of a desire to unify archive file formats, many other formats also are supported, but in practice, they're seldom used. Likely as not, any .tar.gz file you download from the Internet actually will be a ustar archive:
$ pax -wzf paxample.tar.gz paxample
$ file paxample.tar*
paxample.tar:    POSIX tar archive
paxample.tar.gz: gzip compressed data
The first thing you nearly always want to know about any archive is what's in it. Listing the contents is the default action in the absence of either a -r or -w option:
$ pax -f paxample.tar
paxample
paxample/foo
Note that the archive retains the directory name you specified on the command line. That comes into play later when you read it.
To read an archive, use -r:
$ mkdir t
$ cd t
$ pax -rf ../paxample.tar
What did that do? Let's look at the source and target directories:
$ cd /tmp
$ find paxample t   # traverse both trees
paxample
paxample/foo
t
t/paxample
t/paxample/foo
When pax read the paxample.tar archive, it created files in the current directory, t. Because the archive included a directory name, paxample, that directory was re-created in the output.
Copying Sets of Files
To my mind, pax's -r and -w options make more sense than their -x and -c equivalents in tar, reason enough to switch. But pax can do more than tar: it can copy files too:
$ rm -rf t/*
$ pax -rw paxample t
$ find t
t
t/paxample
t/paxample/foo
Unlike cp(1), pax is an archive utility. Its job isn't to make copies, but to archive files. When pax creates a file, it preserves the file's metadata from its input. The form of the input doesn't matter. In this case, the input isn't an archive; it's the file itself:
$ ls -l paxample/foo t/paxample/foo
-rw-r--r--  1 jklowden  wheel  0 Sep 22 15:45 paxample/foo
-rw-r--r--  1 jklowden  wheel  0 Sep 22 15:45 t/paxample/foo
Yes: two identical files with two identical timestamps. The permission bits and ownership can be controlled too, with the -p option. Take that, cp(1)!
Perhaps you don't want to re-create the directory, or perhaps you want to change it in some way. One option is not to mention the input directory on the command line, but instead provide filenames:
$ rm -rf t/paxample/
$ (cd paxample/ && pax -rw * ../t/)
$ find t
t
t/foo
That's usually easiest. But if you need something more sophisticated, the -s option rewrites the path (actually, any part of the filename) using a regular expression:
$ rm -rf t/*
$ pax -rw -s ':paxample:my/new/path:g' paxample/ t
$ find t
t
t/my
t/my/new
t/my/new/path
t/my/new/path/foo
The -s option is handy, for instance, when unpacking a tarball that doesn't have version information in the directory name.
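As a sketch, suppose a hypothetical libfoo.tar.gz unpacks into a bare libfoo directory and you want libfoo-1.0 instead. Both names are made up for illustration, the script builds its own stand-in tarball so it can run unattended, and it assumes a pax that supports -z when reading. Anchoring the expression with ^ restricts the match to the start of each pathname:

```shell
#!/bin/sh
# Sketch: renaming the top directory on extraction with -s.
# "libfoo" and "1.0" are hypothetical names; assumes pax with -z support.
work=$(mktemp -d)
cd "$work" || exit 1

# Build a stand-in for a downloaded tarball that unpacks into plain libfoo/.
mkdir libfoo
touch libfoo/README
pax -wzf libfoo.tar.gz libfoo
rm -r libfoo

# ^ anchors the match at the start of each pathname, so only the
# leading "libfoo" is rewritten; libfoo/README becomes libfoo-1.0/README.
pax -rzf libfoo.tar.gz -s ':^libfoo:libfoo-1.0:'
find libfoo-1.0
```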
What Could Go Wrong?
If you give the wrong filename to write, you just get an archive by the wrong name: no harm, no foul. If you mistype an input archive filename, though, you'll find yourself in 1985:
$ pax -rf paxample.whoopsie
pax: Failed open to read on paxample.whoopsie (No such file or directory)
ATTENTION! pax archive volume change required.
Ready for archive volume: 1
Input archive name or "." to quit pax.
Archive name >
This is an idea that outlived its usefulness before it was implemented. You could type in the filename here, again, without readline support or tab completion. Well, at least it says what to do:
Archive name > .
Quitting pax!
As mentioned previously, pax uses standard input and standard output by default. That is a feature, but the first time you forget to provide a filename, you may think pax is very, very slow:
$ pax -r paxample.tar
Note the missing -f. There's also no message and no prompt. Without -f, pax treats paxample.tar not as the archive but as a pattern to match against member names, and it reads the archive from standard input, which in this case is the keyboard. You could type ^D, for end-of-file, but that forms invalid input to pax. Better to send up a smoke signal:
^C
pax: Signal caught, cleaning up.
It's even worse the first time you accidentally write to standard output while it's connected to your terminal. You heard it here first: don't do that.
Putting Standard Input to Work
Standard input and standard output do have their uses, and here pax really comes into its own. For one thing, you can verify the effect of the -s option without creating an archive or any files:
$ pax -w -s ':paxample:my/new/path:g' paxample/ | pax
my/new/path
my/new/path/foo
Without -f, pax -w writes to standard output. So rewrite the filenames with -s, and pipe the output to pax again, this time using its list mode, with neither the -r nor the -w option. By default, pax reads from standard input and, in list mode, prints the filenames on the standard output.
That can save a lot of time, not to mention a mess on the disk, when there are thousands of files.
Suppose you want to copy the paxample directory to another machine. One approach would be to create a tarball, copy it to the target, log in to the target and unpack the tarball:
$ pax -wf paxample.tar paxample
$ scp paxample.tar oak:/tmp/
paxample.tar                100%   10KB  10.0KB/s   00:00
$ ssh oak
oak[~]$ cd /tmp
oak[tmp]$ pax -rf paxample.tar
oak[tmp]$ ls paxample/
foo
But there's a much easier way. Invoke pax on both machines, and connect the output of one to the input of the other:
$ pax -w paxample | ssh oak 'cd /tmp/ && pax -r && find paxample'
paxample
paxample/foo
- pax -w writes to standard output.
- ssh reads standard input and attaches it to whatever utility is invoked, which of course in this case is pax again.
- pax -r reads from standard input and creates the files from that archive.
pax is one of the lesser known utilities in a typical Linux installation. But it's both simple and versatile, well worth the time it takes to learn—recommended.