Make Peace with pax

pax is one of the lesser-known utilities in a typical Linux installation. That's too bad, because pax has a very good feature set, and its command-line options are easy to understand and remember. pax is an archiver, like tar(1), but in some ways it's also a better version of cp(1), not least because you can use pax with SSH to copy sets of files over a network. Once you learn pax, you may wonder how you lived without it all these years.

pax has four modes: list, read, write and copy. Reading and writing are controlled by the -r and -w options, respectively. Used together, as -rw, they put pax in copy mode, where it acts a little like cp -R. If neither is used, pax lists the contents of the archive, which may be a file, a device or a pipe.

By default, pax operates as a filter: it reads from standard input and writes to standard output, a feature that turns out to be very useful. These days, though, the target usually is an archive file, the familiar tarball. Let's start by creating one:


$ cd /tmp
$ mkdir paxample
$ touch paxample/foo
$ pax -wf paxample.tar paxample

The -w option means write, that is, create an archive. The -f option names the file to which the archive is written. If desired, pax can gzip or bzip2 the archive at the same time; -z selects gzip:


$ pax -wzf paxample.tar.gz paxample

Like most tar implementations, pax uses the POSIX ustar file format by default. Because pax was born of a desire to unify archive file formats, many other formats are supported too, but in practice they're seldom used. Likely as not, any .tar.gz file you download from the Internet actually will be a ustar archive:


$ pax -wzf paxample.tar.gz paxample
$ file paxample.tar*
paxample.tar:    POSIX tar archive
paxample.tar.gz: gzip compressed data

The first thing you nearly always want to know about any archive is what's in it. Listing the contents is the default action in the absence of either a -r or -w option:


$ pax -f paxample.tar
paxample
paxample/foo

Note that the archive retains the directory name you specified on the command line. That comes into play later when you read it.

To read an archive, use -r:


$ mkdir t
$ cd t
$ pax -rf ../paxample.tar

What did that do? Let's look at the source and target directories:


$ cd /tmp
$ find paxample t # traverse both trees
paxample
paxample/foo
t
t/paxample
t/paxample/foo

When pax read the paxample.tar archive, it created files in the current directory, t. Because the archive included a directory name, paxample, that directory was re-created in the output.

Copying Sets of Files

To my mind, pax's -r and -w options make more sense than their -x and -c equivalents in tar, and that alone is reason enough to switch. But pax can do more than tar: it can copy files too:


$ rm -rf t
$ mkdir t   # in copy mode, the destination directory must already exist
$ pax -rw paxample t
$ find t
t
t/paxample
t/paxample/foo

Unlike cp(1), pax is an archive utility. Its job isn't to make copies, but to archive files. When pax creates a file, it preserves the file's metadata from its input, whatever form that input takes. In this case, the input isn't an archive; it's the file itself:


$ ls -l paxample/foo t/paxample/foo
-rw-r--r--  1 jklowden  wheel  0 Sep 22 15:45 paxample/foo
-rw-r--r--  1 jklowden  wheel  0 Sep 22 15:45 t/paxample/foo

Yes: two identical files with identical timestamps. What else is preserved, including the permission bits and ownership, can be controlled too, with the -p option. Take that, cp(1)!

Perhaps you don't want to re-create the directory, or perhaps you want to change it in some way. One option is to leave the input directory off the command line and name the files instead:


$ rm -rf t/paxample/
$ (cd paxample/ && pax -rw * ../t/)
$ find t
t
t/foo

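That's usually easiest. But if you need something more sophisticated, the -s option rewrites the path (actually, any part of the filename) using a regular expression. Its argument takes the form /old/new/, optionally followed by g or p, and any character can stand in for the slash, which is handy when the paths themselves contain slashes: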


$ rm -rf t/*
$ pax -rw -s ':paxample:my/new/path:g' paxample/ t
$ find t
t
t/my
t/my/new
t/my/new/path
t/my/new/path/foo

The -s option is handy, for instance, when unpacking a tarball that doesn't have version information in the directory name.

What Could Go Wrong?

If you give the wrong filename to write, you just get an archive with the wrong name: no harm, no foul. If you mistype an input archive filename, though, you'll find yourself in 1985:


$ pax -rf paxample.whoopsie
pax: Failed open to read on paxample.whoopsie (No such file or directory)

ATTENTION! pax archive volume change required.
Ready for archive volume: 1
Input archive name or "." to quit pax.
Archive name >

This is an idea that outlived its usefulness before it was implemented. You could type in the filename here, again, without readline support or tab completion. Well, at least it says what to do:


Archive name > .
Quitting pax!

How exciting!

As mentioned previously, pax uses standard input and standard output by default. That is a feature, but the first time you forget to provide a filename, you may think pax is very, very slow:


$ pax -r paxample.tar

Oops! No -f. Also no message and no prompt. In read mode, pax treats paxample.tar not as the archive name but as a pattern selecting which members to extract, and it reads the archive itself from standard input, which in this case is the keyboard. You could type ^D for end-of-file, but that's invalid input to pax. Better to send up a smoke signal:


^C
pax: Signal caught, cleaning up.

It's even worse the first time you accidentally write to standard output while it's connected to your terminal. You heard it here first: don't do that.

Putting Standard Input to Work

Standard input and standard output do have their uses, and here pax really comes into its own. For one thing, you can verify the effect of the -s option without creating an archive or the files:


$ pax -w -s ':paxample:my/new/path:g' paxample/ | pax
my/new/path
my/new/path/foo

Absent the -f option, pax -w writes to standard output. So rewrite the pathname with -s, and pipe the output to pax again, this time in list mode, with neither the -r nor the -w option. By default, pax reads from standard input and, in list mode, prints the filenames to standard output.

That can save a lot of time, not to mention a mess on the disk, when there are thousands of files.

Suppose you want to copy the paxample directory to another machine. One approach would be to create a tarball, copy it to the target, log in to the target and unpack the tarball:


$ pax -wf paxample.tar paxample
$ scp paxample.tar oak:/tmp/
paxample.tar             100%   10KB  10.0KB/s   00:00
$ ssh oak
oak[~]$ cd /tmp
oak[tmp]$ pax -rf paxample.tar
oak[tmp]$ ls paxample/
foo

But there's a much easier way. Invoke pax on both machines, and connect the output of one to the input of the other:


$ pax -w paxample | ssh oak 'cd /tmp/ && pax -r && find paxample'
paxample
paxample/foo

pax -w writes to standard output. ssh reads its standard input and passes it to the remote command, which in this case is pax again. pax -r reads from standard input and creates the files from that archive.

pax is one of the lesser-known utilities in a typical Linux installation. But it's both simple and versatile, and well worth the time it takes to learn. Recommended.
