Make Peace with pax

pax is one of the lesser known utilities in a typical Linux installation. That's too bad, because pax has a very good feature set, and its command-line options are easy to understand and remember. pax is an archiver, like tar(1), but it's also a better version of cp(1) in some ways, not least because you can use pax with SSH to copy sets of files over a network. Once you learn pax, you may wonder how you lived without it all these years.

pax has four modes: list, read, write and copy. Reading and writing are controlled by the -r and -w options, respectively. Used together, as -rw, they put pax in copy mode, where it acts a little bit like cp -R. If neither is used, pax lists the contents of the archive, which may be a file, a device or a pipe.

By default, pax operates as a filter: it reads from standard input and writes to standard output, a feature that turns out to be very useful. But usually these days, the target is an archive file, the familiar tarball. Let's start by creating one:


$ cd /tmp
$ mkdir paxample
$ touch paxample/foo
$ pax -wf paxample.tar paxample

The -w option means write, that is, create an archive. The -f option names the file to which the archive is written. If desired, pax can compress the archive at the same time; -z runs it through gzip, and some implementations also accept -j for bzip2:


$ pax -wzf paxample.tar.gz paxample

Like most tar implementations, pax, by default, uses the POSIX ustar file format. Because pax was born of a desire to unify archive file formats, many other formats also are supported, but in practice, they're seldom used. Likely as not, any .tar.gz file you download from the Internet actually will be a ustar archive:


$ pax -wzf paxample.tar.gz paxample
$ file paxample.tar*
paxample.tar:    POSIX tar archive
paxample.tar.gz: gzip compressed data
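
The other formats mentioned above are chosen with the -x option. The following is a minimal sketch, assuming your pax build supports both the ustar and cpio formats (POSIX requires them, but builds vary). The scratch directory created by mktemp is an assumption of this example and is not part of the session above:

```shell
work=$(mktemp -d) && cd "$work" || exit 1
mkdir paxample
touch paxample/foo

pax -w -x ustar -f paxample.tar  paxample   # explicit ustar format
pax -w -x cpio  -f paxample.cpio paxample   # cpio interchange format

# When listing or reading, pax works out the format for itself.
pax -f paxample.cpio
```

No -x is needed on the reading side, which is much of the point of a format-unifying archiver.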

The first thing you nearly always want to know about any archive is what's in it. Listing the contents is the default action in the absence of either a -r or -w option:


$ pax -f paxample.tar
paxample
paxample/foo

Note that the archive retains the directory name you specified on the command line. That comes into play later when you read it.

To read an archive, use -r:


$ mkdir t
$ cd t
$ pax -rf ../paxample.tar

What did that do? Let's look at the source and target directories:


$ cd /tmp
$ find paxample t # traverse both trees
paxample
paxample/foo
t
t/paxample
t/paxample/foo

When pax read the paxample.tar archive, it created files in the current directory, t. Because the archive included a directory name, paxample, that directory was re-created in the output.

Copying Sets of Files

To my mind, pax's -r and -w options make more sense than their -x and -c equivalents in tar, which is reason enough to switch. But pax can do more than tar; it can copy files too:


$ rm -rf t
$ pax -rw paxample t
$ find t
t
t/paxample
t/paxample/foo

Unlike cp(1), pax is an archive utility. Its job isn't to make copies, but to archive files. When pax creates a file, it preserves the file's metadata from its input. The form of the input doesn't matter. In this case, the input isn't an archive; it's the files themselves:


$ ls -l paxample/foo t/paxample/foo
-rw-r--r--  1 jklowden  wheel  0 Sep 22 15:45 paxample/foo
-rw-r--r--  1 jklowden  wheel  0 Sep 22 15:45 t/paxample/foo

Yes—two identical files with two identical timestamps. The permission bits and ownership can be controlled too, if desired. Take that, cp(1)!
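
How much metadata survives is governed by the -p option. Here is a minimal sketch, assuming a POSIX-conforming pax: -p m tells pax not to preserve the modification time, while -p e (not run here) asks it to preserve everything it can, including ownership, which generally needs root. The directory names t_default and t_nomtime are made up for this example:

```shell
work=$(mktemp -d) && cd "$work" || exit 1
mkdir paxample t_default t_nomtime
touch -t 200001010000 paxample/foo     # give the source an old timestamp

pax -rw paxample t_default             # default: modification time is kept
pax -rw -p m paxample t_nomtime        # -p m: do not preserve modification time

# Compare the three timestamps side by side.
ls -l paxample/foo t_default/paxample/foo t_nomtime/paxample/foo
```

The copy under t_default should carry the source's old date, while the one under t_nomtime should be stamped with the time of the copy.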

Perhaps you don't want to re-create the directory, or perhaps you want to change it in some way. One option is not to mention the input directory on the command line, but instead provide filenames:


$ rm -rf t/paxample/
$ (cd paxample/ && pax -rw * ../t/)
$ find t
t
t/foo

That's usually easiest. But if you need something more sophisticated, the -s option rewrites the path—actually, any part of the filename—using a regular expression:


$ rm -rf t/*
$ pax -rw -s ':paxample:my/new/path:g' paxample/ t
$ find t
t
t/my
t/my/new
t/my/new/path
t/my/new/path/foo

The -s option is handy, for instance, when unpacking a tarball that doesn't have version information in the directory name.
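
That case might look something like the sketch below. The tarball tool.tar, its top-level directory tool and the version string 1.0 are all hypothetical stand-ins, built on the spot so the example is self-contained:

```shell
work=$(mktemp -d) && cd "$work" || exit 1

# Build a stand-in for a downloaded tarball whose top directory is unversioned.
mkdir tool out
touch tool/README
pax -wf tool.tar tool

# Unpack it under out/, renaming the leading "tool" to "tool-1.0".
(cd out && pax -rf ../tool.tar -s ':^tool:tool-1.0:')

find out
```

Anchoring the pattern with ^ keeps the substitution from touching a "tool" that happens to appear deeper in a path.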

What Could Go Wrong?

If you give the wrong filename to write, you just get an archive by the wrong name; no harm, no foul. If you mistype an input archive filename though, you'll find yourself in 1985:


$ pax -rf paxample.whoopsie
pax: Failed open to read on paxample.whoopsie (No such file or directory)

ATTENTION! pax archive volume change required.
Ready for archive volume: 1
Input archive name or "." to quit pax.
Archive name >

This is an idea that outlived its usefulness before it was implemented. You could type in the filename here, again, without readline support or tab completion. Well, at least it says what to do:


Archive name > .
Quitting pax!

How exciting!

As mentioned previously, pax uses standard input and standard output by default. That is a feature, but the first time you forget to provide a filename, you may think pax is very, very slow:


$ pax -r paxample.tar

Oops! No -f. Also no message and no prompt. pax is ignoring the archive filename argument and reading standard input, which in this case, is the keyboard. You could type ^D, for end-of-file, but that forms invalid input to pax. Better to send up a smoke signal:


^C
pax: Signal caught, cleaning up.

It's even worse the first time you accidentally write to standard output while it's connected to your terminal. You heard it here first: don't do that.
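
One way to guard against that mistake is a small wrapper that checks whether standard output is a terminal before running pax -w. This is only a sketch; the function name pax_to_stdout is invented for this example, and it is meant solely for the case where the archive goes to standard output rather than to a file named with -f:

```shell
# pax_to_stdout: run "pax -w" on standard output, but only when standard
# output is a pipe or a file rather than a terminal.
pax_to_stdout() {
    if [ -t 1 ]; then
        echo "pax_to_stdout: standard output is a terminal; redirect it or use pax -wf" >&2
        return 1
    fi
    pax -w "$@"
}

# Usage: the archive goes down the pipe, and the second pax lists it.
work=$(mktemp -d) && cd "$work" || exit 1
mkdir paxample
touch paxample/foo
pax_to_stdout paxample | pax
```

Typed at a prompt with no pipe or redirection, the function prints its complaint and returns without ever starting pax.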

Putting Standard Input to Work

Standard input and standard output do have their uses, and here pax really comes into its own. For one thing, you can verify the effect of the -s option without creating an archive or the files:


$ pax -w -s ':paxample:my/new/path:g' paxample/ | pax
my/new/path
my/new/path/foo

Absent the -f option, pax -w writes to standard output. So rewrite the pathname with -s, and pipe the output to pax again, this time using its list mode, with neither the -r nor -w option. By default, pax reads from standard input and, in list mode, prints the filenames on the terminal.

That can save a lot of time, not to mention a mess on the disk, when there are thousands of files.

Suppose you want to copy the paxample directory to another machine. One approach would be to create a tarball, copy to the target, log in to the target and unpack the tarball:


$ pax -wf paxample.tar paxample
$ scp paxample.tar oak:/tmp/
paxample.tar             100%   10KB  10.0KB/s   00:00
$ ssh oak
oak[~]$ cd /tmp
oak[tmp]$ pax -rf paxample.tar
oak[tmp]$ ls paxample/
foo

But there's a much easier way. Invoke pax on both machines, and connect the output of one to the input of the other:


$ pax -w paxample | ssh oak 'cd /tmp/ && pax -r && find paxample'
paxample
paxample/foo

pax -w writes to standard output. ssh reads standard input and attaches it to whatever utility is invoked, which of course in this case is pax again. pax -r reads from standard input and creates the files from that archive.
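
The same plumbing can be tried without a second machine. In the sketch below, a local subshell stands in for the remote end; the directory name dest is made up for this example, and the compressed ssh variant in the closing comment is a suggestion that assumes -z is available on both hosts:

```shell
work=$(mktemp -d) && cd "$work" || exit 1
mkdir paxample dest
touch paxample/foo

# Same pipeline as the ssh example, with a local subshell as the receiver.
pax -w paxample | (cd dest && pax -r)
find dest

# Over a slow link, the stream could be compressed on both ends, for example:
#   pax -wz paxample | ssh oak 'cd /tmp/ && pax -rz'
```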

pax is one of the lesser known utilities in a typical Linux installation. But it's both simple and versatile, well worth the time it takes to learn—recommended.

______________________

Comments

forget about pax

ruario

Everything listed in that article can be done with GNU tar, which is pre-installed on every major distro. GNU tar can read a file list via -T (use '-T- --no-recursion' to effectively read from find results in a pipe), you can copy using two instances in a pipe and --xform lets you alter the path on the fly.

Additionally, the pax version provided by most distros is a very poorly maintained fork of an old MirBSD pax. GNU tar and BSD tar (its most obvious rivals) are still well maintained and have a wealth of extra features, for example, seamless support for modern compression methods such as XZ. BSD tar (and BSD cpio, for that matter) can also open a range of other formats, because they are based on libarchive.

With regard to USTAR, the two most relevant tar implementations, GNU and BSD, actually write GNU tar and PAX (POSIX.1-2001) format, respectively, by default (not USTAR). This is done because neither format has major limitations. USTAR, on the other hand, is an older POSIX standard and has limitations that you might actually hit on a modern system, e.g. an 8 GB limit on files within the archive. Given I have ISO images and media files in that size range lying on my disk, USTAR is not a format I can trust without the risk of data corruption.

Ironically, the MirBSD pax that most distros provide does not actually support the PAX format! Think about how stupid that is for a second. To me it clearly demonstrates how poorly it is maintained.

P.S. If you really do want a pax that actually handles the PAX file format, use Heirloom pax from The Heirloom Project.

tar can shoot a file across a network too

Peter Holt Hoffman

I've not used pax since I was running Coherent so this was an interesting article.

By the way, if you use the dash as a file name, tar will read/write standard in/out. You can compress the stream on the fly too. As a result, it can be used the same way as pax in the example:

tar -czf - paxample | ssh oak 'cd /tmp/ && tar -xzf - && find paxample'

Pax is so misunderstood

Anonymous

Pax has an undeserved reputation for being too complicated to use, yet it is so very useful for converting between formats. Your article should help.

How is it better than rsync -a?

Anonymous

Seems like rsync can do everything listed, and it is more prominent anyways.

Interesting article. I'll try

Anonymous

Interesting article. I'll try to start using it little by little.

On another note the comment section is overly complicated. Maybe you could write an article on using https://www.mozilla.org/en-US/persona/ and how it was incorporated into your comment sections?
