Make Peace with pax
pax is one of the lesser known utilities in a typical Linux installation. That's too bad, because pax has a very good feature set, and its command-line options are easy to understand and remember. pax is an archiver, like tar(1), but it's also a better version of cp(1) in some ways, not least because you can use pax with SSH to copy sets of files over a network. Once you learn pax, you may wonder how you lived without it all these years.
pax has four modes: list, read, write and copy. Reading and writing are controlled by the -r and -w options, respectively. In combination, -rw, pax acts a little bit like cp -R. If neither is used, pax lists the contents of the archive, which may be a file, device or a pipe.
By default, pax operates as a filter: it reads from standard input and writes to standard output, a feature that turns out to be very useful. But usually these days, the target is an archive file, the familiar tarball. Let's start by creating one:
$ cd /tmp
$ mkdir paxample
$ touch paxample/foo
$ pax -wf paxample.tar paxample
The -w option means write, that is, create an archive. The -f option provides the name of the file to write the archive to. If desired, pax can gzip or bzip the file at the same time:
$ pax -wzf paxample.tar.gz paxample
Like most tar implementations, pax, by default, uses the POSIX ustar file format. Because pax was born of a desire to unify archive file formats, many other formats are supported too, but in practice, they're seldom used. Likely as not, any .tar.gz file you download from the Internet actually will be a ustar archive:
$ pax -wzf paxample.tar.gz paxample
$ file paxample.tar*
paxample.tar:    POSIX tar archive
paxample.tar.gz: gzip compressed data
The first thing you nearly always want to know about any archive is what's in it. Listing the contents is the default action in the absence of either a -r or -w option:
$ pax -f paxample.tar
paxample
paxample/foo
Note that the archive retains the directory name you specified on the command line. That comes into play later when you read it.
To read an archive, use -r:
$ mkdir t
$ cd t
$ pax -rf ../paxample.tar
What did that do? Let's look at the source and target directories:
$ cd /tmp
$ find paxample t     # traverse both trees
paxample
paxample/foo
t
t/paxample
t/paxample/foo
When pax read the paxample.tar archive, it created files in the current directory, t. Because the archive included a directory name, paxample, that directory was re-created in the output.
Copying Sets of Files
To my mind, pax's -r and -w options make more sense than their -x and -c equivalents in tar, and that's reason enough to switch. But pax can do more than tar: it can copy files too:
$ rm -rf t
$ pax -rw paxample t
$ find t
t
t/paxample
t/paxample/foo
Unlike cp(1), pax is an archive utility. Its job isn't to make copies, but to archive files. When pax creates a file, it preserves the file's metadata from its input. The form of the input doesn't matter. In this case, the input isn't an archive; it's the file itself:
$ ls -l paxample/foo t/paxample/foo
-rw-r--r--  1 jklowden  wheel  0 Sep 22 15:45 paxample/foo
-rw-r--r--  1 jklowden  wheel  0 Sep 22 15:45 t/paxample/foo
Yes—two identical files with two identical timestamps. The permission bits and ownership can be controlled too, if desired. Take that, cp(1)!
Perhaps you don't want to re-create the directory, or perhaps you want to change it in some way. One option is not to mention the input directory on the command line, but instead provide filenames:
$ rm -rf t/paxample/
$ (cd paxample/ && pax -rw * ../t/)
$ find t
t
t/foo
That's usually easiest. But if you need something more sophisticated, the -s option rewrites the path (actually, any part of the filename) using a regular expression:
$ rm -rf t/*
$ pax -rw -s ':paxample:my/new/path:g' paxample/ t
$ find t
t
t/my
t/my/new
t/my/new/path
t/my/new/path/foo
The -s option is handy, for instance, when unpacking a tarball that doesn't have version information in the directory name.
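For example, suppose a hypothetical proj.tar unpacks into a bare proj directory and you want proj-1.0 instead. Anchoring the pattern with ^ keeps it from matching anywhere other than the start of each path. The names proj and proj-1.0 are made up for illustration:

```shell
cd "$(mktemp -d)"
# Build a stand-in for the downloaded tarball (names are hypothetical).
mkdir proj && touch proj/README
pax -wf proj.tar proj

# Unpack it elsewhere, renaming the top-level directory on the way in.
mkdir out && cd out
pax -rf ../proj.tar -s ':^proj:proj-1.0:'
find .
```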
What Could Go Wrong?
If you give the wrong filename to write, you just get an archive by the wrong name; no harm, no foul. If you mistype an input archive filename, though, you'll find yourself in 1985:
$ pax -rf paxample.whoopsie
pax: Failed open to read on paxample.whoopsie (No such file or directory)
ATTENTION! pax archive volume change required.
Ready for archive volume: 1
Input archive name or "." to quit pax.
Archive name >
This is an idea that outlived its usefulness before it was implemented. You could type in the filename here, again, without readline support or tab completion. Well, at least it says what to do:
Archive name > .
Quitting pax!
As mentioned previously, pax uses standard input and standard output by default. That is a feature, but the first time you forget to provide a filename, you may think pax is very, very slow:
$ pax -r paxample.tar
Note the missing -f. There's also no message and no prompt. In read mode, pax treats paxample.tar not as the archive but as a pattern naming the files to extract, and it reads the archive itself from standard input, which in this case is the keyboard. You could type ^D for end-of-file, but that isn't valid input to pax either. Better to send up a smoke signal:
^C
pax: Signal caught, cleaning up.
It's even worse the first time you accidentally write to standard output while it's connected to your terminal. You heard it here first: don't do that.
Putting Standard Input to Work
Standard input and standard output do have their uses, and here pax really comes into its own. For one thing, you can verify the effect of the -s option without creating an archive or any files:
$ pax -w -s ':paxample:my/new/path:g' paxample/ | pax
my/new/path
my/new/path/foo
Without -f, pax -w writes to standard output. So rewrite the paths with -s, and pipe the output to pax again, this time using its list mode, with neither the -r nor the -w option. By default, pax reads from standard input and, in list mode, prints the filenames on standard output.
That can save a lot of time, not to mention a mess on the disk, when there are thousands of files.
Suppose you want to copy the paxample directory to another machine. One approach would be to create a tarball, copy to the target, log in to the target and unpack the tarball:
$ pax -wf paxample.tar paxample
$ scp paxample.tar oak:/tmp/
paxample.tar          100%   10KB  10.0KB/s   00:00
$ ssh oak
oak[~]$ cd /tmp
oak[tmp]$ pax -rf paxample.tar
oak[tmp]$ ls paxample/
foo
But there's a much easier way. Invoke pax on both machines, and connect the output of one to the input of the other:
$ pax -w paxample | ssh oak 'cd /tmp/ && pax -r && find paxample'
paxample
paxample/foo
- pax -w writes to standard output.
- ssh reads standard input and attaches it to whatever utility is invoked, which of course in this case is pax again.
- pax -r reads from standard input and creates the files from that archive.
pax is one of the lesser known utilities in a typical Linux installation. But it's both simple and versatile, and well worth the time it takes to learn. Recommended.