Stupid tar Tricks

One of the most common programs on Linux systems for packaging files is the venerable tar. tar is short for tape archive, and originally, it would archive your files to a tape device. Now, you're more likely to write your archive to a file. To use a tarfile, use the command-line option -f followed by the filename. To create a new tarfile, use the command-line option -c. To extract files from a tarfile, use the option -x. You also can compress the resulting tarfile via two methods. To use bzip2, use the -j option, or for gzip, use the -z option.

Instead of using a tarfile, you can output your tarfile to stdout or input your tarfile from stdin by using a hyphen (-). With these options, you can tar up a directory and all of its subdirectories by using:

tar cf archive.tar dir

Then, extract it in another directory with:

tar xf archive.tar
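
As a minimal sketch of the options described so far, the following creates a gzip-compressed archive, lists it and extracts it somewhere else. The directory and file names (demo, notes.txt, restore) are made up for illustration, and -t (list contents) is an extra option not covered above:

```shell
#!/bin/sh
set -e
work=$(mktemp -d)              # scratch area; every name below is illustrative
trap 'rm -rf "$work"' EXIT
cd "$work"

mkdir -p demo/sub
echo "hello" > demo/sub/notes.txt

tar -czf demo.tar.gz demo      # -c create, -z gzip, -f archive name (use -j instead of -z for bzip2)
tar -tzf demo.tar.gz           # -t lists the archive's contents without extracting

mkdir restore
cd restore
tar -xzf ../demo.tar.gz        # -x extracts into the current directory
cat demo/sub/notes.txt
```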

When creating a tarfile, you can assign a volume name with the option -V followed by the name. You can move an entire directory structure with tar by executing:

tar cf - dir1 | (cd dir2; tar xf -)
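
A slightly more cautious variation on this pipeline, sketched here with made-up directory contents: using && instead of the semicolon keeps the second tar from running (and unpacking into the wrong place) if the cd fails, and the -C option lets the extracting tar change directory by itself.

```shell
#!/bin/sh
set -e
work=$(mktemp -d)              # scratch area; directory names are illustrative
trap 'rm -rf "$work"' EXIT
cd "$work"

mkdir -p dir1/nested dir2 dir3
echo "payload" > dir1/nested/file.txt

# Variant 1: the extracting tar runs only if cd succeeded
tar cf - dir1 | (cd dir2 && tar xf -)

# Variant 2: the extracting tar changes directory itself
tar cf - dir1 | tar xf - -C dir3

# Both copies should be identical to the original tree
diff -r dir1 dir2/dir1
diff -r dir1 dir3/dir1
```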

You can go even farther and move an entire directory structure over the network by executing:

tar cf - dir1 | ssh remote_host "( cd /path/to/dir2; tar xf - )"

GNU tar includes an option that lets you skip the cd part, -C /path/to/dest. You also can interact with tarfiles over the network by including a host part to the tarfile name. For example:

tar cvf username@remotehost:/path/to/dest/archive.tar dir1

This is done by using rsh as the communication mechanism. If you want to use something else, like ssh, use the command-line option --rsh-command CMD. Sometimes, you also may need to give the path to the rmt executable on the remote host. On some hosts, it won't be in the default location /usr/sbin/rmt. So, all together, this would look like:

tar -c -v --rsh-command ssh --rmt-command /sbin/rmt \
 -f username@host:/path/to/dest/archive.tar dir1

Although tar originally was used to write its archive to a tape drive, it can write to any device. For example, if you want to get a dump of your current filesystem to a secondary hard drive, use:

tar -cvzf /dev/hdd /

Of course, you need to run the above command as root. If you are writing your tarfile to a device that is too small, you can tell tar to do a multivolume archive with the -M option. For those of you who are old enough to remember floppy disks, you can back up your home directory to a series of floppy disks by executing:

tar -cvMf /dev/fd0 $HOME
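
If you have no floppy drive handy, the same mechanism can be tried with ordinary files. This is only a sketch and assumes GNU tar: the -L option (volume size in units of 1024 bytes), the repeated -f options and the file names vol1.tar to vol3.tar are additions for illustration, not part of the command above.

```shell
#!/bin/sh
set -e
work=$(mktemp -d)              # scratch area; every name below is illustrative
trap 'rm -rf "$work"' EXIT
cd "$work"

mkdir data restore
head -c 153600 /dev/urandom > data/blob.bin   # 150 KiB of test data

# -L 100 caps each volume at 100 KiB. Several -f options name the volumes
# in order, so tar moves to the next file instead of prompting for a new disk.
tar -c -M -L 100 -f vol1.tar -f vol2.tar -f vol3.tar data < /dev/null

# Extraction takes the same list of volumes
tar -x -M -f vol1.tar -f vol2.tar -f vol3.tar -C restore < /dev/null

cmp data/blob.bin restore/data/blob.bin && echo "multivolume round trip OK"
```

Note that -M cannot be combined with the compression options, which is why the archive here is uncompressed.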

If you are doing backups, you may want to preserve the file permissions. tar records them in the archive; to have them restored exactly when you extract, use the -p option (GNU tar does this by default when run as root). If you have symlinked files on your filesystem, you can dereference the symlinks with the -h option. This tells tar to dump the file that the symlink points to, not just the symlink.
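
To see what -h does, here is a small sketch with invented file names (real.conf, proj, settings.conf). It archives a directory containing a symlink that points outside the directory, then extracts with -p so the recorded permissions are applied:

```shell
#!/bin/sh
set -e
work=$(mktemp -d)              # scratch area; every name below is illustrative
trap 'rm -rf "$work"' EXIT
cd "$work"

mkdir proj restore
echo "shared settings" > real.conf
ln -s "$work/real.conf" proj/settings.conf   # symlink pointing outside proj/

tar -chf proj.tar proj         # -h archives the link's target instead of the link
tar -xpf proj.tar -C restore   # -p applies the recorded permissions on extraction

# The restored entry is a regular file holding the target's contents
ls -l restore/proj/settings.conf
cat restore/proj/settings.conf
```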

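Along the same lines, if you have several filesystems mounted, you can tell tar to stick to only one filesystem with the option --one-file-system. (Older GNU tar releases spelled this -l; in current releases, -l means --check-links instead.) Hopefully, this gives you lots of ideas for ways to archive your files.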

______________________

Comments

- unix magic -

voku's picture

"magic tar" -> ;-) -> http://bit.ly/aR7QC8

tar + ftp (corr)

pomkwi's picture

Message resent with {...} instead of angle brackets, badly treated because of html interaction!

You can combine tar and ftp to save an entire hierarchy at once, very simply and without any intermediate file. Very powerful.

Within ftp, issue a directive like this (names within curly brackets are only examples here):

put "|tar cf - {.}" {dir_contents.tar}

There is no limit on the length of the generated file, but keep it under 2 GB on 32-bit remote machines.

You can get {dir_contents.tar} back as a normal file, or you can retrieve only a selection of named files with the complementary directive (but be aware that the complete file is still transferred):

get {dir_contents.tar} "|tar xvf - {./name1} [{./name2} ...]"

I use this as a convenient way to backup files (cron during the night). I also use --newer {date} to set up an incremental backup.
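
A local sketch of the date-based selection, leaving ftp out of the picture. It assumes GNU tar and uses --newer-mtime rather than --newer, because --newer also looks at the status change time, which the touch command below would reset; the file names and the cutoff date are invented for the example.

```shell
#!/bin/sh
set -e
work=$(mktemp -d)              # scratch area; every name below is illustrative
trap 'rm -rf "$work"' EXIT
cd "$work"

mkdir data
echo "old" > data/old.txt
echo "new" > data/new.txt
touch -d '2000-01-01' data/old.txt   # pretend this file predates the last backup

# Archive only files modified after the cutoff date.
# stderr is silenced because tar warns about each file it skips.
tar -cf incr.tar --newer-mtime='2005-01-01' data 2>/dev/null

tar -tf incr.tar               # lists what made it into the incremental archive
```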

Just a reminder on ftp. You can replace any local file in put/get directives by some piped command. It's standard ftp ! You may use

put "| tar cf - {.} | gzip -c" {dir_contents.tgz}

and thus use a double pipe if you wish so.

Hope this will help somebody.

Cd with ; vs Cd with &

Drew Sullivan's picture

The problem when running:

tar cf - dir1 | (cd dir2; tar xf -)

is that if the cd fails to go to dir2 then the tar extract continues in the same directory happily overwriting the files with the same contents. Sometimes this will leave files truncated if the tar create hasn't finished with the file.

ALWAYS follow the cd with two ampersands ('&&') rather than a semicolon (';') so that if the cd fails, the tar extract doesn't execute.

tar cf - dir1 | (cd dir2 && tar xf -)

Or use the cp command with -a

Anonymous's picture

Or use the cp command with the -a option (ex: cp -a dir1 dir2 ). Much simpler, although less impressive, if you are typing this in front of a less knowledgeable colleague.

Translate to portuguese and re-publish

emerson.morgado's picture

Hi LJ staff, I would like to translate it to my native language and republish in my blog, can I?

Translation Request

Rebecca Cassity's picture

Hi Emerson,

Thanks for your request, we do occasionally license content for translation. Please send me an e-mail at rebecca@linuxjournal.com for details.

Thanks,
Rebecca

Rebecca Cassity is the Director of Sales for Linux Journal

Netcat

Anonymous's picture

Netcat (nc) can be used when you archive (or compress) files and there is no space for the archive on the source computer. With netcat, you don't need ssh installed on the target computer.

Moving large volumes of data

matt's picture

When moving large volumes of data, it may be better to use rsync than tar. On Windows, you can use rsync or Grsync.

Changing the target directory from an argument

Anonymous's picture

You may substitute this:

tar cf - dir1 | (cd dir2; tar xf -)

by this:

tar cf - dir1 | tar -xf - -C dir2

Another cool tar-related trick...

TomL's picture

Not an omission on your part, since it really qualifies as a "stupid-perl-trick"! :-)

The "find2perl" program (see links below) has a '-tar tarfile' option I did not discover until recently.

You basically get an awesome find with integrated tar, or is it tar with integrated find?

find2perl actually generates a Perl script comparable to the find command-line you pass it, so if you will be repeating the operation in the future you can save the resulting script.

For one-time use, you can do this:

TomL@xat:~$ find2perl .config/google-googletalkplugin | perl
.config/google-googletalkplugin
.config/google-googletalkplugin/gtalkplugin-c1544574457.log.bz2
.config/google-googletalkplugin/googletalkplugin_port
.config/google-googletalkplugin/options
.config/google-googletalkplugin/gtbplugin.log
.config/google-googletalkplugin/gtalkplugin-c51918583.log.bz2
.config/google-googletalkplugin/gtalkplugin-c544853250.log.bz2

To tar those files:

TomL@xat:~$ find2perl .config/google-googletalkplugin -tar tarfile.tar | perl
.config/google-googletalkplugin/gtalkplugin-c1544574457.log.bz2
.config/google-googletalkplugin/googletalkplugin_port
.config/google-googletalkplugin/options
.config/google-googletalkplugin/gtbplugin.log
.config/google-googletalkplugin/gtalkplugin-c51918583.log.bz2
.config/google-googletalkplugin/gtalkplugin-c544853250.log.bz2

TomL@xat:~$ ls -lh tarfile.tar
-rw-r--r-- 1 TomL TomL 30K 2010-08-26 13:54 tarfile.tar

TomL@xat:~$ tar tvf tarfile.tar
drwxr-xr-x TomL/TomL 0 2010-08-26 13:23 .config/google-googletalkplugin
-rw-r--r-- TomL/TomL 5036 2010-08-25 14:28 .config/google-googletalkplugin/gtalkplugin-c1544574457.log.bz2
-rw------- TomL/TomL 20 2010-08-26 13:23 .config/google-googletalkplugin/googletalkplugin_port
-rw-r--r-- TomL/TomL 107 2010-08-23 14:44 .config/google-googletalkplugin/options
-rw-r--r-- TomL/TomL 687 2010-08-26 13:23 .config/google-googletalkplugin/gtbplugin.log
-rw-r--r-- TomL/TomL 5749 2010-08-25 15:55 .config/google-googletalkplugin/gtalkplugin-c51918583.log.bz2
-rw-r--r-- TomL/TomL 5596 2010-08-25 14:30 .config/google-googletalkplugin/gtalkplugin-c544853250.log.bz2

With compression:

TomL@xat:~$ find2perl .config/google-googletalkplugin -tar - | perl | gzip > tarfile.tar.gz

TomL@xat:~$ tar ztvf tarfile.tar.gz
drwxr-xr-x TomL/TomL 0 2010-08-26 13:23 .config/google-googletalkplugin
-rw-r--r-- TomL/TomL 5036 2010-08-25 14:28 .config/google-googletalkplugin/gtalkplugin-c1544574457.log.bz2
-rw------- TomL/TomL 20 2010-08-26 13:23 .config/google-googletalkplugin/googletalkplugin_port
-rw-r--r-- TomL/TomL 107 2010-08-23 14:44 .config/google-googletalkplugin/options
-rw-r--r-- TomL/TomL 687 2010-08-26 13:23 .config/google-googletalkplugin/gtbplugin.log
-rw-r--r-- TomL/TomL 5749 2010-08-25 15:55 .config/google-googletalkplugin/gtalkplugin-c51918583.log.bz2
-rw-r--r-- TomL/TomL 5596 2010-08-25 14:30 .config/google-googletalkplugin/gtalkplugin-c544853250.log.bz2

And all of the wonderful "find" command options are available, so this is really incredibly powerful.

http://perldoc.perl.org/find2perl.html

Thanks again!

Tom

Compress across network

Bob Weber's picture

I used to use tar and ssh all the time to compress files when moving them across servers. Just add the z parameter to your network transfer example. Works great when you have really huge stuff to use and more CPU than bandwidth.

rsync

eltomo's picture

Nice deal if you don't have anything better -or to keep them old fingers in shape- but I'd really recommend rsync for live backups. It does exactly the same job (mirror remote directories/volumes/machines using a compressed ssh connection) but takes far fewer options for a standard job. It also is highly optimized for unattended execution.

BTW: there's one tar job I use frequently but haven't seen in this thread:
tar cf - DIRECTORY|gpg -e -r ME@LOCALHOST>DIRECTORY-$(date +%F).tar.gpg
and vice versa
gpg -d DIRECTORY-XXXX-XX-XX.tar.gpg|tar xf -

Embed tar command in tar's label

fade-in's picture

Sometimes I like to know what command I used to create a particular archive. But after sitting on a disk for a few weeks, my memory is foggy enough that I can't recall how I created the tarfile in the first place.

This bash/zsh trick uses the !# event designator to embed the tar command into the tar file's volume label:

tar -cvf test.tar --exclude='mod.*' --exclude='*~' test --label="!#:0-"

This command archives the contents of the test/ directory, sans files matching mod.* and *~. The --label argument is filled in with the current command line from the word 'tar' up to the directory name 'test'. The 0- bit after !# is a range specifier that means to take everything from word 0 (tar) up to the word previous to the event specifier itself.

In other words, the --label= part isn't included in the label itself. With the extra quote marks, things get weird if you do.
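
To read a label back later, GNU tar offers --test-label (an option not mentioned above; this sketch assumes GNU tar). Because history expansion works only in an interactive shell, the script below uses a literal label as a stand-in, and the file names are invented:

```shell
#!/bin/sh
set -e
work=$(mktemp -d)              # scratch area; every name below is illustrative
trap 'rm -rf "$work"' EXIT
cd "$work"

mkdir test
echo "content" > test/file.txt

# A literal label stands in for what "!#:0-" would expand to interactively
tar -cf test.tar --label="tar -cf test.tar test" test

# Print the stored volume label
tar --test-label -f test.tar
```

The label also shows up as the first entry when listing the archive with tar -tvf.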

History substitution

Anonymous's picture

Hi fade-in,

I just thought I would point out that the !#:0- is a history expansion and has nothing to do with #! at the beginning of a shell script. I quote the following from the bash man page (however csh has very similar history expansion).


Event Designators
    An event designator is a reference to a command line entry in the
    history list.

    !      Start a history substitution, except when followed by a blank,
           newline, carriage return, = or ( (when the extglob shell
           option is enabled using the shopt builtin).
    !n     Refer to command line n.
    !-n    Refer to the current command line minus n.
    !!     Refer to the previous command. This is a synonym for ‘!-1’.
    !string
           Refer to the most recent command starting with string.
    !?string[?]
           Refer to the most recent command containing string. The
           trailing ? may be omitted if string is followed immediately
           by a newline.
    ^string1^string2^
           Quick substitution. Repeat the last command, replacing
           string1 with string2. Equivalent to
           ‘‘!!:s/string1/string2/’’ (see Modifiers below).
    !#     The entire command line typed so far.

Word Designators
    Word designators are used to select desired words from the event.
    A : separates the event specification from the word designator. It
    may be omitted if the word designator begins with a ^, $, *, -,
    or %. Words are numbered from the beginning of the line, with the
    first word being denoted by 0 (zero). Words are inserted into the
    current line separated by single spaces.

    0 (zero)
           The zeroth word. For the shell, this is the command word.
    n      The nth word.
    ^      The first argument. That is, word 1.
    $      The last argument.
    %      The word matched by the most recent ‘?string?’ search.
    x-y    A range of words; ‘-y’ abbreviates ‘0-y’.
    *      All of the words but the zeroth. This is a synonym for
           ‘1-$’. It is not an error to use * if there is just one
           word in the event; the empty string is returned in that
           case.
    x*     Abbreviates x-$.
    x-     Abbreviates x-$ like x*, but omits the last word.

    If a word designator is supplied without an event specification,
    the previous command is used as the event.

Foggy

obx_ruckle's picture

I'm impressed that it takes a few weeks for your memory to get foggy. You can monitor mine with a stopwatch :o)

command-line memory

Ed Grimm's picture

Personally, I can remember how I created files for *years*.  All I do is have my history set to an obscene length, and I use a perl script to keep it from being filled with duplicate or trivial command-lines.  (I could use the zsh option hist_ignore_all_dups, except that I have multiple shells running in parallel, so I dump history with an fc -IA, and that defeats the hist_ignore_all_dups option.  I also have periodic set to do an fc -IA, so that I'm less susceptible to coworkers shutting systems down without warning me.)  Then, when I want to recall how a file was created, I just press control-R, and enter an appropriate search.  If I created the file with a redirect, it's usually pretty quick: > {filename} usually doesn't find many false-positives.  If the file's been too heavily used to find it that way, I can run 'fc -f -l 1 | less', and then can use regular expressions to search for it.

Or while using bash, you

allanf's picture

Or while using bash, you could use HISTCONTROL, HISTIGNORE, HISTORYCONTROL, HISTFILESIZE and/or HISTFILE.

Such as:
tab=$'\t'
HISTIGNORE="ls:ps:kill[ ${tab}][ ${tab}]*-9[ ${tab}].*"
HISTCONTROL=ignorespace:ignoredups
or
HISTORYCONTROL=ignoreboth

I would...

eltomo's picture

...but where did I just put the stopwatch?

stopwatch?

obx_ruckle's picture

What stopwatch?
;-)
