Archiving and Compression

Chapter 8 from Scott Granneman's new book "Linux Phrasebook", the pocket guide every linux user needs. Linux Phrasebook offers a concise reference that, like a language phrasebook, can be used "in the street." The book goes straight to practical Linux uses, providing immediate solutions for day-to-day tasks.
Test Files That Will Be Untarred and Uncompressed


Before you take apart a tarball (whether or not it was also compressed using gzip), it's a really good idea to test it. First, you'll know if the tarball is corrupted, saving yourself hair pulling when files don't seem to work. Second, you'll know if the person who created the tarball thoughtfully tarred up a directory containing 100 files, or instead thoughtlessly tarred up 100 individual files, which you're just about to spew all over your desktop.

To test your tarball (once again assuming it was also zipped using gzip), use the -t (or --list) option.

   $ tar -zvtf moby.tar.gz
   scott/scott 0 moby-dick/
   scott/scott 102519 moby-dick/job.txt
   scott/scott 0 moby-dick/bible/
   scott/scott 207254 moby-dick/bible/genesis.txt
   scott/scott 102519 moby-dick/bible/job.txt
   scott/scott 1236574 moby-dick/moby-dick.txt
   scott/scott 508925 moby-dick/paradise_lost.txt

This tells you the permissions, ownership, file size, and time for each file. In addition, because every line begins with moby-dick/, you can see that you're going to end up with a directory that contains within it all the files and subdirectories that accompany the tarball, which is a relief.

Be sure that the -f is the last option because after that you're going to specify the name of the .tar.gz file. If you don't, tar complains:

   $ tar -zvft moby.tar.gz
   tar: You must specify one of the '-Acdtrux' options
   Try 'tar --help' or 'tar --usage' for more information.

Now that you've ensured that your .tar.gz file isn't corrupted, it's time to actually open it up, as you'll see in the following section.

Note - If you're testing a tarball that was compressed using bzip2, just use this command instead:

     $ tar -jvtf moby.tar.bz2

Untar and Uncompress Files


To create a .tar.gz file, you used a set of options: -zcvf. To untar and uncompress the resulting file, you only make one substitution: -x (or --extract) for -c (or --create).

   $ ls -l
   rsgranne rsgranne 846049 moby.tar.gz
   $ tar -zxvf moby.tar.gz
   $ ls -l
   rsgranne rsgranne  168 moby-dick
   rsgranne rsgranne 846049 moby.tar.gz

Make sure you always test the file before you open it, as covered in the previous section, "Test Files That Will Be Untarred and Uncompressed." That means the order of commands you should run will look like this:

   $ tar -zvtf moby.tar.gz
   $ tar -zxvf moby.tar.gz

Note - If you're opening a tarball that was compressed using bzip2, just use this command instead:

      $ tar -jxvf moby.tar.bz2


Back in the days of slow modems and tiny hard drives, archiving and compression was a necessity. These days, it's more of a convenience, but it's still something you'll find yourself using all the time. For instance, if you ever download source code to compile it, more than likely you'll find yourself face-to-face with a file such as sourcecode.tar.gz. In the future, you'll probably see more and more of those files ending with .tar.bz2. And if you exchange files with Windows users, you're going to run into files that end with .zip. Learn how to use your archival and compression tools because you're going to be using them far more than you think.

About the Author:

Scott Granneman is a monthly columnist for SecurityFocus and Linux Magazine, as well as a professional blogger on The Open Source Weblog. He is an adjunct Professor at Washington University, St. Louis and at Webster University, teaching a variety of courses about technology and the Internet.

         "Linux Phrasebook" by Scott Granneman
         ISBN: 0-672-32838-0
         © Copyright Pearson Education.  All rights reserved.
         Chapter excerpt provided by Sams Publishing an imprint of Pearson Education

         Reprinted with permission.



Comment viewing options

Select your preferred way to display the comments and click "Save settings" to activate your changes.

Unzipping Password Protected Zips

Anonymous's picture

You left out how to unzip ZIP files that are password protected in Linux. I'm searching for this elusive bit of information on the internet right now...

Password protectedly adding files by PHP code was not found

Farrukh Shahzad's picture

Password protectedly adding files by PHP code was not found on the internet when i was searching for it... so i come across your article and it gave me the idea to why not issue a system command by php to add files in zip and even protect the files by password ;)


Amelia's picture

RAR is good and free too. It supports passwords and can make SFX archives.

No mention of lzma?

Brian Cain's picture

How about rzip or lzma? I recall an article in the print edition within the last ten or eleven issues that compared the cpu overhead of each compression method against compression ratios (and possibly other parameters). Anyways, rzip is memory and cpu intensive, IIRC, but has the potential to make enormous savings. I think it's the same as burrows-wheeler over larger data sets, possibly. Worthwhile for stuff that won't be frequently decompressed, IMO.


Anonymous's picture

actually rzip levels are in search buffer sizes:

-0 = 100MB
-1 = 100MB
-x = x00MB for x>0 and x<=9

cpu intensive? well depends. I hacked bzip2 compression hooks out of the rzip and it's one of the fastest pre archiving filters with best compression ratio for mysql dump of dbmail database.

yup found bug but only in decompression algorithm - not the data itself. yes - made Andrew to fix it.

Correction to wording

DAKH's picture


In the section "Archive Files with tar", paragraph 3, you state that tar is "designed to compress entire directory structures". I think this should read "designed to archive...", since this section deals only with tar's standalone use as an archival tool and since this article/chapter is intended to highlight the difference between archiving and compressing. Other than that, this is a very handy primer on archiving and compressing in *nix.

bzip2 -9

Chris Thompson's picture

The article states that the default block size for bzip2 is -6. The man page for my system (Ubuntu 6.06) states that -9 is the default, and I am unaware of any system where -6 is the default.


TROGDOR's picture

Making -9 the default

Craig Buchek's picture

An easier way to default to the best (-9) compression level would be to export GZIP='-9' and ZIPOPTS='-9' into your environment.