Archiving and Compression

 in
Chapter 8 from Scott Granneman's new book "Linux Phrasebook", the pocket guide every linux user needs. Linux Phrasebook offers a concise reference that, like a language phrasebook, can be used "in the street." The book goes straight to practical Linux uses, providing immediate solutions for day-to-day tasks.
Test Files That Will Be Untarred and Uncompressed

     -zvtf

Before you take apart a tarball (whether or not it was also compressed using gzip), it's a really good idea to test it. First, you'll know if the tarball is corrupted, saving yourself hair pulling when files don't seem to work. Second, you'll know if the person who created the tarball thoughtfully tarred up a directory containing 100 files, or instead thoughtlessly tarred up 100 individual files, which you're just about to spew all over your desktop.

To test your tarball (once again assuming it was also zipped using gzip), use the -t (or --list) option.

   $ tar -zvtf moby.tar.gz
   scott/scott 0 moby-dick/
   scott/scott 102519 moby-dick/job.txt
   scott/scott 0 moby-dick/bible/
   scott/scott 207254 moby-dick/bible/genesis.txt
   scott/scott 102519 moby-dick/bible/job.txt
   scott/scott 1236574 moby-dick/moby-dick.txt
   scott/scott 508925 moby-dick/paradise_lost.txt

This tells you the permissions, ownership, file size, and time for each file. In addition, because every line begins with moby-dick/, you can see that you're going to end up with a directory that contains within it all the files and subdirectories that accompany the tarball, which is a relief.

Be sure that the -f is the last option because after that you're going to specify the name of the .tar.gz file. If you don't, tar complains:

   $ tar -zvft moby.tar.gz
   tar: You must specify one of the '-Acdtrux' options
   Try 'tar --help' or 'tar --usage' for more information.

Now that you've ensured that your .tar.gz file isn't corrupted, it's time to actually open it up, as you'll see in the following section.

Note - If you're testing a tarball that was compressed using bzip2, just use this command instead:

     $ tar -jvtf moby.tar.bz2

Untar and Uncompress Files

     -zxvf

To create a .tar.gz file, you used a set of options: -zcvf. To untar and uncompress the resulting file, you only make one substitution: -x (or --extract) for -c (or --create).

   $ ls -l
   rsgranne rsgranne 846049 moby.tar.gz
   $ tar -zxvf moby.tar.gz
   moby-dick/
   moby-dick/job.txt
   moby-dick/bible/
   moby-dick/bible/genesis.txt
   moby-dick/bible/job.txt
   moby-dick/moby-dick.txt
   moby-dick/paradise_lost.txt
   $ ls -l
   rsgranne rsgranne  168 moby-dick
   rsgranne rsgranne 846049 moby.tar.gz

Make sure you always test the file before you open it, as covered in the previous section, "Test Files That Will Be Untarred and Uncompressed." That means the order of commands you should run will look like this:

   $ tar -zvtf moby.tar.gz
   $ tar -zxvf moby.tar.gz

Note - If you're opening a tarball that was compressed using bzip2, just use this command instead:

      $ tar -jxvf moby.tar.bz2

Conclusion

Back in the days of slow modems and tiny hard drives, archiving and compression was a necessity. These days, it's more of a convenience, but it's still something you'll find yourself using all the time. For instance, if you ever download source code to compile it, more than likely you'll find yourself face-to-face with a file such as sourcecode.tar.gz. In the future, you'll probably see more and more of those files ending with .tar.bz2. And if you exchange files with Windows users, you're going to run into files that end with .zip. Learn how to use your archival and compression tools because you're going to be using them far more than you think.

About the Author:

Scott Granneman is a monthly columnist for SecurityFocus and Linux Magazine, as well as a professional blogger on The Open Source Weblog. He is an adjunct Professor at Washington University, St. Louis and at Webster University, teaching a variety of courses about technology and the Internet.


         "Linux Phrasebook" by Scott Granneman
         ISBN: 0-672-32838-0
         http://www.samspublishing.com/bookstore/product.asp?isbn=0672328380&rl=1
         © Copyright Pearson Education.  All rights reserved.
         Chapter excerpt provided by Sams Publishing an imprint of Pearson Education

         Reprinted with permission.
      

______________________

Comments

Comment viewing options

Select your preferred way to display the comments and click "Save settings" to activate your changes.

Unzipping Password Protected Zips

Anonymous's picture

You left out how to unzip ZIP files that are password protected in Linux. I'm searching for this elusive bit of information on the internet right now...

Password protectedly adding files by PHP code was not found

Farrukh Shahzad's picture

Password protectedly adding files by PHP code was not found on the internet when i was searching for it... so i come across your article and it gave me the idea to why not issue a system command by php to add files in zip and even protect the files by password ;)

RAR

Amelia's picture

RAR is good and free too. It supports passwords and can make SFX archives.

No mention of lzma?

Brian Cain's picture

How about rzip or lzma? I recall an article in the print edition within the last ten or eleven issues that compared the cpu overhead of each compression method against compression ratios (and possibly other parameters). Anyways, rzip is memory and cpu intensive, IIRC, but has the potential to make enormous savings. I think it's the same as burrows-wheeler over larger data sets, possibly. Worthwhile for stuff that won't be frequently decompressed, IMO.

rzip

Anonymous's picture

actually rzip levels are in search buffer sizes:

-0 = 100MB
-1 = 100MB
-x = x00MB for x>0 and x<=9

cpu intensive? well depends. I hacked bzip2 compression hooks out of the rzip and it's one of the fastest pre archiving filters with best compression ratio for mysql dump of dbmail database.

yup found bug but only in decompression algorithm - not the data itself. yes - made Andrew to fix it.

Correction to wording

DAKH's picture

Scott,

In the section "Archive Files with tar", paragraph 3, you state that tar is "designed to compress entire directory structures". I think this should read "designed to archive...", since this section deals only with tar's standalone use as an archival tool and since this article/chapter is intended to highlight the difference between archiving and compressing. Other than that, this is a very handy primer on archiving and compressing in *nix.

bzip2 -9

Chris Thompson's picture

The article states that the default block size for bzip2 is -6. The man page for my system (Ubuntu 6.06) states that -9 is the default, and I am unaware of any system where -6 is the default.

TROGDOR STRIKES AGAIN!

TROGDOR's picture

Making -9 the default

Craig Buchek's picture

An easier way to default to the best (-9) compression level would be to export GZIP='-9' and ZIPOPTS='-9' into your environment.

Webinar
One Click, Universal Protection: Implementing Centralized Security Policies on Linux Systems

As Linux continues to play an ever increasing role in corporate data centers and institutions, ensuring the integrity and protection of these systems must be a priority. With 60% of the world's websites and an increasing share of organization's mission-critical workloads running on Linux, failing to stop malware and other advanced threats on Linux can increasingly impact an organization's reputation and bottom line.

Learn More

Sponsored by Bit9

Webinar
Linux Backup and Recovery Webinar

Most companies incorporate backup procedures for critical data, which can be restored quickly if a loss occurs. However, fewer companies are prepared for catastrophic system failures, in which they lose all data, the entire operating system, applications, settings, patches and more, reducing their system(s) to “bare metal.” After all, before data can be restored to a system, there must be a system to restore it to.

In this one hour webinar, learn how to enhance your existing backup strategies for better disaster recovery preparedness using Storix System Backup Administrator (SBAdmin), a highly flexible bare-metal recovery solution for UNIX and Linux systems.

Learn More

Sponsored by Storix