Archiving and Compression

 in
Chapter 8 from Scott Granneman's new book "Linux Phrasebook", the pocket guide every linux user needs. Linux Phrasebook offers a concise reference that, like a language phrasebook, can be used "in the street." The book goes straight to practical Linux uses, providing immediate solutions for day-to-day tasks.
Test Files That Will Be Untarred and Uncompressed

     -zvtf

Before you take apart a tarball (whether or not it was also compressed using gzip), it's a really good idea to test it. First, you'll know if the tarball is corrupted, saving yourself hair pulling when files don't seem to work. Second, you'll know if the person who created the tarball thoughtfully tarred up a directory containing 100 files, or instead thoughtlessly tarred up 100 individual files, which you're just about to spew all over your desktop.

To test your tarball (once again assuming it was also zipped using gzip), use the -t (or --list) option.

   $ tar -zvtf moby.tar.gz
   scott/scott 0 moby-dick/
   scott/scott 102519 moby-dick/job.txt
   scott/scott 0 moby-dick/bible/
   scott/scott 207254 moby-dick/bible/genesis.txt
   scott/scott 102519 moby-dick/bible/job.txt
   scott/scott 1236574 moby-dick/moby-dick.txt
   scott/scott 508925 moby-dick/paradise_lost.txt

This tells you the permissions, ownership, file size, and time for each file. In addition, because every line begins with moby-dick/, you can see that you're going to end up with a directory that contains within it all the files and subdirectories that accompany the tarball, which is a relief.

Be sure that the -f is the last option because after that you're going to specify the name of the .tar.gz file. If you don't, tar complains:

   $ tar -zvft moby.tar.gz
   tar: You must specify one of the '-Acdtrux' options
   Try 'tar --help' or 'tar --usage' for more information.

Now that you've ensured that your .tar.gz file isn't corrupted, it's time to actually open it up, as you'll see in the following section.

Note - If you're testing a tarball that was compressed using bzip2, just use this command instead:

     $ tar -jvtf moby.tar.bz2

Untar and Uncompress Files

     -zxvf

To create a .tar.gz file, you used a set of options: -zcvf. To untar and uncompress the resulting file, you only make one substitution: -x (or --extract) for -c (or --create).

   $ ls -l
   rsgranne rsgranne 846049 moby.tar.gz
   $ tar -zxvf moby.tar.gz
   moby-dick/
   moby-dick/job.txt
   moby-dick/bible/
   moby-dick/bible/genesis.txt
   moby-dick/bible/job.txt
   moby-dick/moby-dick.txt
   moby-dick/paradise_lost.txt
   $ ls -l
   rsgranne rsgranne  168 moby-dick
   rsgranne rsgranne 846049 moby.tar.gz

Make sure you always test the file before you open it, as covered in the previous section, "Test Files That Will Be Untarred and Uncompressed." That means the order of commands you should run will look like this:

   $ tar -zvtf moby.tar.gz
   $ tar -zxvf moby.tar.gz

Note - If you're opening a tarball that was compressed using bzip2, just use this command instead:

      $ tar -jxvf moby.tar.bz2

Conclusion

Back in the days of slow modems and tiny hard drives, archiving and compression was a necessity. These days, it's more of a convenience, but it's still something you'll find yourself using all the time. For instance, if you ever download source code to compile it, more than likely you'll find yourself face-to-face with a file such as sourcecode.tar.gz. In the future, you'll probably see more and more of those files ending with .tar.bz2. And if you exchange files with Windows users, you're going to run into files that end with .zip. Learn how to use your archival and compression tools because you're going to be using them far more than you think.

About the Author:

Scott Granneman is a monthly columnist for SecurityFocus and Linux Magazine, as well as a professional blogger on The Open Source Weblog. He is an adjunct Professor at Washington University, St. Louis and at Webster University, teaching a variety of courses about technology and the Internet.


         "Linux Phrasebook" by Scott Granneman
         ISBN: 0-672-32838-0
         http://www.samspublishing.com/bookstore/product.asp?isbn=0672328380&rl=1
         © Copyright Pearson Education.  All rights reserved.
         Chapter excerpt provided by Sams Publishing an imprint of Pearson Education

         Reprinted with permission.
      

______________________

Comments

Comment viewing options

Select your preferred way to display the comments and click "Save settings" to activate your changes.

Unzipping Password Protected Zips

Anonymous's picture

You left out how to unzip ZIP files that are password protected in Linux. I'm searching for this elusive bit of information on the internet right now...

Password protectedly adding files by PHP code was not found

Farrukh Shahzad's picture

Password protectedly adding files by PHP code was not found on the internet when i was searching for it... so i come across your article and it gave me the idea to why not issue a system command by php to add files in zip and even protect the files by password ;)

RAR

Amelia's picture

RAR is good and free too. It supports passwords and can make SFX archives.

No mention of lzma?

Brian Cain's picture

How about rzip or lzma? I recall an article in the print edition within the last ten or eleven issues that compared the cpu overhead of each compression method against compression ratios (and possibly other parameters). Anyways, rzip is memory and cpu intensive, IIRC, but has the potential to make enormous savings. I think it's the same as burrows-wheeler over larger data sets, possibly. Worthwhile for stuff that won't be frequently decompressed, IMO.

rzip

Anonymous's picture

actually rzip levels are in search buffer sizes:

-0 = 100MB
-1 = 100MB
-x = x00MB for x>0 and x<=9

cpu intensive? well depends. I hacked bzip2 compression hooks out of the rzip and it's one of the fastest pre archiving filters with best compression ratio for mysql dump of dbmail database.

yup found bug but only in decompression algorithm - not the data itself. yes - made Andrew to fix it.

Correction to wording

DAKH's picture

Scott,

In the section "Archive Files with tar", paragraph 3, you state that tar is "designed to compress entire directory structures". I think this should read "designed to archive...", since this section deals only with tar's standalone use as an archival tool and since this article/chapter is intended to highlight the difference between archiving and compressing. Other than that, this is a very handy primer on archiving and compressing in *nix.

bzip2 -9

Chris Thompson's picture

The article states that the default block size for bzip2 is -6. The man page for my system (Ubuntu 6.06) states that -9 is the default, and I am unaware of any system where -6 is the default.

TROGDOR STRIKES AGAIN!

TROGDOR's picture

Making -9 the default

Craig Buchek's picture

An easier way to default to the best (-9) compression level would be to export GZIP='-9' and ZIPOPTS='-9' into your environment.

White Paper
Linux Management with Red Hat Satellite: Measuring Business Impact and ROI

Linux has become a key foundation for supporting today's rapidly growing IT environments. Linux is being used to deploy business applications and databases, trading on its reputation as a low-cost operating environment. For many IT organizations, Linux is a mainstay for deploying Web servers and has evolved from handling basic file, print, and utility workloads to running mission-critical applications and databases, physically, virtually, and in the cloud. As Linux grows in importance in terms of value to the business, managing Linux environments to high standards of service quality — availability, security, and performance — becomes an essential requirement for business success.

Learn More

Sponsored by Red Hat

White Paper
Private PaaS for the Agile Enterprise

If you already use virtualized infrastructure, you are well on your way to leveraging the power of the cloud. Virtualization offers the promise of limitless resources, but how do you manage that scalability when your DevOps team doesn’t scale? In today’s hypercompetitive markets, fast results can make a difference between leading the pack vs. obsolescence. Organizations need more benefits from cloud computing than just raw resources. They need agility, flexibility, convenience, ROI, and control.

Stackato private Platform-as-a-Service technology from ActiveState extends your private cloud infrastructure by creating a private PaaS to provide on-demand availability, flexibility, control, and ultimately, faster time-to-market for your enterprise.

Learn More

Sponsored by ActiveState