Archiving and Compression
-zvtf
Before you take apart a tarball (whether or not it was also compressed using gzip), it's a really good idea to test it. First, you'll know if the tarball is corrupted, saving yourself hair pulling when files don't seem to work. Second, you'll know if the person who created the tarball thoughtfully tarred up a directory containing 100 files, or instead thoughtlessly tarred up 100 individual files, which you're just about to spew all over your desktop.
To test your tarball (once again assuming it was also zipped using gzip), use the -t (or --list) option.
$ tar -zvtf moby.tar.gz
scott/scott 0 moby-dick/
scott/scott 102519 moby-dick/job.txt
scott/scott 0 moby-dick/bible/
scott/scott 207254 moby-dick/bible/genesis.txt
scott/scott 102519 moby-dick/bible/job.txt
scott/scott 1236574 moby-dick/moby-dick.txt
scott/scott 508925 moby-dick/paradise_lost.txt
With -v, tar lists the permissions, ownership, file size, and modification time for each file; the listing above shows only the ownership, size, and path columns. In addition, because every line begins with moby-dick/, you can see that you're going to end up with a single directory that contains all the files and subdirectories in the tarball, which is a relief.
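The "does everything sit under one directory?" check can also be scripted. The following is a minimal sketch rather than anything from the book: it builds a throwaway tarball (the demo directory and file names are made up for illustration) and counts the distinct top-level entries in the listing. A count of 1 means the tarball unpacks into a single directory.

```shell
#!/bin/sh
# Sketch: count the distinct top-level entries in a gzipped tarball.
# All names here (demo/, demo.tar.gz) are hypothetical.
set -e
workdir=$(mktemp -d)
cd "$workdir"

# Build a small sample tarball to inspect.
mkdir -p demo/sub
echo "one" > demo/a.txt
echo "two" > demo/sub/b.txt
tar -zcf demo.tar.gz demo

# List names only (no -v), keep the first path component, de-duplicate.
top_count=$(tar -ztf demo.tar.gz | cut -d/ -f1 | sort -u | wc -l | tr -d ' ')
top_name=$(tar -ztf demo.tar.gz | cut -d/ -f1 | sort -u | head -n 1)

if [ "$top_count" -eq 1 ]; then
    echo "Safe: everything unpacks into $top_name/"
else
    echo "Careful: $top_count top-level entries would land in the current directory"
fi
```

To try it on a real file, replace demo.tar.gz in the two listing lines with the name of your own tarball and drop the lines that build the sample.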
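Be sure that -f is the last option, because tar treats whatever follows -f as the name of the archive file. If you put another option after it, such as the t below, tar reads that letter as the file name, never sees an operation to perform, and complains: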
$ tar -zvft moby.tar.gz
tar: You must specify one of the '-Acdtrux' options
Try 'tar --help' or 'tar --usage' for more information.
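One way to see why the position of -f matters is to write the same request in three forms, each keeping the archive name attached to -f. This is a sketch under assumptions: the demo names are invented, and the long option spellings (--list, --gzip, --verbose, --file) are the ones GNU tar documents; other tar implementations may differ.

```shell
#!/bin/sh
# Sketch: three equivalent ways to list a gzipped tarball.
# demo/ and demo.tar.gz are hypothetical names.
set -e
workdir=$(mktemp -d)
cd "$workdir"
mkdir demo
echo "hello" > demo/a.txt
tar -zcf demo.tar.gz demo

# Bundled short options: -f comes last, so the archive name follows it directly.
bundled=$(tar -zvtf demo.tar.gz)

# Separate short options: the word after -f is still the archive name.
separate=$(tar -t -z -v -f demo.tar.gz)

# Long options spell out the same request.
long=$(tar --list --gzip --verbose --file=demo.tar.gz)

if [ "$bundled" = "$separate" ] && [ "$bundled" = "$long" ]; then
    echo "All three listings match"
fi
```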
Now that you've ensured that your .tar.gz file isn't corrupted, it's time to actually open it up, as you'll see in the following section.
Note - If you're testing a tarball that was compressed using bzip2, just use this command instead:
$ tar -jvtf moby.tar.bz2
-zxvf
To create a .tar.gz file, you used a set of options: -zcvf. To untar and uncompress the resulting file, you only make one substitution: -x (or --extract) for -c (or --create).
$ ls -l
rsgranne rsgranne 846049 moby.tar.gz
$ tar -zxvf moby.tar.gz
moby-dick/
moby-dick/job.txt
moby-dick/bible/
moby-dick/bible/genesis.txt
moby-dick/bible/job.txt
moby-dick/moby-dick.txt
moby-dick/paradise_lost.txt
$ ls -l
rsgranne rsgranne 168 moby-dick
rsgranne rsgranne 846049 moby.tar.gz
Make sure you always test the file before you open it, as covered in the previous section, "Test Files That Will Be Untarred and Uncompressed." That means the order of commands you should run will look like this:
$ tar -zvtf moby.tar.gz
$ tar -zxvf moby.tar.gz
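The test-then-extract habit can be wrapped in a small function so that extraction only happens when the listing succeeds. This is a hedged sketch, not the book's method: safe_untar and all file names are hypothetical, and it uses the -C option (change to a directory before extracting) to keep the output in a chosen folder.

```shell
#!/bin/sh
# Sketch: extract a gzipped tarball only if it lists cleanly.
# safe_untar, demo/, good.tar.gz and bad.tar.gz are hypothetical names.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

safe_untar() {
    archive=$1
    dest=$2
    mkdir -p "$dest"
    # -t reads through the whole archive; a non-zero exit means tar or gzip hit an error.
    if tar -ztf "$archive" > /dev/null 2>&1; then
        tar -zxf "$archive" -C "$dest"
    else
        echo "Refusing to extract $archive: test listing failed" >&2
        return 1
    fi
}

# A good tarball and a deliberately truncated copy of it.
mkdir demo
echo "hello" > demo/a.txt
tar -zcf good.tar.gz demo
head -c 20 good.tar.gz > bad.tar.gz

safe_untar good.tar.gz out_good && good_status=ok || good_status=failed
safe_untar bad.tar.gz out_bad 2> /dev/null && bad_status=ok || bad_status=failed

echo "good.tar.gz: $good_status"
echo "bad.tar.gz: $bad_status"
```

The truncated copy stands in for a download that was cut short; the function should refuse it and leave out_bad empty, while the intact tarball is unpacked under out_good.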
Note - If you're opening a tarball that was compressed using bzip2, just use this command instead:
$ tar -jxvf moby.tar.bz2
Back in the days of slow modems and tiny hard drives, archiving and compression were a necessity. These days, they're more of a convenience, but they're still something you'll find yourself using all the time. For instance, if you ever download source code to compile it, more than likely you'll find yourself face-to-face with a file such as sourcecode.tar.gz. In the future, you'll probably see more and more of those files ending with .tar.bz2. And if you exchange files with Windows users, you're going to run into files that end with .zip. Learn how to use your archival and compression tools because you're going to be using them far more than you think.
About the Author:
Scott Granneman is a monthly columnist for SecurityFocus and Linux Magazine, as well as a professional blogger on The Open Source Weblog. He is an adjunct Professor at Washington University, St. Louis and at Webster University, teaching a variety of courses about technology and the Internet.

"Linux Phrasebook" by Scott Granneman
ISBN: 0-672-32838-0
http://www.samspublishing.com/bookstore/product.asp?isbn=0672328380&rl=1
© Copyright Pearson Education. All rights reserved.
Chapter excerpt provided by Sams Publishing, an imprint of Pearson Education.
Reprinted with permission.
Comments
Unzipping Password Protected Zips
You left out how to unzip ZIP files that are password protected in Linux. I'm searching for this elusive bit of information on the internet right now...
PHP code for adding password-protected files was not found
I couldn't find PHP code for adding files to a password-protected zip anywhere on the internet when I was searching for it... so I came across your article, and it gave me an idea: why not issue a system command from PHP to add the files to a zip and even protect them with a password ;)
RAR
RAR is good and free too. It supports passwords and can make SFX archives.
No mention of lzma?
How about rzip or lzma? I recall an article in the print edition within the last ten or eleven issues that compared the cpu overhead of each compression method against compression ratios (and possibly other parameters). Anyways, rzip is memory and cpu intensive, IIRC, but has the potential to make enormous savings. I think it's the same as burrows-wheeler over larger data sets, possibly. Worthwhile for stuff that won't be frequently decompressed, IMO.
rzip
Actually, rzip levels correspond to search buffer sizes:
-0 = 100MB
-1 = 100MB
-x = x00MB for x>0 and x<=9
CPU intensive? Well, it depends. I hacked the bzip2 compression hooks out of rzip, and it's one of the fastest pre-archiving filters, with the best compression ratio for a MySQL dump of a dbmail database.
Yup, I found a bug, but only in the decompression algorithm, not in the data itself. And yes, I got Andrew to fix it.
Correction to wording
Scott,
In the section "Archive Files with tar", paragraph 3, you state that tar is "designed to compress entire directory structures". I think this should read "designed to archive...", since this section deals only with tar's standalone use as an archival tool and since this article/chapter is intended to highlight the difference between archiving and compressing. Other than that, this is a very handy primer on archiving and compressing in *nix.
bzip2 -9
The article states that the default block size for bzip2 is -6. The man page for my system (Ubuntu 6.06) states that -9 is the default, and I am unaware of any system where -6 is the default.
TROGDOR STRIKES AGAIN!
TROGDOR STRIKES AGAIN!
http://news.bbc.co.uk/1/hi/england/cornwall/6088008.stm
Making -9 the default
An easier way to default to the best (-9) compression level would be to export GZIP='-9' and ZIPOPT='-9' into your environment.