Archiving and Compression
Test Files That Will Be Untarred and Uncompressed
-zvtf
Before you take apart a tarball (whether or not it was also compressed using gzip), it's a really good idea to test it. First, you'll know if the tarball is corrupted, saving yourself hair pulling when files don't seem to work. Second, you'll know if the person who created the tarball thoughtfully tarred up a directory containing 100 files, or instead thoughtlessly tarred up 100 individual files, which you're just about to spew all over your desktop.
To test your tarball (once again assuming it was also zipped using gzip), use the -t (or --list) option.
$ tar -zvtf moby.tar.gz
scott/scott 0 moby-dick/
scott/scott 102519 moby-dick/job.txt
scott/scott 0 moby-dick/bible/
scott/scott 207254 moby-dick/bible/genesis.txt
scott/scott 102519 moby-dick/bible/job.txt
scott/scott 1236574 moby-dick/moby-dick.txt
scott/scott 508925 moby-dick/paradise_lost.txt
This tells you the permissions, ownership, file size, and time for each file. In addition, because every line begins with moby-dick/, you can see that you're going to end up with a directory that contains within it all the files and subdirectories that accompany the tarball, which is a relief.
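One quick way to check for a single containing directory without reading the whole listing is to reduce every path to its first component. The sketch below assumes a gzip-compressed tarball and the standard cut and sort utilities; the function name top_level_entries is made up for illustration.

```shell
# Hypothetical helper: print the distinct top-level entries of a .tar.gz file.
top_level_entries() {
    # -t lists the archive, cut keeps the text before the first slash,
    # and sort -u collapses the repeats.
    tar -ztf "$1" | cut -d/ -f1 | sort -u
}
```

If top_level_entries moby.tar.gz prints a single name, everything unpacks into one directory. Many lines of output mean the files will land loose wherever you extract them.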
Be sure that the -f is the last option because after that you're going to specify the name of the .tar.gz file. If you don't, tar complains:
$ tar -zvft moby.tar.gz
tar: You must specify one of the '-Acdtrux' options
Try 'tar --help' or 'tar --usage' for more information.
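If the ordering rule is hard to remember, tar's long options sidestep it, because --file takes its argument explicitly. A minimal sketch, assuming GNU tar (the wrapper name list_tarball is hypothetical):

```shell
# Hypothetical wrapper: list a gzip-compressed tarball with long options.
# Equivalent to: tar -zvtf ARCHIVE
list_tarball() {
    tar --gzip --list --verbose --file="$1"
}
```

Because the filename is attached to --file, the remaining options can appear in any order.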
Now that you've ensured that your .tar.gz file isn't corrupted, it's time to actually open it up, as you'll see in the following section.
Note - If you're testing a tarball that was compressed using bzip2, just use this command instead:
$ tar -jvtf moby.tar.bz2
-zxvf
To create a .tar.gz file, you used a set of options: -zcvf. To untar and uncompress the resulting file, you only make one substitution: -x (or --extract) for -c (or --create).
$ ls -l
rsgranne rsgranne 846049 moby.tar.gz
$ tar -zxvf moby.tar.gz
moby-dick/
moby-dick/job.txt
moby-dick/bible/
moby-dick/bible/genesis.txt
moby-dick/bible/job.txt
moby-dick/moby-dick.txt
moby-dick/paradise_lost.txt
$ ls -l
rsgranne rsgranne 168 moby-dick
rsgranne rsgranne 846049 moby.tar.gz
Make sure you always test the file before you open it, as covered in the previous section, "Test Files That Will Be Untarred and Uncompressed." That means the order of commands you should run will look like this:
$ tar -zvtf moby.tar.gz
$ tar -zxvf moby.tar.gz
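The test-then-extract habit can be rolled into one step so the extraction only runs when the listing succeeds. This is a sketch rather than a standard tool: the function name test_then_extract and its second argument (an existing directory to extract into, passed to tar's -C option) are assumptions made for illustration.

```shell
# Hypothetical helper: extract a .tar.gz only if it lists cleanly first.
# $1 = archive, $2 = existing directory to extract into
test_then_extract() {
    if tar -ztf "$1" > /dev/null; then
        # The listing succeeded, so the archive is readable.
        tar -zxvf "$1" -C "$2"
    else
        echo "refusing to extract: $1 failed the listing test" >&2
        return 1
    fi
}
```

For example, test_then_extract moby.tar.gz . behaves like the two commands above, but it stops before extracting if tar cannot read the archive.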
Note - If you're opening a tarball that was compressed using bzip2, just use this command instead:
$ tar -jxvf moby.tar.bz2
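Recent versions of GNU tar work out the compression format on their own when reading an archive from a file, so the -z or -j can usually be left off when listing or extracting. A sketch under that assumption (older or non-GNU versions of tar may still need the explicit option, and list_any_tarball is a hypothetical name):

```shell
# Hypothetical helper: list a tarball without naming its compression.
# Assumes a tar that auto-detects gzip or bzip2 when reading from a file.
list_any_tarball() {
    tar -tvf "$1"
}
```

If your tar reports that the file does not look like a tar archive, fall back to the explicit -z or -j form shown above.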
Back in the days of slow modems and tiny hard drives, archiving and compression were a necessity. These days, they're more of a convenience, but they're still something you'll find yourself using all the time. For instance, if you ever download source code to compile it, more than likely you'll find yourself face-to-face with a file such as sourcecode.tar.gz. In the future, you'll probably see more and more of those files ending with .tar.bz2. And if you exchange files with Windows users, you're going to run into files that end with .zip. Learn how to use your archival and compression tools because you're going to be using them far more than you think.
About the Author:
Scott Granneman is a monthly columnist for SecurityFocus and Linux Magazine, as well as a professional blogger on The Open Source Weblog. He is an adjunct Professor at Washington University, St. Louis and at Webster University, teaching a variety of courses about technology and the Internet.

"Linux Phrasebook" by Scott Granneman
ISBN: 0-672-32838-0
http://www.samspublishing.com/bookstore/product.asp?isbn=0672328380&rl=1
© Copyright Pearson Education. All rights reserved.
Chapter excerpt provided by Sams Publishing, an imprint of Pearson Education.
Reprinted with permission.
Comments
Unzipping Password Protected Zips
You left out how to unzip ZIP files that are password protected in Linux. I'm searching for this elusive bit of information on the internet right now...
Password protectedly adding files by PHP code was not found
I couldn't find any PHP code for adding files to a password-protected zip when I searched the internet, so I came across your article, and it gave me an idea: why not issue a system command from PHP to add the files to a zip and even protect them with a password ;)
RAR
RAR is good and free too. It supports passwords and can make SFX archives.
No mention of lzma?
How about rzip or lzma? I recall an article in the print edition within the last ten or eleven issues that compared the CPU overhead of each compression method against compression ratios (and possibly other parameters). Anyway, rzip is memory and CPU intensive, IIRC, but has the potential to make enormous savings. I think it's the same as Burrows-Wheeler over larger data sets, possibly. Worthwhile for stuff that won't be frequently decompressed, IMO.
rzip
actually rzip levels are in search buffer sizes:
-0 = 100MB
-1 = 100MB
-x = x00MB for x>0 and x<=9
CPU intensive? Well, it depends. I hacked the bzip2 compression hooks out of rzip, and it's one of the fastest pre-archiving filters, with the best compression ratio for a MySQL dump of a dbmail database.
Yup, I found a bug, but only in the decompression algorithm, not in the data itself. Yes, I got Andrew to fix it.
Correction to wording
Scott,
In the section "Archive Files with tar", paragraph 3, you state that tar is "designed to compress entire directory structures". I think this should read "designed to archive...", since this section deals only with tar's standalone use as an archival tool and since this article/chapter is intended to highlight the difference between archiving and compressing. Other than that, this is a very handy primer on archiving and compressing in *nix.
bzip2 -9
The article states that the default block size for bzip2 is -6. The man page for my system (Ubuntu 6.06) states that -9 is the default, and I am unaware of any system where -6 is the default.
Making -9 the default
An easier way to default to the best (-9) compression level would be to export GZIP='-9' and ZIPOPT='-9' into your environment.