Archiving and Compression
unzip
Expanding a Zip archive isn't hard at all. To create a zipped archive, use the zip command; to expand that archive, use the unzip command.
$ unzip moby.zip
Archive: moby.zip
inflating: job.txt
inflating: moby-dick.txt
inflating: paradise_lost.txt
The unzip command helpfully tells you what it's doing as it works. To get even more information, add the -v option (which stands, of course, for verbose).
unzip -v moby.zip
Archive: moby.zip
Length Method Size Ratio CRC-32 Name
------- ------ ------ ----- ------ ----
102519 Defl:X 35747 65% fabf86c9 job.txt
1236574 Defl:X 487553 61% 34a8cc3a moby-dick.txt
508925 Defl:X 224004 56% 6abe1d0f paradise_lost.t
------- ------ --- -------
1848018 747304 60% 3 files
There's quite a bit of useful data here, including the method used to compress the files, the ratio of original to compressed file size, and the cyclic redundancy check (CRC) used for error correction.
-l
Sometimes you might find yourself looking at a Zip file and not remembering what's in that file. Or perhaps you want to make sure that a file you need is contained within that Zip file. To list the contents of a zip file without unzipping it, use the -l option (which stands for "list").
$ unzip -l moby.zip
Archive: moby.zip
Length Date Time Name
-------- ---- ---- ----
0 01-26-06 18:40 bible/
207254 01-26-06 18:40 bible/genesis.txt
102519 01-26-06 18:19 bible/job.txt
1236574 01-26-06 18:19 moby-dick.txt
508925 01-26-06 18:19 paradise_lost.txt
-------- -------
2055272 5 files
From these results, you can see that moby.zip contains two files — moby-dick.txt and paradise_lost.txt — and a directory (bible), which itself contains two files, genesis. txt and job.txt. Now you know exactly what will happen when you expand moby.zip. Using the -l command helps prevent inadvertently unzipping a file that spews out 100 files instead of unzipping a directory that contains 100 files. The first leaves you with files strewn pell-mell, while the second is far easier to handle.
-t
Sometimes zipped archives become corrupted. The worst time to discover this is after you've unzipped the archive and deleted it, only to discover that some or even all of the unzipped contents are damaged and won't open. Better to test the archive first before you actually unzip it by using the -t (for test) option.
$ unzip -t moby.zip
Archive: moby.zip
testing: bible/ OK
testing: bible/genesis.txt OK
testing: bible/job.txt OK
testing: moby-dick.txt OK
testing: paradise_lost.txt OK
No errors detected in compressed data of moby.zip.
You really should use -t every time you work with a zipped file. It's the smart thing to do, and although it might take some extra time, it's worth it in the end.
gzip
Using gzip is a bit easier than zip in some ways. With zip, you need to specify the name of the newly created Zip file or zip won't work; with gzip, though, you can just type the command and the name of the file you want to compress.
$ ls -l -rw-r--r-- scott scott 508925 paradise_lost.txt $ gzip paradise_lost.txt $ ls -l -rw-r--r-- scott scott 224425 paradise_lost.txt.gz
You should be aware of a very big difference between zip and gzip: When you zip a file, zip leaves the original behind so you have both the original and the newly zipped file, but when you gzip a file, you're left with only the new gzipped file. The original is gone.
If you want gzip to leave behind the original file, you need to use the -c (or --stdout or --to-stdout) option, which outputs the results of gzip to the shell, but you need to redirect that output to another file. If you use -c and forget to redirect your output, you get nonsense like this:

Not good. Instead, output to a file.
$ls -l -rw-r--r-- 1 scott scott 508925 paradise_lost.txt $ gzip -c paradise_lost.txt > paradise_lost.txt.gz $ ls -l -rw-r--r-- 1 scott scott 497K paradise_lost.txt -rw-r--r-- 1 scott scott 220K paradise_lost.txt.gz
Much better! Now you have both your original file and the zipped version.
Tip: If you accidentally use the -c option without specifying an output file, just start pressing Ctrl+C several times until gzip stops.
Realizing the promise of Apache® Hadoop® requires the effective deployment of compute, memory, storage and networking to achieve optimal results. With its flexibility and multitude of options, it is easy to over or under provision the server infrastructure, resulting in poor performance and high TCO. Join us for an in depth, technical discussion with industry experts from leading Hadoop and server companies who will provide insights into the key considerations for designing and deploying an optimal Hadoop cluster.
Sponsored by AMD
Built-in forensics, incident response, and security with Red Hat Enterprise Linux 6
Every security policy provides guidance and requirements for ensuring adequate protection of information and data, as well as high-level technical and administrative security requirements for a system in a given environment. Traditionally, providing security for a system focuses on the confidentiality of the information on it. However, protecting the data integrity and system and data availability is just as important. For example, when processing United States intelligence information, there are three attributes that require protection: confidentiality, integrity, and availability.
Learn more about catching the bad guy in this free white paper.
Sponsored by DLT Solutions
| Designing Electronics with Linux | May 22, 2013 |
| Dynamic DNS—an Object Lesson in Problem Solving | May 21, 2013 |
| Using Salt Stack and Vagrant for Drupal Development | May 20, 2013 |
| Making Linux and Android Get Along (It's Not as Hard as It Sounds) | May 16, 2013 |
| Drupal Is a Framework: Why Everyone Needs to Understand This | May 15, 2013 |
| Home, My Backup Data Center | May 13, 2013 |
- Designing Electronics with Linux
- Making Linux and Android Get Along (It's Not as Hard as It Sounds)
- Dynamic DNS—an Object Lesson in Problem Solving
- New Products
- Using Salt Stack and Vagrant for Drupal Development
- Validate an E-Mail Address with PHP, the Right Way
- Build a Skype Server for Your Home Phone System
- Why Python?
- Tech Tip: Really Simple HTTP Server with Python
- A Topic for Discussion - Open Source Feature-Richness?
- Not free anymore
1 hour 4 min ago - Great
4 hours 52 min ago - Reply to comment | Linux Journal
5 hours 1 sec ago - Understanding the Linux Kernel
7 hours 14 min ago - General
9 hours 44 min ago - Kernel Problem
19 hours 47 min ago - BASH script to log IPs on public web server
1 day 14 min ago - DynDNS
1 day 3 hours ago - Reply to comment | Linux Journal
1 day 4 hours ago - All the articles you talked
1 day 6 hours ago
Enter to Win an Adafruit Pi Cobbler Breakout Kit for Raspberry Pi

It's Raspberry Pi month at Linux Journal. Each week in May, Adafruit will be giving away a Pi-related prize to a lucky, randomly drawn LJ reader. Winners will be announced weekly.
Fill out the fields below to enter to win this week's prize-- a Pi Cobbler Breakout Kit for Raspberry Pi.
Congratulations to our winners so far:
- 5-8-13, Pi Starter Pack: Jack Davis
- 5-15-13, Pi Model B 512MB RAM: Patrick Dunn
- 5-21-13, Prototyping Pi Plate Kit: Philip Kirby
- Next winner announced on 5-27-13!
Free Webinar: Hadoop
How to Build an Optimal Hadoop Cluster to Store and Maintain Unlimited Amounts of Data Using Microservers
Realizing the promise of Apache® Hadoop® requires the effective deployment of compute, memory, storage and networking to achieve optimal results. With its flexibility and multitude of options, it is easy to over or under provision the server infrastructure, resulting in poor performance and high TCO. Join us for an in depth, technical discussion with industry experts from leading Hadoop and server companies who will provide insights into the key considerations for designing and deploying an optimal Hadoop cluster.
Some of key questions to be discussed are:
- What is the “typical” Hadoop cluster and what should be installed on the different machine types?
- Why should you consider the typical workload patterns when making your hardware decisions?
- Are all microservers created equal for Hadoop deployments?
- How do I plan for expansion if I require more compute, memory, storage or networking?



Comments
Unzipping Password Protected Zips
You left out how to unzip ZIP files that are password protected in Linux. I'm searching for this elusive bit of information on the internet right now...
Password protectedly adding files by PHP code was not found
Password protectedly adding files by PHP code was not found on the internet when i was searching for it... so i come across your article and it gave me the idea to why not issue a system command by php to add files in zip and even protect the files by password ;)
RAR
RAR is good and free too. It supports passwords and can make SFX archives.
No mention of lzma?
How about rzip or lzma? I recall an article in the print edition within the last ten or eleven issues that compared the cpu overhead of each compression method against compression ratios (and possibly other parameters). Anyways, rzip is memory and cpu intensive, IIRC, but has the potential to make enormous savings. I think it's the same as burrows-wheeler over larger data sets, possibly. Worthwhile for stuff that won't be frequently decompressed, IMO.
rzip
actually rzip levels are in search buffer sizes:
-0 = 100MB
-1 = 100MB
-x = x00MB for x>0 and x<=9
cpu intensive? well depends. I hacked bzip2 compression hooks out of the rzip and it's one of the fastest pre archiving filters with best compression ratio for mysql dump of dbmail database.
yup found bug but only in decompression algorithm - not the data itself. yes - made Andrew to fix it.
Correction to wording
Scott,
In the section "Archive Files with tar", paragraph 3, you state that tar is "designed to compress entire directory structures". I think this should read "designed to archive...", since this section deals only with tar's standalone use as an archival tool and since this article/chapter is intended to highlight the difference between archiving and compressing. Other than that, this is a very handy primer on archiving and compressing in *nix.
bzip2 -9
The article states that the default block size for bzip2 is -6. The man page for my system (Ubuntu 6.06) states that -9 is the default, and I am unaware of any system where -6 is the default.
TROGDOR STRIKES AGAIN!
TROGDOR STRIKES AGAIN!
http://news.bbc.co.uk/1/hi/england/cornwall/6088008.stm
Making -9 the default
An easier way to default to the best (-9) compression level would be to export GZIP='-9' and ZIPOPTS='-9' into your environment.