Compression Tools Compared

Use top-performing but little-known lossless data compression tools to increase your storage and bandwidth by up to 400%.

Check whether the data compression tool that you want is installed on both computers. If it's not, you can see where to get it in the on-line Resources for this article. Remember to replace a/dir in the following examples with the real path of the data to back up.

Unless your data already is in one big file, be smart and consolidate it with a tool such as tar. Aggregated data has more redundancy to winnow out, so it's ultimately more compressible.

But be aware that the redundancy that saps your performance also may make it easier to recover from corruption. If you're worried about corruption, you might consider testing for it with the cksum command or adding a limited amount of redundancy back into your compressed data with a tool such as parchive or ras.

lzop often is the fastest tool. It finishes about three times faster than gzip but still compresses data almost as much. It finishes about a hundred times faster than lzma and 7za. Furthermore, lzop occasionally decompresses data even faster than simply copying it! Use lzop on the command line as a filter with the backup tool named tar:

$ tar c a/dir | lzop - > backup.tar.lzo

tar's c option tells it to create one big archive from the files in a/dir. The | is a shell command that automatically pipes tar's output into lzop's input. The - tells lzop to read from its standard input, and the > is a shell command that redirects lzop's output to a file named backup.tar.lzo.

You can restore with:

$ lzop -dc backup.tar.lzo | tar x

The d and c options tell lzop to decompress and write to standard output, respectively. tar's x option tells it to extract the original files from the archive.

Although lzop is impressive, you can get even higher compression ratios—much higher! Here's how. Combine a little-known data compression tool named lzma with tar to increase storage space effectively by 400%. Here's how you would use it to back up:

$ tar c a/dir | lzma -x -s26 > backup.tar.lzma

lzma's -x option tells it to compress more, and its -s option tells it how big of a dictionary to use.

You can restore with:

$ cat backup.tar.lzma | lzma -d | tar x

The -d option tells lzma to decompress. You need patience to increase storage by 400%; lzma takes about 40 times as long as gzip. In other words, that one-hour gzip backup might take all day with lzma.

This version of lzma is the hardest compressor to find. Make sure you get the one that acts as a filter. See Resources for its two locations.

The data compression tool with the best trade-off between speed and compression ratio is rzip. With compression level 0, rzip finishes about 400% faster than gzip and compacts data 70% more. rzip accomplishes this feat by using more working memory. Whereas gzip uses only 32 kilobytes of working memory during compression, rzip can use up to 900 megabytes, but that's okay because memory is getting cheaper and cheaper.

Here's the big but: rzip doesn't work as a filter—yet. Unless your data already is in one file, you temporarily need some extra disk space for a tar archive. If you want a good project to work on that would shake up the Linux world, enhance rzip to work as a filter. Until then, rzip is a particularly good option for squeezing a lot of data onto CDs or DVDs, because it performs well and you can use your hard drive for the temporary tar file.

Here's how to back up with rzip:

$ tar cf dir.tar a/dir
$ rzip -0 dir.tar

The -0 option says to use compression level 0. Unless you use rzip's -k option, it automatically deletes the input file, which in this case is the tar archive. Make sure you use -k if you want to keep the original file.

rzipped tar archives can be restored with:

$ rzip -d dir.tar.rz
$ tar xf dir.tar

rzip's default compression level is another top performer. It can increase your effective disk space by 375% but in only about a fifth of the time lzma can take. Using it is almost exactly the same as the example above; simply omit compression level -0.

______________________

Comments

Comment viewing options

Select your preferred way to display the comments and click "Save settings" to activate your changes.

How to improve backups?

Paolo Subiaco's picture

Hi. Congratulations for this very useful article... I use a script for backup made by myself which use tar +gzip... switching to tar - lsop backup time takes less than half time, increasing the backup size by about 25%.
An idea to improve speed is to replace tar with a more intelligent tool.
Infact, tar simply "cat all files to stdout" and then gzip or lsop compress this huge stream of data, but some data is already compressed (images, movies, open document files) and don't need to be recompressed!
The idea is to have an archiver (like tar) which compress each file by itself, storing the original file in case of images, movies, archives, already compressed files.
Is there any tool that can do this, and save all priviledges (owner, group, mode) associated to each file like tar does?
Thank you. Paolo

(1) Found a typo: "On the

Anonymous's picture

(1) Found a typo:
"On the other hand, if you have a 1GHz network, but only a 100MHz CPU"

1 GHz network? Should maybe be 1 Gbps.

(2) Suggestion:
Multi-Core CPUs are the big thing today, compression tools that could utilise multiple cores can run 2, 4 or soon even 8 times faster on "normal" desktop PCs...not even speaking of the servers...which compression tools can utilise this CPU power?

multi-core CPU support

zmi's picture

Multi-Core CPUs are the big thing today, compression tools that could utilise multiple cores can run 2, 4 or soon even 8 times faster on "normal" desktop PCs...not even speaking of the servers...which compression tools can utilise this CPU power?

http://compression.ca/pbzip2/
There's parallel bzip2, very good but not pipe support.

HTH,
mfg zmi

Very nice information

Bharat's picture

Very nice information provided.Thanks!!!

Excelent article.

Eduardo Diaz's picture

Thanks very much for this article. I really enjoyed it, and will be helpfull for my daily work.

1. how about another part

Anonymous's picture

how about another part with specific data - like 90+% text? for mysql dumps & dbmail scenarios etc.

and 45MB does not sound as sufficient test data size for rzip to test it's speed.

Compression on Windows

Werner Bergmans's picture

First of all excellent test!.

Believe it or not, but compression is one of those application types where all research takes place on Windows Pc's. The last couple of years there were some major breakthroughs in compression caused by the new PAQ context modeling algorithms. Have a look at this site for some results. Programs like gzip, rzip 7-zip and lzop are tested here too, so it should be easy to compare results.
http://www.maximumcompression.com/

Webinar
One Click, Universal Protection: Implementing Centralized Security Policies on Linux Systems

As Linux continues to play an ever increasing role in corporate data centers and institutions, ensuring the integrity and protection of these systems must be a priority. With 60% of the world's websites and an increasing share of organization's mission-critical workloads running on Linux, failing to stop malware and other advanced threats on Linux can increasingly impact an organization's reputation and bottom line.

Learn More

Sponsored by Bit9

Webinar
Linux Backup and Recovery Webinar

Most companies incorporate backup procedures for critical data, which can be restored quickly if a loss occurs. However, fewer companies are prepared for catastrophic system failures, in which they lose all data, the entire operating system, applications, settings, patches and more, reducing their system(s) to “bare metal.” After all, before data can be restored to a system, there must be a system to restore it to.

In this one hour webinar, learn how to enhance your existing backup strategies for better disaster recovery preparedness using Storix System Backup Administrator (SBAdmin), a highly flexible bare-metal recovery solution for UNIX and Linux systems.

Learn More

Sponsored by Storix