Compression Tools Compared

Use top-performing but little-known lossless data compression tools to increase your storage and bandwidth by up to 400%.
Better Bandwidth

Data compression can also speed up network transfers. How much depends on how fast your CPU and network are. Slow networks paired with fast CPUs gain the most from thorough compression; slow CPUs with fast connections do best with no compression at all.

Find the best compressor and compression level for your hardware in the graph shown in Figure 6. This graph's CPU and network speed axes are also scaled logarithmically. Look where your CPU speed and network speed intersect in the graph, and try the data compression tool and compression level at that point. The graph should also give you a sense of how much your effective bandwidth may increase.

Figure 6. Best Compressors for Improving the Bandwidth of Various Hardware

For example, if you have a 56Kbps dial-up modem and a 3GHz CPU, their speeds intersect in the light-yellow region labeled lzma 26 at the top of the graph. This corresponds to using lzma with a 2^26-byte (64MB) dictionary. The graph predicts a 430% increase in effective bandwidth.

On the other hand, if you have a 1GHz network, but only a 100MHz CPU, it should be faster simply to send the raw uncompressed data. This is depicted in the flat black region at the bottom of the graph.

Don't assume that lzma will always give the biggest performance boost, however. The best compression tool for data transfers depends on the ratio of your particular CPU's speed to your particular network's speed.
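
If your machines sit near a boundary between two regions, a quick timing test settles the question. The commands below are a minimal sketch, assuming user@box.com and file again stand in for a real remote host and a representative chunk of your data; compare the raw copy against a compressed pipeline and keep whichever finishes first:

$ time scp file user@box.com:
$ time sh -c 'gzip -1c file | ssh user@box.com "gzip -d > file"'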

If the sending and receiving computers have different CPU speeds, look up the sending computer's speed in the graph, because compression is usually much more CPU-intensive than decompression. Check whether the data compression tool, ssh and scp are installed on both computers, and remember to replace user@box.com and file with the real names.
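
One quick way to verify that everything is in place, again assuming user@box.com is your real remote host, is to ask both machines where the binaries live:

$ which lzma rzip gzip lzop ssh scp
$ ssh user@box.com "which lzma rzip gzip lzop"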

For the fastest CPUs and/or slowest network connections that fall in the graph's light-yellow region, speed up your network transfers like this:


$ cat file \
| lzma -x -s26 \
| ssh user@box.com "lzma -d > file"

ssh stands for secure shell. It's a safe way to execute commands on remote computers. This may speed up your network transfer by more than 400%.

For fast CPUs and/or slow networks that fall into the graph's dark-yellow zone, use rzip with a compression level of one. Because rzip doesn't work as a filter, you need temporary space for the compressed file on the originating box:

$ rzip -1 -k file
$ scp file.rz user@box.com:
$ ssh user@box.com "rzip -d file.rz"

The -1 tells rzip to use compression level 1, and the -k tells it to keep its input file. Remember the : at the end of the scp command; it tells scp to copy into your home directory on the remote machine.
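
Because -k leaves the compressed copy behind on the sending box, you can reclaim that temporary space once the transfer has finished; a small cleanup step, assuming the same file name as above:

$ rm file.rz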

rzipped network transfers can run at up to 375% of their original speed. That one-hour transfer might finish in only 16 minutes (60 minutes / 3.75 ≈ 16)!

For slightly slower CPUs and/or faster networks that fall in the graph's orange region, try using gzip with compression level 1. Here's how:

$ gzip -1c file | ssh user@box.com "gzip -d > file"

The -1c tells gzip to use compression level 1 and write to standard output, and the -d tells it to decompress. This pipeline might double your effective bandwidth.
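
The same trick works in reverse when you want to pull a file from the remote machine; here's a sketch, with the same placeholder names, that compresses on the remote box and decompresses locally:

$ ssh user@box.com "gzip -1c file" | gzip -d > file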

For fast network connections and slow CPUs that fall in the graph's blue region, compress lightly but quickly with lzop at compression level 1:

$ lzop -1c file | ssh user@box.com "lzop -d > file"
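
These single-file pipelines also extend to whole directory trees by putting tar on both ends. The following is a sketch rather than a tested recipe, assuming dir is the directory you want to copy into the remote home directory:

$ tar -cf - dir | lzop -1c | ssh user@box.com "lzop -dc | tar -xf -"

tar packs the tree into a single stream, lzop compresses it on the fly, and the remote side reverses both steps.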

______________________

Comments


How to improve backups?

Paolo Subiaco

Hi. Congratulations for this very useful article... I use a backup script I made myself which uses tar + gzip... switching to tar + lzop, the backup takes less than half the time, while increasing the backup size by about 25%.
An idea to improve speed is to replace tar with a more intelligent tool.
In fact, tar simply "cats all files to stdout" and then gzip or lzop compresses this huge stream of data, but some data is already compressed (images, movies, open document files) and doesn't need to be recompressed!
The idea is to have an archiver (like tar) which compresses each file by itself, storing the original file untouched in the case of images, movies, archives and other already-compressed files.
Is there any tool that can do this, and save all privileges (owner, group, mode) associated with each file like tar does?
Thank you. Paolo

(1) Found a typo

Anonymous

(1) Found a typo:
"On the other hand, if you have a 1GHz network, but only a 100MHz CPU"

1 GHz network? Should maybe be 1 Gbps.

(2) Suggestion:
Multi-Core CPUs are the big thing today, compression tools that could utilise multiple cores can run 2, 4 or soon even 8 times faster on "normal" desktop PCs...not even speaking of the servers...which compression tools can utilise this CPU power?

multi-core CPU support

zmi

Multi-Core CPUs are the big thing today, compression tools that could utilise multiple cores can run 2, 4 or soon even 8 times faster on "normal" desktop PCs...not even speaking of the servers...which compression tools can utilise this CPU power?

http://compression.ca/pbzip2/
There's parallel bzip2; very good, but no pipe support.

HTH,
mfg zmi

Very nice information

Bharat

Very nice information provided. Thanks!!!

Excellent article.

Eduardo Diaz

Thanks very much for this article. I really enjoyed it, and it will be helpful for my daily work.

How about another part

Anonymous

How about another part with specific data, like 90+% text, for mysql dumps & dbmail scenarios, etc.?

And 45MB does not sound like a sufficient test data size for rzip to show its speed.

Compression on Windows

Werner Bergmans

First of all, excellent test!

Believe it or not, compression is one of those application types where all the research takes place on Windows PCs. In the last couple of years there have been some major breakthroughs in compression, brought about by the new PAQ context modeling algorithms. Have a look at this site for some results. Programs like gzip, rzip, 7-zip and lzop are tested there too, so it should be easy to compare results.
http://www.maximumcompression.com/
