Compression Tools Compared
Data compression works so well that popular backup and networking tools have some built in. Linux offers more than a dozen compression tools to choose from, and most of them let you pick a compression level too. To find out which perform best, I benchmarked 87 combinations of tools and levels. Read this article to learn which compressor is a hundred times faster than the others and which ones compress the most.
The most popular data compression tool for Linux is gzip, which lets you choose a compression level from one to nine. One is fast, and nine compresses well. Choosing a good trade-off between speed and compression ratio becomes important when it takes hours to handle gigabytes of data. You can get a sense of what your choices are from the graph shown in Figure 1. The fastest choices are on the left, and the highest compressing ones are on the top. The best all-around performers are presented in the graph's upper left-hand corner.
But many other data compression tools are available to choose from in Linux. See the comprehensive compression and decompression benchmarks in Figures 2 and 3. As with gzip, the best performers are in the upper left-hand corner, but these charts' time axes are scaled logarithmically to accommodate huge differences in how fast they work.
How compactly data can be compressed depends on what type of data it is. Don't expect big performance increases from data that's already compressed, such as files in Ogg Vorbis, MP3 or JPEG format. On the other hand, I've seen data that allows performance increases of 1,000%!
All benchmarks in this article used the same 45MB of typical Linux data, containing:
24% ELF 32-bit LSB
15% ASCII C program
11% gzip compressed data
8% ASCII English text
7% binary package
2% current ar archive
2% Texinfo source text
2% PostScript document text
2% Bourne shell script
2% ASCII text
21% various other data types
This data set was chosen because it is more representative of the demands made on today's Linux systems than the data used in the traditional Canterbury and Calgary test data, because this data set is bigger and contains Linux binaries.
I used the same lightly loaded AMD Athlon XP 1700+ CPU with 1GB of RAM and version 2.4.27-1-k7 of the Linux kernel for all tests. Unpredictable disk drive delays were minimized by pre-loading data into RAM. Elapsed times were measured in thousandths of a second. I'm not affiliated with any of the tools, and I strove to be objective and accurate.
The tools that tend to compress more and faster are singled out in the graphs shown in Figures 4 and 5. Use these for backups to disk drives. Remember, their time axes are scaled logarithmically. The red lines show the top-performing ones, and the green lines show the top performers that also can act as filters.
Webinar: 8 Signs You’re Beyond Cron
On Demand NOW
|Mumblehard--Let's End Its Five-Year Reign||May 04, 2015|
|An Easy Way to Pay for Journalism, Music and Everything Else We Like||May 04, 2015|
|When Official Debian Support Ends, Who Will Save You?||May 01, 2015|
|May 2015 Issue of Linux Journal: Cool Projects||May 01, 2015|
|May 2015 Video Preview||May 01, 2015|
|Ubuntu Ditches Upstart||Apr 30, 2015|
- Mumblehard--Let's End Its Five-Year Reign
- An Easy Way to Pay for Journalism, Music and Everything Else We Like
- When Official Debian Support Ends, Who Will Save You?
- Ubuntu Ditches Upstart
- "No Reboot" Kernel Patching - And Why You Should Care
- Return of the Mac
- DevOps: Better Than the Sum of Its Parts
- Picking Out the Nouns
- Video On Demand: 8 Signs You're Beyond Cron
- May 2015 Issue of Linux Journal: Cool Projects