Compression Tools Compared
Data compression works so well that popular backup and networking tools have some built in. Linux offers more than a dozen compression tools to choose from, and most of them let you pick a compression level too. To find out which perform best, I benchmarked 87 combinations of tools and levels. Read this article to learn which compressor is a hundred times faster than the others and which ones compress the most.
The most popular data compression tool for Linux is gzip, which lets you choose a compression level from one to nine. One is fast, and nine compresses well. Choosing a good trade-off between speed and compression ratio becomes important when it takes hours to handle gigabytes of data. You can get a sense of what your choices are from the graph shown in Figure 1. The fastest choices are on the left, and the highest compressing ones are on the top. The best all-around performers are presented in the graph's upper left-hand corner.
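To get a feel for the speed-versus-ratio trade-off on your own data, a minimal sketch along the following lines may help. It uses Python's standard gzip module rather than the gzip command-line tool benchmarked here, and the function name benchmark_levels and the sample input are assumptions made for illustration, so the numbers it prints will not match the figures.

```python
import gzip
import time


def benchmark_levels(data, levels=range(1, 10)):
    """Return (level, seconds, compressed/original size) for each gzip level."""
    results = []
    for level in levels:
        start = time.perf_counter()
        compressed = gzip.compress(data, compresslevel=level)
        elapsed = time.perf_counter() - start
        results.append((level, elapsed, len(compressed) / len(data)))
    return results


if __name__ == "__main__":
    # Hypothetical sample input; substitute data that resembles your workload.
    sample = b"The quick brown fox jumps over the lazy dog.\n" * 20000
    for level, seconds, ratio in benchmark_levels(sample):
        print(f"level {level}: {seconds * 1000:8.2f} ms, "
              f"{ratio:.2%} of original size")
```

Highly repetitive input like this sample compresses far better than typical files, so treat the output as a way to compare levels against each other, not as a prediction for real data.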
But many other data compression tools are available to choose from in Linux. See the comprehensive compression and decompression benchmarks in Figures 2 and 3. As with gzip, the best performers are in the upper left-hand corner, but these charts' time axes are scaled logarithmically to accommodate huge differences in how fast the tools work.
How compactly data can be compressed depends on what type of data it is. Don't expect much size reduction from data that's already compressed, such as files in Ogg Vorbis, MP3 or JPEG format. On the other hand, I've seen data that allows gains of 1,000%!
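The effect of data type is easy to demonstrate. The sketch below is only an illustration under stated assumptions: it uses Python's zlib module (the same DEFLATE method gzip uses), and random bytes stand in for already-compressed files, because both leave a compressor almost no redundancy to remove. The name size_reduction is hypothetical.

```python
import os
import zlib


def size_reduction(data, level=6):
    """Fraction of the original size saved by compressing data."""
    return 1 - len(zlib.compress(data, level)) / len(data)


# Repetitive source-like text compresses extremely well.
text = b"int main(void) { return 0; }\n" * 5000

# Assumption: random bytes approximate already-compressed data such as MP3.
random_bytes = os.urandom(len(text))

if __name__ == "__main__":
    print(f"repetitive text saved: {size_reduction(text):.1%}")
    print(f"random bytes saved:    {size_reduction(random_bytes):.1%}")
```

The second figure can come out slightly negative, because the compressed stream carries a small amount of overhead even when nothing can be squeezed out.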
All benchmarks in this article used the same 45MB of typical Linux data, containing:
24% ELF 32-bit LSB
15% ASCII C program
11% gzip compressed data
8% ASCII English text
7% binary package
2% current ar archive
2% Texinfo source text
2% PostScript document text
2% Bourne shell script
2% ASCII text
21% various other data types
This data set was chosen because it is bigger and contains Linux binaries, which makes it more representative of the demands made on today's Linux systems than the traditional Canterbury and Calgary test data.
I used the same lightly loaded AMD Athlon XP 1700+ CPU with 1GB of RAM and version 2.4.27-1-k7 of the Linux kernel for all tests. Unpredictable disk drive delays were minimized by pre-loading data into RAM. Elapsed times were measured in thousandths of a second. I'm not affiliated with any of the tools, and I strove to be objective and accurate.
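For readers who want to run similar measurements, here is a rough sketch of the general approach of pre-loading data into RAM and timing in milliseconds. It is not the harness used for this article: the helper names preload and time_compression, the choice of zlib and the best-of-three policy are all assumptions.

```python
import time
import zlib


def preload(path):
    """Read the whole file into memory so disk delays stay out of the timing."""
    with open(path, "rb") as handle:
        return handle.read()


def time_compression(data, level, repeats=3):
    """Best elapsed time, in milliseconds, of several in-memory compressions."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        zlib.compress(data, level)
        best = min(best, time.perf_counter() - start)
    return best * 1000.0


if __name__ == "__main__":
    import sys

    # Usage: python timing.py FILE  (FILE is any test file you supply)
    data = preload(sys.argv[1])
    for level in (1, 6, 9):
        print(f"level {level}: {time_compression(data, level):.3f} ms")
```

Taking the best of several runs is one common way to reduce noise from other activity on a lightly loaded machine; averaging the runs is an equally reasonable choice.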
The tools that tend to compress more and faster are singled out in the graphs shown in Figures 4 and 5. Use these for backups to disk drives. Remember, their time axes are scaled logarithmically. The red lines show the top-performing ones, and the green lines show the top performers that also can act as filters.