Compressing Web Content with mod_gzip and mod_deflate

Compressing Web content can produce a much faster site for users. Here's how to set it up and measure your success.

Logging Compression Results

When it comes to logging, there really is no comparison between mod_gzip and mod_deflate. mod_gzip logging is robust, configurable and based on the Apache log format, so mod_gzip logs can be set up for analysis in basically any way you want. The default log formats provided when the module is installed are shown below:

LogFormat "%h %l %u %t \"%r\" %>s %b mod_gzip: \
    %{mod_gzip_compression_ratio}npct." \
    common_with_mod_gzip_info1
LogFormat "%h %l %u %t \"%r\" %>s %b mod_gzip: \
    %{mod_gzip_result}n In:%{mod_gzip_input_size}n \
    Out:%{mod_gzip_output_size}n \
    Ratio:%{mod_gzip_compression_ratio}npct." \
    common_with_mod_gzip_info2
LogFormat "%{mod_gzip_compression_ratio}npct." \
    mod_gzip_info1
LogFormat "%{mod_gzip_result}n In:%{mod_gzip_input_size}n \
    Out:%{mod_gzip_output_size}n \
    Ratio:%{mod_gzip_compression_ratio}npct." \
    mod_gzip_info2

The logs let you see each file's size before and after compression, as well as the compression ratio. After tweaking the log formats to match your configuration, add them to your logging setup with a CustomLog directive in the httpd.conf file:

CustomLog logs/gzip.log common_with_mod_gzip_info2
CustomLog logs/gzip.log mod_gzip_info2
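
These formats only produce meaningful numbers when mod_gzip itself is loaded and enabled, of course. As a point of reference, a minimal mod_gzip setup for Apache 1.3.x might look like the sketch below; the file and MIME patterns are illustrative, not the configuration used in the tests, and should be adapted to your own content:

# load and enable on-the-fly compression (Apache 1.3.x)
LoadModule gzip_module modules/mod_gzip.so

<IfModule mod_gzip.c>
    mod_gzip_on Yes
    mod_gzip_dechunk Yes
    # compress HTML files and other text-based MIME types
    mod_gzip_item_include file \.html$
    mod_gzip_item_include mime ^text/.*
    # leave already-compressed images alone
    mod_gzip_item_exclude mime ^image/.*$
</IfModule>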

Logging in mod_deflate, on the other hand, is limited to a single configuration directive, DeflateFilterNote, which records compression information as a note that can then be written to an access log. Be careful about adding this to your production logs, as it may cause some log analyzers to have trouble with your files. It is best to start out by logging compression ratios to a separate file:

DeflateFilterNote ratio

LogFormat '"%r" %b (%{ratio}n) "%{User-agent}i"' \
    deflate
CustomLog logs/deflate_log deflate
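
The ratio note is meaningful only if the DEFLATE output filter is actually applied to responses. For completeness, here is one way this might be enabled under Apache 2.x; the MIME types are illustrative, and the BrowserMatch lines are the workarounds commonly paired with mod_deflate for older browsers:

# enable on-the-fly compression for text-based content (Apache 2.x)
LoadModule deflate_module modules/mod_deflate.so

<IfModule mod_deflate.c>
    AddOutputFilterByType DEFLATE text/html text/plain text/css

    # work around known compression problems in older browsers
    BrowserMatch ^Mozilla/4 gzip-only-text/html
    BrowserMatch ^Mozilla/4\.0[678] no-gzip
    BrowserMatch \bMSIE !no-gzip !gzip-only-text/html
</IfModule>

With this in place, the DeflateFilterNote and LogFormat lines above record a compression ratio for each response.
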
Performance Improvements from Compression

How much improvement can you see with compression? Measurements on a lightly loaded server show that the time to download the base page (the initial HTML file) improved by 1.3 to 1.6 seconds across a slow connection.

Figure 1. Time Required to Download the HTML Page with and without Compression

The server does respond slightly more slowly when a client requests a compressed page. Measurements show that the median response time for the server averaged 0.23 seconds for the uncompressed page and 0.27 seconds for the compressed page. However, most Web server administrators should be willing to accept a 0.04-second increase in response time to achieve a 1.5-second improvement in file transfer time.

Web pages are not made up of HTML alone, however. So how do improved HTML (and CSS) download times affect overall performance? Figure 2 shows that overall download times for the test page were 1–1.5 seconds better when the HTML files were compressed.

Figure 2. Time to Download the Page with Images

To emphasize the value of compression further, I ran a test on a Web server to see what the average compression ratio would be when requesting a large number of files. In addition, I wanted to determine what the effect on server response time would be when requesting large numbers of compressed files simultaneously. There were 1,952 HTML files in the test directory, and I checked the results using cURL across my local LAN (Tables 3 and 4). The files were the top-level HTML files from the Linux Documentation Project, installed on an Apache 1.3.27 server running mod_gzip. The minimum file size was 80 bytes, and the maximum was 99,419 bytes.

Table 3. Large Sample of File Requests (1,952 HTML Files)

                 First Byte       Total Time       Bytes            Total Bytes
                 (avg/median)     (avg/median)     (avg/median)
mod_gzip
  Uncompressed   0.091/0.030      0.280/0.173      6,349/3,750      12,392,318
  Compressed     0.084/0.036      0.128/0.079      2,416/1,543       4,716,160
mod_deflate
  Uncompressed   0.044/0.028      0.241/0.169      6,349/3,750      12,392,318
  Compressed     0.046/0.031      0.107/0.050      2,418/1,544       4,720,735

Table 4. Totals

                       mod_gzip    mod_deflate
Average compression    0.433       0.438
Median compression     0.427       0.427

As expected, the first byte download time was slightly higher with the compressed files than it was with the uncompressed files. But this difference was in milliseconds and is hardly worth mentioning in terms of on-the-fly compression. It is unlikely that any user, especially dial-up users, would notice this difference in performance.

That the delivered data was transformed to 43% of the original file size should make any Web administrator sit up and take notice. The compression ratio for the test files ranged from no compression for files that were less than 300 bytes to 15% of the original file size for two Linux SCSI Programming HOWTOs. Compression ratios do not increase in a linear fashion when compared to file size; rather, compression depends heavily on the repetition of content within a file to gain its greatest successes. The SCSI Programming HOWTOs have a great deal of repeated characters, making them ideal candidates for extreme compression.

Smaller files also did not compress as well as larger files, exactly for this reason. Fewer bytes means a lower probability of repeated bytes, resulting in a lower compression ratio.

Table 5. Average Compression by File Size (in Bytes)

               0–999    1,000–4,999   5,000–9,999   10,000–19,999   20,000–49,999   50,000+
mod_gzip       0.713    0.440         0.389         0.369           0.350           0.329
mod_deflate    0.777    0.440         0.389         0.369           0.350           0.331

The data in Table 5 shows that compression works best on files larger than 5,000 bytes. Below that size, average compression gains are smaller, unless a file contains a large number of repeated characters. Some people argue that compressing files below a certain size wastes CPU cycles. If you agree with them, using 5,000 bytes as the floor for compressing files should be a good starting point. I am of the opposite mindset: I compress everything that comes off my servers, because I consider myself an HTTP over-clocker, trying to squeeze every last bit of download performance out of the network.
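
If you do want to enforce such a floor with mod_gzip, its minimum-size directive makes it a one-line change. The 5,000-byte threshold below is simply the starting point suggested above:

# skip on-the-fly compression for responses smaller than 5,000 bytes
mod_gzip_minimum_file_size 5000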

mod_deflate does not have a low-end boundary for file size, so it attempts to compress files too small to benefit from compression. This results in files smaller than approximately 120 bytes becoming larger when processed by mod_deflate.

______________________

Comments

I had no idea you could do

David Hops

I had no idea you could do this. Should come in handy as I don't have much bandwidth at the moment and am hosting a few large images and zips. Only concern would be extra CPU load.

mod_deflate

Mikhailov Anatoly

An article about mod_deflate settings on an Amazon EC2 AMI:
http://railsgeek.com/2008/12/16/apache2-httpd-improving-performance-mod_...

Nice Work!

Fargham

A great and informative article!

Really helped me!!

Ashutosh Chaturvedi

Hi,

It's really a great article and it really helped me. One question from my side:

Is there any way of compressing .tiff files using mod_gzip?

Please help me if anyone has an idea about this.

Thanks in advance..

Ashutosh

Re: Compressing Web Content

Anonymous

stephen,
good article summarizing the methods and benefits of compressing web pages. however, you should touch on the difficulty of using mod_gzip with mod_ssl under apache 1.3.x -- this is a cumbersome issue and there are only workaround solutions. one such is to use a mod_proxy frontend virtual server to buffer the ssl request, and a mod_gzip backend virtual server to handle the compression. more detail on this two stage approach is here:
http://lists.over.net/pipermail/mod_gzip/2002-February/005911.html
i have implemented the above method on a few production servers and it does indeed work, with some caveats.

i believe that i read somewhere that apache 2.x had improved handling of the gzip/ssl pairing. not having played with 2.x i'm not in a position to say whether or not it actually works. perhaps someone could comment on this.

regards,
jim

At what cost to the CPU

Anonymous

Great article! I enjoyed reading it and found it very informative. One question though...

What will this module do to my CPU? Will the load average on my box go through the roof every time I need to send out a compressed Web page? I think this would have been a nice point to look at as part of your article.

compressing .tiff files

Marcus

To compress TIFF files, simply remove the following exclusion for images from the above example configuration:

mod_gzip_item_exclude mime ^image/.*$

Also, add the .tif file extension in the file inclusions:

mod_gzip_item_include file \.(tif)$

Please let me know if you add this because I'd like to test a browser implementation against it. Thanks.

Marcus Adams
yEnc Decoder Proxy
