Compressing Web Content with mod_gzip and mod_deflate

Compressing Web content can produce a much faster site for users. Here's how to set it up and measure your success.

Configuring mod_gzip

The mod_gzip module is available for both Apache 1.3.x and Apache 2.0.x [3], and it can be compiled into Apache as a dynamic shared object (DSO) or as a static module. Compiling it as a DSO is simple; from the unpacked source directory, run the following commands as root:

make APXS=/path/to/apxs
make install APXS=/path/to/apxs
/path/to/apachectl graceful

mod_gzip must be loaded last in the module list, because Apache 1.3.x processes content in module order and compression is the final step performed before data is sent. The installation adds mod_gzip's entries to the httpd.conf file but leaves them commented out, so they must be uncommented before the module takes effect.

A basic configuration for mod_gzip in the httpd.conf should include:

mod_gzip_item_include mime ^text/.*
mod_gzip_item_include mime \
    ^application/postscript$
mod_gzip_item_exclude mime \
    ^application/x-javascript$
mod_gzip_item_exclude mime ^image/.*$
mod_gzip_item_exclude file \
    \.(?:exe|t?gz|zip|bz2|sit|rar)$

This allows PostScript files to be GZIP-encoded, while not compressing PDF files. PDF files should not be compressed; doing so leads to problems when attempting to display the files in Adobe Acrobat Reader. To be even more careful, you may want to exclude PDF files explicitly from being compressed:

mod_gzip_item_exclude mime ^application/pdf$
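
To see which responses these rules would select, the patterns can be tried against a few sample MIME types and file names. The Python sketch below only exercises the regular expressions; it is not a model of mod_gzip's internals. The would_compress helper and the sample file names are hypothetical, and the sketch assumes that any matching exclude rule overrides an include rule.

```python
import re

# Patterns copied from the mod_gzip directives above.
INCLUDE_MIME = [r"^text/.*", r"^application/postscript$"]
EXCLUDE_MIME = [r"^application/x-javascript$", r"^image/.*$", r"^application/pdf$"]
EXCLUDE_FILE = [r"\.(?:exe|t?gz|zip|bz2|sit|rar)$"]


def would_compress(mime_type, filename):
    """Hypothetical helper: True if the sample rules would select this response.

    Assumption: an exclude match always wins over an include match.
    """
    if any(re.search(p, mime_type) for p in EXCLUDE_MIME):
        return False
    if any(re.search(p, filename) for p in EXCLUDE_FILE):
        return False
    return any(re.search(p, mime_type) for p in INCLUDE_MIME)


if __name__ == "__main__":
    samples = [
        ("text/html", "index.html"),
        ("application/postscript", "paper.ps"),
        ("application/pdf", "paper.pdf"),
        ("image/png", "logo.png"),
        ("text/plain", "backup.tgz"),
    ]
    for mime_type, filename in samples:
        print(mime_type, filename, would_compress(mime_type, filename))
```

A type that matches no include pattern, such as application/pdf, is left uncompressed even without an explicit exclude; the extra exclude line simply makes that intent visible.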

Configuring mod_deflate

The mod_deflate module for Apache 2.0.x is included with the source for this server, which makes compiling it into the server rather simple:

./configure --enable-modules=all \
    --enable-mods-shared=all --enable-deflate
make
make install

With mod_deflate for Apache 2.0.x, the GZIP encoding of documents can be enabled in one of two ways: explicit exclusion of files by extension or explicit inclusion of files by MIME type. These methods are specified in the httpd.conf file. Explicit exclusion looks like:

SetOutputFilter DEFLATE
DeflateFilterNote ratio
SetEnvIfNoCase Request_URI \.(?:gif|jpe?g|png)$ \
    no-gzip dont-vary
SetEnvIfNoCase Request_URI \
    \.(?:exe|t?gz|zip|bz2|sit|rar)$ \
    no-gzip dont-vary
SetEnvIfNoCase Request_URI \.pdf$ no-gzip dont-vary

Explicit inclusion looks like:

DeflateFilterNote ratio
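# AddOutputFilterByType matches exact MIME types, so each type is listed
AddOutputFilterByType DEFLATE text/html text/plain \
    text/xml text/css
AddOutputFilterByType DEFLATE application/msword \
    application/vnd.ms-excel \
    application/vnd.ms-powerpoint application/postscript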

The explicit exclusion method excludes the same content as the mod_gzip configuration: images, archives and PDF files.
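
The DeflateFilterNote ratio directive records a compression ratio for each response so that it can be logged. As a rough way to estimate that figure offline, the Python sketch below compresses a sample payload with the standard zlib module and reports the compressed size as a percentage of the original, which is how the Apache documentation describes the ratio note. The compression_ratio helper and the sample payloads are illustrative, and the compression level is an assumption; figures on a live server depend on the content and the server's settings.

```python
import os
import zlib


def compression_ratio(payload, level=6):
    """Return the compressed size as a percentage of the original size.

    Assumption: level 6 (zlib's usual default) stands in for the server's setting.
    """
    if not payload:
        return 100.0
    compressed = zlib.compress(payload, level)
    return 100.0 * len(compressed) / len(payload)


if __name__ == "__main__":
    # Repetitive markup, typical of HTML, compresses well.
    html = b"<tr><td>row</td><td>value</td></tr>\n" * 500
    print("HTML-like text: %.1f%% of original size" % compression_ratio(html))

    # Random bytes stand in for data that is already compressed,
    # such as the images and archives excluded above.
    noise = os.urandom(4096)
    print("Incompressible data: %.1f%% of original size" % compression_ratio(noise))
```

Running it on representative pages from your own site gives a quick sense of which content types are worth including.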

Compressing Dynamic Content

If your site uses dynamic content—XSSI, CGI and the like—nothing special needs to be done to compress the output of these modules. As mod_gzip and mod_deflate process all outgoing content before it is placed on the wire, all content from Apache that matches either the MIME types or the file extensions mapped in the configuration directives is compressed.

The output from PHP, the most popular dynamic scripting language for Apache, also can be compressed in one of three possible ways: using the built-in output handler, ob_gzhandler; using the built-in ZLIB compression; or using one of the Apache compression modules. Configuring PHP's built-in compression is simply a matter of compiling PHP with the --with-zlib configure option and then reconfiguring the php.ini file.

Below is what the output buffer method looks like:

output_buffering = On
output_handler = ob_gzhandler
zlib.output_compression = Off

The ZLIB method uses:

output_buffering = Off
output_handler =
zlib.output_compression = On

The output buffer method produces marginally better compression, but both methods work. The output buffer, ob_gzhandler, also can be added on a script-by-script basis, if you do not want to enable compression across the entire site.
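
The PHP manual describes ob_gzhandler as first determining which content encoding the browser accepts (gzip, deflate or none at all) and then returning its output accordingly. The Python sketch below mirrors that decision in simplified form to show what happens per request; it is not PHP's implementation. The function names are hypothetical, and quality values are handled only for the explicit refusal cases noted in the comments.

```python
import gzip
import zlib


def choose_encoding(accept_encoding):
    """Pick 'gzip', 'deflate' or None from an Accept-Encoding header value.

    Simplification: only q=0 and q=0.0 are treated as a refusal; otherwise
    quality values are ignored and gzip is preferred over deflate.
    """
    accepted = set()
    for item in accept_encoding.split(","):
        parts = [p.strip().lower() for p in item.split(";")]
        name = parts[0]
        refused = any(p.replace(" ", "") in ("q=0", "q=0.0") for p in parts[1:])
        if name and not refused:
            accepted.add(name)
    for candidate in ("gzip", "deflate"):
        if candidate in accepted:
            return candidate
    return None


def encode_body(body, accept_encoding):
    """Return (encoded_body, content_encoding) for a response body."""
    encoding = choose_encoding(accept_encoding)
    if encoding == "gzip":
        return gzip.compress(body), "gzip"
    if encoding == "deflate":
        # HTTP "deflate" is the zlib format, which zlib.compress produces.
        return zlib.compress(body), "deflate"
    return body, None


if __name__ == "__main__":
    page = b"<p>Hello, world</p>\n" * 200
    for header in ("gzip, deflate", "deflate", ""):
        encoded, name = encode_body(page, header)
        print(repr(header), name, len(encoded))
```

A client that sends no Accept-Encoding header receives the body unchanged, which is why enabling compression does not break older browsers.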

If you do not want to reconfigure PHP with ZLIB enabled, the Apache compression modules can compress the content generated by PHP. I have configured my server so that Apache modules handle all of the compression, and all pages are compressed in a consistent manner, regardless of their origin.

Caching Compressed Content

Can compressed content be cached? The answer is an unequivocal yes. With mod_gzip and mod_deflate, Apache sends the Vary header, which tells caches that the response for this object can differ depending on certain request headers, such as User-Agent or Accept-Charset. When a cache receives a compressed object, it notes that the server returned Vary: Accept-Encoding, indicating that the response was generated for a request containing the Accept-Encoding: gzip header.

Caching compressed content can lead to a situation where a cache stores two copies of the same document, one compressed and one uncompressed. This is a design feature of HTTP 1.1, and it allows clients with and without the ability to receive compressed content to benefit from the performance enhancements gained from local proxy caches.
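
The two-copy behavior can be pictured with a toy cache that keys each stored response on the URL plus the values of the request headers named in Vary. The Python sketch below is a minimal illustration, not how any particular proxy is implemented: the VaryCache class is hypothetical, header names are assumed to be lowercase dictionary keys, and header values are compared as plain strings without any normalization.

```python
class VaryCache:
    """Toy cache that stores one variant per value of the headers named in Vary."""

    def __init__(self):
        self._vary = {}      # url -> tuple of header names the response varies on
        self._variants = {}  # (url, header values) -> cached body

    def _key(self, url, request_headers):
        names = self._vary.get(url, ())
        return (url, tuple(request_headers.get(n, "") for n in names))

    def store(self, url, request_headers, response_headers, body):
        """Remember the Vary header names, then file the body under its variant key."""
        vary = response_headers.get("vary", "")
        names = tuple(n.strip().lower() for n in vary.split(",") if n.strip())
        self._vary[url] = names
        self._variants[self._key(url, request_headers)] = body

    def lookup(self, url, request_headers):
        """Return the cached body for this request's variant, or None on a miss."""
        if url not in self._vary:
            return None
        return self._variants.get(self._key(url, request_headers))


if __name__ == "__main__":
    cache = VaryCache()
    vary = {"vary": "Accept-Encoding"}
    cache.store("/index.html", {"accept-encoding": "gzip"}, vary, b"<gzip bytes>")
    cache.store("/index.html", {}, vary, b"<html>plain</html>")
    print(cache.lookup("/index.html", {"accept-encoding": "gzip"}))
    print(cache.lookup("/index.html", {}))
```

Both variants live under the same URL, and each client is served the copy that matches the Accept-Encoding header it sent.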

______________________

Comments

I had no idea you could do

David Hops

I had no idea you could do this. Should come in handy as I don't have much bandwidth at the moment and am hosting a few large images and zips. Only concern would be extra CPU load.

mod_deflate

Mikhailov Anatoly

Article about mod_deflate settings like on Amazon EC2 AMI
http://railsgeek.com/2008/12/16/apache2-httpd-improving-performance-mod_...

Nice Work!

Fargham

A great and informative article!

Really helped me!!

Ashutosh Chaturvedi

Hi,

It's really a great article and really helped me. One question from my side:

Is there any way of compressing .tiff files using mod_gzip?

Please help me if anyone has an idea about this.

Thanks in advance..

Ashutosh

Re: Compressing Web Content

Anonymous

stephen,
good article summarizing the methods and benefits of compressing web pages. however, you should touch on the difficulty of using mod_gzip with mod_ssl under apache 1.3.x -- this is a cumbersome issue and there are only workaround solutions. one such is to use a mod_proxy frontend virtual server to buffer the ssl request, and a mod_gzip backend virtual server to handle the compression. more detail on this two stage approach is here:
http://lists.over.net/pipermail/mod_gzip/2002-February/005911.html
i have implemented the above method on a few production servers and it does indeed work, with some caveats.

i believe that i read somewhere that apache 2.x had improved handling of the gzip/ssl pairing. not having played with 2.x i'm not in a position to say whether or not it actually works. perhaps someone could comment on this.

regards,
jim

At what cost to the CPU

Anonymous

Great article! I enjoyed reading it and found it very informative. One question though...

What will this module do to my CPU? Will the load average on my box go through the roof every time I need to send out a compressed webpage? I think this would have been a nice point to look at as part of your article.

compressing .tiff files

Marcus

To compress TIFF files, simply remove the following exclusion for images from the above example configuration:

mod_gzip_item_exclude mime ^image/.*$

Also, add the .tif file extension in the file inclusions:

mod_gzip_item_include file \.(tif)$

Please let me know if you add this because I'd like to test a browser implementation against it. Thanks.

Marcus Adams
yEnc Decoder Proxy
