Compressing Web Content with mod_gzip and mod_deflate

Compressing Web content can produce a much faster site for users. Here's how to set it up and measure your success.
Configuring mod_gzip

The mod_gzip module is available for both Apache 1.3.x and Apache 2.0.x, and it can be compiled into Apache as a dynamic shared object (DSO) or as a static module. Compiling it as a DSO is simple; from the uncompressed source directory, perform the following steps as root:

make APXS=/path/to/apxs
make install APXS=/path/to/apxs
/path/to/apachectl graceful

mod_gzip must be loaded last in the module list, as Apache 1.3.x processes content in module order and compression is the final step performed before data is sent. The mod_gzip installation adds its directives to the httpd.conf file, but they are commented out.
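Once installed, the relevant httpd.conf lines must be uncommented to activate the module. A minimal sketch follows; the module path and filename vary by installation, so treat these as placeholders:

```apache
# Load mod_gzip as a DSO and enable it (paths depend on your layout)
LoadModule gzip_module libexec/mod_gzip.so
AddModule mod_gzip.c
mod_gzip_on Yes
```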

A basic configuration for mod_gzip in the httpd.conf should include:

mod_gzip_item_include mime ^text/.*
mod_gzip_item_include mime \
    ^application/postscript$
mod_gzip_item_exclude mime \
    ^application/x-javascript$
mod_gzip_item_exclude mime ^image/.*$
mod_gzip_item_exclude file \
    \.(?:exe|t?gz|zip|bz2|sit|rar)$
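The file-extension exclusion above is an extended regular expression. As a quick sanity check, the same pattern can be exercised with grep to see which filenames it would exclude; this is a standalone sketch, not part of mod_gzip itself:

```shell
# Filenames ending in these archive extensions are excluded from compression
pattern='\.(exe|t?gz|zip|bz2|sit|rar)$'
for f in archive.tar.gz photo.zip report.txt page.html; do
    if echo "$f" | grep -qE "$pattern"; then
        echo "$f: excluded"
    else
        echo "$f: eligible"
    fi
done
```

Already-compressed archives like these gain nothing from a second pass through GZIP, which is why they are excluded.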

This allows PostScript files to be GZIP-encoded, while not compressing PDF files. PDF files should not be compressed; doing so leads to problems when attempting to display the files in Adobe Acrobat Reader. To be even more careful, you may want to exclude PDF files explicitly from being compressed:

mod_gzip_item_exclude mime ^application/pdf$
Configuring mod_deflate

The mod_deflate module for Apache 2.0.x is included in the Apache source distribution, which makes compiling it into the server straightforward:

./configure --enable-modules=all \
    --enable-mods-shared=all --enable-deflate
make
make install
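When built as a shared module, mod_deflate still must be loaded in httpd.conf. A typical line looks like the following; the module path depends on your installation layout:

```apache
LoadModule deflate_module modules/mod_deflate.so
```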

With mod_deflate for Apache 2.0.x, the GZIP encoding of documents can be enabled in one of two ways: explicit exclusion of files by extension or explicit inclusion of files by MIME type. These methods are specified in the httpd.conf file. Explicit exclusion looks like:

SetOutputFilter DEFLATE
DeflateFilterNote ratio
SetEnvIfNoCase Request_URI \.(?:gif|jpe?g|png)$ \
    no-gzip dont-vary
SetEnvIfNoCase Request_URI \
    \.(?:exe|t?gz|zip|bz2|sit|rar)$ \
    no-gzip dont-vary
SetEnvIfNoCase Request_URI \.pdf$ no-gzip dont-vary
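To get a feel for the ratio that DeflateFilterNote will record, you can compress a representative file with the gzip command-line tool; markup-heavy text typically shrinks dramatically. A small standalone sketch (the file name and contents are arbitrary):

```shell
# Generate ~9.5KB of repetitive HTML-like text, then compare raw vs. gzip size
printf '<p>hello world</p>\n%.0s' $(seq 1 500) > /tmp/sample.html
orig=$(wc -c < /tmp/sample.html)
comp=$(gzip -c /tmp/sample.html | wc -c)
echo "original=$orig bytes compressed=$comp bytes"
```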

Explicit inclusion looks like:

DeflateFilterNote ratio
AddOutputFilterByType DEFLATE text/*
AddOutputFilterByType DEFLATE application/ms* \
    application/vnd* application/postscript
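DeflateFilterNote can also record per-request input and output byte counts, which can be written to a log for measuring your success. A sketch along the lines of the Apache mod_deflate documentation (log file name is an assumption):

```apache
# Record raw/compressed sizes and the ratio, then log them per request
DeflateFilterNote Input instream
DeflateFilterNote Output outstream
DeflateFilterNote Ratio ratio
LogFormat '"%r" %{outstream}n/%{instream}n (%{ratio}n%%)' deflate
CustomLog logs/deflate_log deflate
```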

In the explicit exclusion method, the same exclusions are present as in the mod_gzip file, namely images and PDF files.

Compressing Dynamic Content

If your site uses dynamic content—XSSI, CGI and the like—nothing special needs to be done to compress the output of these modules. As mod_gzip and mod_deflate process all outgoing content before it is placed on the wire, all content from Apache that matches either the MIME types or the file extensions mapped in the configuration directives is compressed.

The output from PHP, the most popular dynamic scripting language for Apache, also can be compressed in one of three possible ways: using the built-in output handler, ob_gzhandler; using the built-in ZLIB compression; or using one of the Apache compression modules. Configuring PHP's built-in compression is simply a matter of compiling PHP with the --with-zlib configure option and then reconfiguring the php.ini file.

Below is what the output buffer method looks like:

output_buffering = On
output_handler = ob_gzhandler
zlib.output_compression = Off

The ZLIB method uses:

output_buffering = Off
output_handler =
zlib.output_compression = On

The output buffer method produces marginally better compression, but both methods work. The output buffer, ob_gzhandler, also can be added on a script-by-script basis, if you do not want to enable compression across the entire site.

If you do not want to reconfigure PHP with ZLIB enabled, the Apache compression modules can compress the content generated by PHP. I have configured my server so that Apache modules handle all of the compression, and all pages are compressed in a consistent manner, regardless of their origin.

Caching Compressed Content

Can compressed content be cached? The answer is an unequivocal yes. With mod_gzip and mod_deflate, Apache sends the Vary header, indicating to caches that this object differs from other requests for the same object based on certain criteria—user-agent, character set and so on. When a compressed object is received by a cache, it notes that the server returned a Vary: Accept-Encoding response. This response indicates it was generated based on the request containing the Accept-Encoding: gzip header.

Caching compressed content can lead to a situation where a cache stores two copies of the same document, one compressed and one uncompressed. This is a design feature of HTTP 1.1, and it allows clients with and without the ability to receive compressed content to benefit from the performance enhancements gained from local proxy caches.
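The response headers for a compressed, cacheable object look something like the following; this is an illustrative exchange, not output captured from a specific server:

```http
HTTP/1.1 200 OK
Content-Type: text/html
Content-Encoding: gzip
Vary: Accept-Encoding
```

A cache that receives this response stores it keyed on the Accept-Encoding request header, so clients that cannot accept gzip are served the uncompressed variant instead.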

______________________

Comments


I had no idea you could do

David Hops

I had no idea you could do this. Should come in handy as I don't have much bandwidth at the moment and am hosting a few large images and zips. Only concern would be extra CPU load.


mod_deflate

Mikhailov Anatoly

An article about mod_deflate settings on an Amazon EC2 AMI:
http://railsgeek.com/2008/12/16/apache2-httpd-improving-performance-mod_...

Nice Work!

Fargham

A great and informative article!

Really helped me!!

Ashutosh Chaturvedi

Hi,

It's really a great article and it really helped me. One question from my side:

Is there any way of compressing .tiff files using mod_gzip?

Please help me if anyone has an idea about this.

Thanks in advance.

Ashutosh


Re: Compressing Web Content

Anonymous

stephen,
good article summarizing the methods and benefits of compressing web pages. however, you should touch on the difficulty of using mod_gzip with mod_ssl under apache 1.3.x -- this is a cumbersome issue and there are only workaround solutions. one such is to use a mod_proxy frontend virtual server to buffer the ssl request, and a mod_gzip backend virtual server to handle the compression. more detail on this two stage approach is here:
http://lists.over.net/pipermail/mod_gzip/2002-February/005911.html
i have implemented the above method on a few production servers and it does indeed work, with some caveats.

i believe that i read somewhere that apache 2.x had improved handling of the gzip/ssl pairing. not having played with 2.x i'm not in a position to say whether or not it actually works. perhaps someone could comment on this.

regards,
jim

At what cost to the CPU

Anonymous

Great article! I enjoyed reading it and found it very informative. One question though...

What will this module do to my CPU? Will the load average on my box go through the roof every time I need to send out a compressed Web page? I think this would have been a nice point to examine as part of your article.


compressing .tiff files

Marcus

To compress TIFF files, simply remove the following exclusion for images from the example configuration above:

mod_gzip_item_exclude mime ^image/.*$

Also, add the .tif/.tiff file extensions to the file inclusions:

mod_gzip_item_include file \.(tiff?)$

Please let me know if you add this because I'd like to test a browser implementation against it. Thanks.

Marcus Adams
yEnc Decoder Proxy
