Caching the Web, Part 2

This month Mr. Guerrero tells us about the definitive proxy-cache server, Squid.

Restricting Access to Your Cache

To allow only users in your organization to access your cache, you must set up some access control lists (ACLs). Defining access lists in Squid is quite easy: each access list has a name and defines a subset of elements. You can make a subset of IP addresses, protocols, destination URLs and even browser brands. The directive to define an ACL or subset is:

acl name type string1 string2 ...

You can learn more about ACL types in the example squid.conf. To restrict access to only our users, the type needed is src. For example, suppose you want to allow cache access from all browsers in the 172.16.236.0 class C, the first 32 addresses of the next class C and your PC, 172.16.237.180. You can define an ACL like this:

acl my_users src 172.16.236.0/255.255.255.0
acl my_users src 172.16.237.1-172.16.237.32/255.255.255.255
acl my_users src 172.16.237.180/255.255.255.255

Next, define an ACL for the rest of the addresses. This line is included in the squid.conf example file:

acl all src 0.0.0.0/0.0.0.0

Apply these ACLs in an ordered way with the http_access directive. The syntax is:

http_access allow|deny [!]aclname ...

For example:

http_access allow my_users
http_access deny all

More than one ACL can be combined in the same http_access directive, and each can be used in its negative form (i.e., preceded by !). The example shown is the simplest use of ACLs; more complex forms can allow connections only during designated hours and days, allow only defined URLs or domains to be fetched and restrict some protocols such as FTP. This powerful feature of Squid can help you enforce and implement your security policy, whether you use Squid in your firewall or the Squid machine is the only one allowed to cross your firewall. Just look for examples in squid.conf.
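
As a rough sketch of those more complex forms, the fragment below combines a time ACL, a destination-domain ACL and a protocol ACL with the my_users and all lists defined earlier. The names work_hours, banned_sites and ftp_proto, the hours and the domain .example.com are made up for illustration; check the comments in your own squid.conf for the ACL types your version supports.

```
# Hypothetical ACLs, for illustration only
acl work_hours time MTWHF 08:00-18:00
acl banned_sites dstdomain .example.com
acl ftp_proto proto FTP

# Rules are checked in order; the first one that matches decides
http_access deny ftp_proto
http_access deny banned_sites
# ACLs listed on the same line must all match (logical AND)
http_access allow my_users work_hours
http_access deny all
```

Because the rules are applied in order, the deny lines for FTP and the banned domain must come before the line that allows my_users.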

There is also an ACL for setting the web ports your users are allowed to reach: the Safe_ports ACL. You should uncomment this line and add port 443 to it in order to allow the use of secure web servers through your Squid server.
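
After that change, the relevant lines might look something like the sketch below. The exact port list shipped in the example squid.conf depends on your Squid version, so treat the numbers other than 443 as an assumption and keep whatever your own file already lists.

```
# Port list is illustrative; 443 is the addition for secure web servers
acl Safe_ports port 80 21 443 70 1025-65535
http_access deny !Safe_ports
```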

A Look at the Logs

Squid can generate huge logs of your proxy-cache usage. With this information and the help of some scripts, we can generate complete access statistics, like the ones generated from web servers. Squid maintains three main log files:

  • cache_log includes warnings and information about the status and operational issues of the cache.

  • store_log includes information about database operations, such as inserts of new items and releases of expired objects.

  • access_log contains an entry for each object fetched from the cache and information on how it was served. It also includes information about each ICP query received by the cache from other servers using this server as a neighbor.

Many utilities are available for generating statistics from the access_log file (see Resources). Remember, it is not considered ethical to surf your access_log to see which places your users visit. Some sites have chosen not to publish processed statistics in any form to guard their users' privacy, which is an important concern for all of us involved in the Internet community.
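
To give an idea of what such utilities do, here is a minimal JavaScript sketch that counts result codes and computes a hit ratio. It assumes Squid's native log format, in which the fourth whitespace-separated field looks like TCP_HIT/200; the function name summarizeAccessLog and the sample lines are invented for this example. It deliberately aggregates only result codes and ignores client addresses and URLs, in keeping with the privacy concern above.

```javascript
// Summarize result codes from Squid access_log lines.
// Assumption: native log format, where field 4 is "RESULT_CODE/HTTP_STATUS".
function summarizeAccessLog(lines) {
  const counts = {};
  let total = 0;
  let hits = 0;
  for (const line of lines) {
    const fields = line.trim().split(/\s+/);
    if (fields.length < 7) continue; // skip blank or malformed lines
    const resultCode = fields[3].split("/")[0]; // "TCP_HIT" from "TCP_HIT/200"
    counts[resultCode] = (counts[resultCode] || 0) + 1;
    total += 1;
    // Any code containing "HIT" is counted as a hit (ICP UDP_HIT included)
    if (resultCode.includes("HIT")) hits += 1;
  }
  return { total, hits, hitRatio: total ? hits / total : 0, counts };
}

// Invented sample entries, for illustration only
const sampleLines = [
  "925734512.123 456 172.16.236.5 TCP_MISS/200 1234 GET http://www.example.com/ - DIRECT/www.example.com text/html",
  "925734513.456 12 172.16.236.6 TCP_HIT/200 1234 GET http://www.example.com/ - NONE/- text/html",
  "925734514.789 300 172.16.236.7 TCP_MISS/200 5678 GET http://www.example.org/ - DIRECT/www.example.org text/html",
];

const summary = summarizeAccessLog(sampleLines);
console.log("requests:", summary.total);
console.log("hit ratio (%):", (summary.hitRatio * 100).toFixed(1));
console.log(summary.counts);
```

In real use, the lines would come from reading a rotated log file rather than from an in-memory array.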

The logs grow very quickly and in a few days can eat up your remaining disk space. To safely clean your log files, rotate them by sending Squid the SIGUSR1 signal, which is what the squid -k rotate command does. A single crontab entry running this command will begin new log files each night:

/usr/local/squid/bin/squid -k rotate

This command will create the files access_log.0, store_log.0 and cache_log.0 and begin logging to new empty files. Now you can safely remove these files or process them for statistical purposes. The next time you rotate logs, files.0 will be moved to files.1 and so on. To save disk space, you can configure how many extensions Squid will use for these rotations with the logfile_rotate n directive in the squid.conf file.
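
Put together, the two pieces might look like the sketch below. The midnight schedule and the value 10 are only example choices, and the path assumes the default /usr/local/squid installation used above.

```
# crontab entry: rotate the Squid logs at midnight every night
0 0 * * * /usr/local/squid/bin/squid -k rotate

# squid.conf: keep ten rotated generations (extensions .0 through .9)
logfile_rotate 10
```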

Configuring Browsers to Use Cache

To begin using your new proxy-cache server, you must first instruct your users' browsers to fetch objects from your server instead of retrieving them directly. In most modern web browsers, one of the configuration options is the specification of the proxy setup. Another option is to specify a list of domains or URL patterns which must be fetched through the proxy.

In Netscape Navigator or Communicator, you can include a proxy server and its port for each service to be proxied. With Squid, you can use these settings for the HTTP, Security (SSL), FTP and WAIS services, all with the same port (3128, by default). First, select the “Manual proxy configuration” radio button and then the “View” button to type in your settings. Figures 1 and 2 show examples of these screens.

Figure 1. Proxy Preferences Screen

Figure 2. Manual Proxy Configuration Screen

Another solution is Automatic Proxy Configuration, introduced in Netscape Navigator 3.0, which allows multiple proxy servers, backup servers and different servers for different domains. This configuration sits in a JavaScript-like file that must be retrieved from a server. Using it, you can change the topology of your cache mesh or introduce new servers that must be treated as “No proxy for” servers without telling your users to change their configurations, because the configuration script is reloaded each time the browser is launched. MS Internet Explorer has also supported the automatic proxy configuration feature since version 3.02.

Figure 3. Automatic Proxy Configuration Screen

An example of this kind of configuration for Netscape Navigator and Communicator is shown in Figure 3. In this example, each time the browser is started, it loads the file proxy.pac from the server intranet.mec.es. This file must be returned with MIME-Type application/x-ns-proxy-autoconfig which can be accomplished in two ways:

  1. Add the following line to your Apache srm.conf file:

    AddType application/x-ns-proxy-autoconfig pac

  2. Or add the following line to your mime.types file:

    application/x-ns-proxy-autoconfig pac

For the changes to take effect, you must name your proxy auto-configuration file with the .pac extension and restart your web server. The Netscape documentation will tell you about the syntax of the .pac file (see Resources). Nevertheless, we'll look at a couple of basic examples of how to write them.

No HTML tags should be embedded in the JavaScript file, just the function FindProxyForURL with the arguments url and host. This function should return a single string containing DIRECT (get the object directly from the source), or PROXY host:port (get the object through this server and port). The string can contain more than one of these directives, separated by semicolons. For example:

function FindProxyForURL(url, host)
{
    return "PROXY proxy1.mec.es:3128; PROXY proxy2.mec.es:80; DIRECT";
}

will instruct the browser to use the first proxy to fetch the object. If it can't contact the first (proxy1), then it will try the second (proxy2); in the case that both are down, it will fetch the object from the source. This gives a fault tolerance level to our cache system.

One interesting feature is using different proxies for different domains and including support for internal servers where we don't want to use the cache. For example:

function FindProxyForURL(url, host)
{
    if (isPlainHostName(host) || dnsDomainIs(host, "intranet.mec.es"))
        return "DIRECT";
    else if (shExpMatch(host, "*.com"))
        return "PROXY proxy1.mec.es:3128";
    else
        return "PROXY proxy2.mec.es:80";
}

This function fetches directly all objects whose host name is a single word with no dots or lies in the intranet.mec.es domain, fetches all .com objects through proxy1 and fetches the rest through proxy2.
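
Before publishing a .pac file, it can help to try it outside the browser. The sketch below is one way to do that in plain JavaScript: it supplies simplified stand-ins for the three helper functions that the browser normally provides (these are approximations written for this example, not Netscape's own code) and then calls the function above with a few invented host names.

```javascript
// Simplified stand-ins for helpers the browser normally provides.
function isPlainHostName(host) {
  return host.indexOf(".") === -1; // true when the name has no dots
}

function dnsDomainIs(host, domain) {
  // true when host ends with the given domain string
  return host.length >= domain.length &&
    host.substring(host.length - domain.length) === domain;
}

function shExpMatch(str, pattern) {
  // Escape regex metacharacters, then translate shell wildcards * and ?
  const escaped = pattern.replace(/[.+^${}()|[\]\\]/g, "\\$&");
  const regex = new RegExp(
    "^" + escaped.replace(/\*/g, ".*").replace(/\?/g, ".") + "$"
  );
  return regex.test(str);
}

// The PAC function from the article
function FindProxyForURL(url, host) {
  if (isPlainHostName(host) || dnsDomainIs(host, "intranet.mec.es"))
    return "DIRECT";
  else if (shExpMatch(host, "*.com"))
    return "PROXY proxy1.mec.es:3128";
  else
    return "PROXY proxy2.mec.es:80";
}

// Invented host names, for illustration only
const testHosts = ["intranet", "www.intranet.mec.es", "www.example.com", "www.example.org"];
for (const host of testHosts) {
  console.log(host + " -> " + FindProxyForURL("http://" + host + "/", host));
}
```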

As a tip, the .pac file can be generated “on the fly” by a CGI script, giving different proxy configurations for different browsers, e.g., depending on the REMOTE_HOST environment variable provided by the CGI interface. In this way, load balancing between different networks can be achieved. Always remember that the MIME-type returned by the CGI must be application/x-ns-proxy-autoconfig.
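
A minimal sketch of such a generator follows, written in JavaScript for a Node.js-style runtime invoked as a CGI program (an assumption; any CGI language would do). The function name buildPacResponse and the domain suffix .lab.mec.es are hypothetical: hosts in that made-up domain get proxy2 first, and everyone else gets proxy1 first.

```javascript
// Build a complete CGI response (header plus PAC body) for one client.
// ".lab.mec.es" is a hypothetical domain used only to illustrate the idea.
function buildPacResponse(remoteHost) {
  const proxies = remoteHost.endsWith(".lab.mec.es")
    ? ["proxy2.mec.es:80", "proxy1.mec.es:3128"]
    : ["proxy1.mec.es:3128", "proxy2.mec.es:80"];

  // e.g. "PROXY a; PROXY b; DIRECT"
  const chain = proxies.map((p) => "PROXY " + p).concat("DIRECT").join("; ");

  const body =
    "function FindProxyForURL(url, host)\n" +
    "{\n" +
    '    return "' + chain + '";\n' +
    "}\n";

  // The MIME type must be the PAC type, followed by a blank line
  return "Content-Type: application/x-ns-proxy-autoconfig\r\n\r\n" + body;
}

// The web server passes the client name in the REMOTE_HOST variable
process.stdout.write(buildPacResponse(process.env.REMOTE_HOST || ""));
```

Note that some web servers set REMOTE_HOST only when host name lookups are enabled, so a real script may need to fall back to the client's address.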
