Paranoid Penguin - Building a Secure Squid Web Proxy, Part I

Nurture your inner control freak with Squid.

Consider the venerable Web proxy—back when the Internet was new to most of us, setting up a Web proxy was a convenient way to grant users of an otherwise non-Internet-connected network access to the World Wide Web. The proxy also provided a convenient point to log outbound Web requests, to maintain whitelists of allowed sites or blacklists of forbidden sites and to enforce an extra layer of authentication in cases where some, but not all, of your users had Internet privileges.

Nowadays, of course, Internet access is ubiquitous. The eclipsing of proprietary LAN protocols by TCP/IP, combined with the technique of Network Address Translation (NAT), has made it easy to grant direct access from “internal” corporate and organizational networks to Internet sites. So the whole idea of a Web proxy is sort of obsolete, right?

Actually, no.

After last month's editorial, we return to technical matters—specifically, to the venerable but assuredly not obsolete Web proxy. This month, I describe, in depth, the security benefits of proxying your outbound Web traffic, and some architectural and design considerations involved with doing so. In subsequent columns, I'll show you how to build a secure Web proxy using Squid, the most popular open-source Web proxy package, plus a couple of adjunct programs that add key security functionality to Squid.

What Exactly Is a Web Proxy?

The last time I discussed proxies in this space was in my December 2002 article “Configuring and Using an FTP Proxy”. (Where does the time go?) A quick definition, therefore, is in order.

The concept of a Web proxy is simple. Rather than allowing client systems to interact directly with Web servers, a Web proxy impersonates the server to the client, while simultaneously opening a second connection to the Web server on the client's behalf and impersonating the client to that server. This is illustrated in Figure 1.

Figure 1. How Web Proxies Work

Because Web proxies have been so common for so long, all major Web browsers can be configured to communicate directly through Web proxies in a “proxy-aware” fashion. Alternatively, many Web proxies support “transparent” operation, in which Web clients are unaware of the proxy's presence, but their traffic is diverted to the proxy via firewall rules or router policies.
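To make the “proxy-aware” case concrete: a browser configured to use a proxy normally puts the full URL in its HTTP request line, and the proxy uses that URL to decide where to open its second connection. The following Python sketch shows only that parsing step. It is an illustration under simplifying assumptions (plain http:// URLs only, no header handling), the function name split_proxy_request is made up for this example, and it is not how Squid itself is implemented.

```python
from urllib.parse import urlsplit


def split_proxy_request(request_line):
    """Split a proxy-style request line into (host, port, upstream_request_line).

    Hypothetical helper for illustration only: it handles plain http:// URLs
    and ignores headers, CONNECT requests and everything else a real proxy
    must deal with.
    """
    method, target, version = request_line.split()
    parts = urlsplit(target)
    if parts.scheme != "http" or not parts.hostname:
        # A request without a full URL was not addressed to a proxy.
        raise ValueError("expected an absolute http:// URL, got %r" % target)

    port = parts.port or 80          # default HTTP port when none is given
    path = parts.path or "/"         # an empty path means the site root
    if parts.query:
        path += "?" + parts.query

    # The proxy connects to (host, port) and sends the shortened request line.
    return parts.hostname, port, "%s %s %s" % (method, path, version)


if __name__ == "__main__":
    print(split_proxy_request("GET http://www.example.com/index.html HTTP/1.1"))
```

In transparent operation, the client sends an ordinary request line with no full URL, so the proxy has to work out the destination some other way, for example, from the request's Host header.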

Why Proxy?

Just because nowadays it's easy to interconnect TCP/IP networks directly doesn't mean you always should. If a nasty worm infects systems on your internal network, do you want to deal with the ramifications of the infection spreading outward, for example, to some critical business partner with whom your users communicate over the Internet?

In many organizations, network engineers take it for granted that all connected systems will use a “default route” that provides a path out to the Internet. In other organizations, however, it's considered much less risky to direct all Web traffic out through a controlled Web proxy to which routes are internally published and to use no default route whatsoever at the LAN level.

This has the effect of allowing users to reach the Internet via the Web proxy (that is, to surf the Web) but not to use the Internet for non-Web applications, such as IRC, on-line gaming and so forth. It follows that what end users can't do, neither can whatever malware manages to infect their systems.

Obviously, this technique works only if you've got other types of gateways for the non-Web traffic you need to route outward, or if the only outbound Internet traffic you need to deal with is Web traffic. My point is, a Web proxy can be a very useful tool in controlling outbound Internet traffic.

What if your organization is in a regulated industry, in which it's sometimes necessary to track some users' Web access? You can do that on your firewall, of course, but generally speaking, it's a bad idea to make a firewall log more than you have to for forensics purposes. This is because logging is I/O-intensive, and too much of it can degrade the firewall's ability to fulfill its primary function: assessing and dealing with network transactions. (Accordingly, it's common practice to log mainly “deny/reject” actions on firewalls and not to log “allowed” traffic except when troubleshooting.)

A Web proxy, therefore, is a better place to capture and record logs of Web activity than a firewall or other network device.

Another important security function of Web proxies is blacklisting. This is an unpleasant topic—if I didn't believe in personal choice and freedom, I wouldn't have been writing about open-source software since 2000—but the fact is that many organizations have legitimate, often critical, reasons for restricting their users' Web access.

A blacklist is a list of forbidden URLs and domain names. A good blacklist allows you to choose from different categories of URLs to block, such as social networking, sports, pornography, known spyware-propagators and so on. Note that not all blacklist categories necessarily involve restricting personal freedom per se; some blacklists provide categories of “known evil” sites that, regardless of whatever content they're actually advertising, are known to try to infect users with spyware or adware, or otherwise attack unsuspecting visitors.
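As a rough illustration of how category-based blacklisting works, here is a minimal Python sketch. Everything in it is hypothetical: the category names, the .example domains and the blocked_category function are invented for this example, and real blacklist tools use their own file formats and matching rules. The idea is simply that a request is refused if its hostname (or any parent domain) or its URL prefix appears in a category the administrator has chosen to enforce.

```python
from urllib.parse import urlsplit

# Hypothetical blacklist data, invented for illustration only.
BLACKLIST = {
    "spyware": {
        "domains": {"spyware.example"},
        "urls": {"ads.example/tracker"},
    },
    "social": {
        "domains": {"social.example"},
        "urls": set(),
    },
}


def blocked_category(url, enabled):
    """Return the first enabled category that blocks url, or None."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    full = host + parts.path.rstrip("/")

    # A listed domain also covers its subdomains, so test the host
    # itself and every parent domain.
    labels = host.split(".")
    candidates = {".".join(labels[i:]) for i in range(len(labels))}

    for category in enabled:
        entry = BLACKLIST[category]
        if candidates & entry["domains"]:
            return category
        # A listed URL covers that exact location and everything below it.
        for listed in entry["urls"]:
            if full == listed or full.startswith(listed + "/"):
                return category
    return None


if __name__ == "__main__":
    print(blocked_category("http://www.spyware.example/download", ["spyware"]))
    print(blocked_category("http://social.example/", ["spyware"]))
```

With only the spyware category enabled, the first request is refused and the second is allowed, which is the kind of per-category choice a good blacklist gives you.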

And, I think a lot of Web site visitors do tend to be unsuspecting. The classic malware vector is the e-mail attachment—an image or executable binary that you trick the recipient into double-clicking on. But, what if you could execute code on users' systems without having to trick them into doing anything but visit a Web page?

In the post-Web 2.0 world, Web pages nearly always contain some sort of client-side executable code (Java, JavaScript, ActiveX, .NET and so on), and even if your victim is running the best antivirus software with the latest signatures, it may well not examine any of that code, let alone identify evil behavior in it. So, sure enough, the “hostile Web site” has become the cutting edge in malware propagation and identity theft.

Phishing Web sites that depend on DNS redirection (usually through cache poisoning), which involves redirecting a legitimate URL to an attacker's IP address rather than that site's real IP, are difficult to protect against in URL or domain blacklists. (At any rate, none of the free blacklists I've looked at include a phishing category.) Spyware, however, is a common blacklist category, and a good blacklist contains thousands of sites known to propagate client-side code you almost certainly don't want executed on your users' systems.

Obviously, no URL blacklist ever can cover more than a tiny fraction of the actual number of hostile Web sites active at any given moment. The real solution to the problem of hostile Web sites is some combination of client/endpoint security controls, better Web browser and operating system design, and an antivirus software industry that advances beyond its reliance on virus signatures (hashes of known evil files), which it's been stuck on for decades.

Nevertheless, at this very early stage in our awareness of and ability to mitigate this type of risk, blacklists add some measure of protection where presently there's very little else. So, regardless of whether you need to restrict user activity per se (blocking access to porn and so forth), a blacklist with a well-maintained spyware category may be all the justification you need to add blacklisting capabilities to your Web proxy. SquidGuard can be used to add blacklists to the Squid Web proxy.