Paranoid Penguin - Building a Secure Squid Web Proxy, Part IV

Add squidGuard's blacklist functionality to your Squid proxy.
Getting and Installing Blacklists

Once you've obtained and installed squidGuard, you need a set of blacklists. There's a decent list of links to these at squidguard.org/blacklists.html, and of these, I think you could do far worse than Shalla's Blacklists (see Resources), a free-for-noncommercial-use set that includes more than 1.6 million entries organized into 65 categories. It's also free for commercial use; you just have to register and promise to provide feedback and list updates. Shalla's Blacklists are the set I use for the configuration examples through the rest of this article.

Once you've got a blacklist archive, unpack it. It doesn't necessarily matter where, so long as the entire directory hierarchy is owned by the same user and group under which Squid runs (proxy:proxy on Ubuntu systems). A common default location for blacklists is /var/lib/squidguard/db.

To extract Shalla's Blacklists to that directory, I move the archive file there:

bash-$ sudo mv shallalist.tar.gz /var/lib/squidguard/db

Then, I unpack it like this:

bash-$ sudo -s
bash-# cd /var/lib/squidguard/db
bash-# tar --strip 1 -xvzf shallalist.tar.gz
bash-# rm shallalist.tar.gz

In the above tar command, the --strip 1 option strips the leading BL/ from the paths of everything extracted from the shallalist tar archive. Without this option, there would be an additional directory (BL/) under /var/lib/squidguard/db containing the blacklist categories, for example, /var/lib/squidguard/db/BL/realestate rather than /var/lib/squidguard/db/realestate. Note that you definitely want to delete the shallalist.tar.gz file as shown above; otherwise, squidGuard will try to include it when it later imports the contents of /var/lib/squidguard/db into its database.
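If you'd like to see exactly what the --strip option does before running it against the real archive, you can reproduce the effect on a throwaway tarball. All paths below are made up for illustration, and --strip-components is the full spelling of the GNU tar option that --strip abbreviates:

```shell
#!/bin/sh
# Throwaway demo of tar's --strip-components: BL/ stands in for the
# top-level directory inside shallalist.tar.gz.
set -e
tmp=$(mktemp -d)
cd "$tmp"

# Build a miniature archive with the same BL/ wrapper directory.
mkdir -p BL/spyware BL/remotecontrol
echo "example.com" > BL/spyware/domains
tar -czf shallalist-test.tar.gz BL

# Extract it while stripping the leading path component.
mkdir extracted && cd extracted
tar --strip-components=1 -xzf ../shallalist-test.tar.gz

ls                    # category directories appear at the top, no BL/ wrapper
cat spyware/domains   # example.com
```

The extracted tree contains spyware/ and remotecontrol/ directly, which is exactly the layout squidGuard expects under its database directory.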

Note also that at this point you're still in a root shell; you need to stay there for just a few more commands. To set appropriate ownership and permissions for your blacklists, use these commands:

bash-# chown -R proxy:proxy /var/lib/squidguard/db/
bash-# find /var/lib/squidguard/db -type f | xargs chmod 644
bash-# find /var/lib/squidguard/db -type d | xargs chmod 755
bash-# exit
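If you want to sanity-check that permission scheme before applying it, the same find/xargs pattern can be rehearsed on a scratch tree. The chown step needs root and a real proxy user, so only the chmod half is shown, and the directory names are stand-ins:

```shell
#!/bin/sh
# Rehearse the chmod scheme on a scratch tree: files end up 644
# (rw-r--r--) and directories end up 755 (rwxr-xr-x).
set -e
scratch=$(mktemp -d)
mkdir -p "$scratch/db/spyware"
touch "$scratch/db/spyware/domains" "$scratch/db/spyware/urls"

# Same pattern as the real commands, pointed at the scratch tree.
find "$scratch/db" -type f | xargs chmod 644
find "$scratch/db" -type d | xargs chmod 755

stat -c '%a %n' "$scratch/db/spyware/domains"   # 644 ...
stat -c '%a %n' "$scratch/db/spyware"           # 755 ...
```

One caveat: a bare find | xargs pipeline splits on whitespace. Blacklist paths contain none, but on arbitrary trees, find -print0 | xargs -0 is the safer idiom.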

And with that, your blacklists are ready for squidGuard to start blocking. After, that is, you configure squidGuard to do so.

Configuring squidGuard

On Ubuntu and OpenSUSE systems (and probably others), squidGuard's configuration file, squidGuard.conf, is kept in /etc/squid/, and squidGuard automatically looks there when it starts. As root, use the text editor of your choice to open /etc/squid/squidGuard.conf. If you're using a command-line editor like vi on Ubuntu systems, don't forget to use sudo; as with practically everything else under /etc/, you need superuser privileges to change squidGuard.conf.

squidGuard.conf's basic structure is:

  1. Options (mostly paths)

  2. Time Rules

  3. Rewrite Rules

  4. Source Addresses

  5. Destination Classes

  6. Access Control Lists

In this article, my goal is quite modest: to help you get started with a simple blacklist that applies to all users, regardless of time of day, and without any fancier URL-rewriting than redirecting all blocked transactions to the same page. Accordingly, let's focus on examples of Options, Destination Classes and ACLs. Before you change squidGuard.conf, it's a good idea to make a backup copy, like this:

bash-$ sudo cp /etc/squid/squidGuard.conf /etc/squid/squidGuard.conf.def

Now you can edit squidGuard.conf. First, at the top, leave what are probably the default values of dbhome and logdir, which specify the paths of squidGuard's databases of blacklists (or whitelists—you also can write ACLs that explicitly allow access to certain sites and domains) and its log files, respectively. These are the defaults in Ubuntu:

dbhome /var/lib/squidguard/db
logdir /var/log/squid

These paths are easy enough to understand, especially considering that you just extracted Shalla's Blacklists to /var/lib/squidguard/db. Whatever you do, do not leave a completely blank line at the very top of the file; doing so prevents squidGuard from starting properly.

Next, you need to create a Destination Class. This being a security column, let's focus on blocking access to sites in the spyware and remotecontrol categories. You certainly don't want your users' systems to become infected with spyware, and you probably don't want users to grant outsiders remote control of their systems either.

Destination Classes that describe these two categories from Shalla's Blacklists look like this:

dest remotecontrol {
        domainlist      remotecontrol/domains
        urllist         remotecontrol/urls
}

dest spyware {
        domainlist      spyware/domains
        urllist         spyware/urls
}

As you can see, the paths in each domainlist and urllist statement are relative to the top-level database path you specified with the dbhome option. Note also the curly bracket {} placement: left brackets always immediately follow the destination name, on the same line, and right brackets always occupy their own line at the end of the class definition.
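The same syntax also covers the whitelists mentioned earlier. If, for instance, you maintained your own list of always-allowed domains in a file named goodsites/domains under dbhome (a hypothetical path of your own, not part of Shalla's Blacklists), the Destination Class would look like this:

dest goodsites {
        domainlist      goodsites/domains
}

A pass rule that lists goodsites before any blacklist entries would then allow those domains unconditionally.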

Finally, you need an ACL that references these destinations—specifically, one that blocks access to them. The ACL syntax in squidGuard is actually quite flexible, and it's easy to write both “allow all except...” and “block all except...” ACLs. Like most ACL languages, they're parsed left to right, top to bottom. Once a given transaction matches any element in an ACL, it's either blocked or passed as specified, and not matched against subsequent elements or ACLs.

You can have multiple ACL definitions in your squidGuard.conf file, but in this example scenario, it will suffice to edit the default ACL. A simple default ACL that passes all traffic unless destined for sites in the remotecontrol or spyware blacklists would look like this:

acl {
        default {
                pass !remotecontrol !spyware all
                redirect http://www.google.com
        }
}

In this example, default is the name of the ACL. Your default squidGuard.conf file probably already has an ACL definition named default, so be sure either to edit that one or delete it before entering the above definition; you can't have two different ACLs both named default.

The pass statement says that things matching remotecontrol (as defined in the prior Destination Class of that name) do not get passed, nor does spyware, but all (a wild card that matches anything that makes it that far in the pass statement) does. In other words, if a given destination matches anything in the remotecontrol or spyware blacklists (either by domain or URL), it won't be passed, but rather will be redirected per the subsequent redirect statement, which points to the Google home page.

Just to make sure you understand how this works, let me point out that if the wild card all occurred before !remotecontrol, as in “pass all !remotecontrol !spyware”, squidGuard would not block anything, because matched transactions aren't compared against any elements that follow the element they matched. When constructing ACLs, remember that order matters!

I freely admit I'm being very lazy in specifying that as my redirect page. More professional system administrators would want to put a customized “You've been redirected here because...” message onto a Web server under their control and list that URL instead. Alternatively, squidGuard comes with a handy CGI script that displays pertinent transaction data back to the user. On Ubuntu systems, the script's full path is /usr/share/doc/squidguard/examples/squidGuard.cgi.gz.

This brings me to my only complaint about squidGuard: if you want to display a custom message to redirected clients, you either need to run Apache on your Squid server and specify an http://localhost/ URL, or specify a URL pointing to some other Web server you control. This is in contrast to Squid itself, which has no problem displaying its own custom messages to clients without requiring a dedicated HTTP dæmon (either local or external).
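If you do put squidGuard.cgi (or your own page) on a Web server, squidGuard can hand it the transaction details through %-macros that it expands in the redirect URL at request time: %a is the client address, %s the matched source class, %t the matched target class and %u the requested URL. A redirect line using the sample script might look like this (the localhost host and cgi-bin path are assumptions about your particular Web server layout):

redirect http://localhost/cgi-bin/squidGuard.cgi?clientaddr=%a&srcclass=%s&targetclass=%t&url=%u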

To review, your complete sample squidGuard.conf file (not counting any commented-out lines from the default file) should look like this:

dbhome /var/lib/squidguard/db
logdir /var/log/squid

dest remotecontrol {
        domainlist      remotecontrol/domains
        urllist         remotecontrol/urls
}

dest spyware {
        domainlist      spyware/domains
        urllist         spyware/urls
}

acl {
        default {
                pass !remotecontrol !spyware all
                redirect http://www.google.com
        }
}

Now that squidGuard is configured and, among other things, knows where to look for its databases, you need to create actual database files from the files and directories under /var/lib/squidguard/db, using this command:

bash-$ sudo -u proxy squidGuard -C all

This imports all files specified in active Destination Class definitions in squidGuard.conf, specifically in this example, the files remotecontrol/domains, remotecontrol/urls, spyware/domains and spyware/urls, into Berkeley DB-format databases. Obviously, squidGuard can access the blacklists much faster using database files than by parsing flat text files.

______________________

Comments

getting squidguard to work!

First, I am using Ubuntu 9.04. My Squid is 2.7stable3. My squidGuard is 1.2.

Squid has been working fine for several days. I have a fairly complex set of acls and http_access rules because I am trying to dole out computer time to my kids during the holidays. I am also trying to stop access to certain sites during my "peak time" allocated by my ISP. After working through the obvious errors that a relative newb introduces without meaning to, it is stable, and predictable in behaviour and performance. Suffice it to say that I have stripped the squid.conf of unnecessary clutter (comments and unused settings) and have added some structure to it that makes sense to me when going in to tweak it. I do have the original file in two places for referencing when I get into trouble, so can always reinstall and add my tweaks if needed.

Next step was to add squidguard for a deeper level of filtering...

So, I have assiduously followed the instructions here, even to the point of copying the errors that reveal themselves on re-reading, e.g., "bash-$ /etc/init.d/squid reload" is missing sudo at the start of the line (it is referenced in the preceding paragraph).

However, even after correcting the obvious errors, the moment I reload or restart squid, it fails to load.

I actually rebuilt a server because this happened the first time (over a week ago now), thinking that I had damaged some system files (of course I hadn't, but it was worth the practice of installing a new version of the server anyway).

So what can I be doing wrong? The only thing that makes sense is that I am adding the squidguard lines in the wrong place, but having reviewed the original squid.conf, my original placement was correct. So, are there any hidden traps for beginners that aren't mentioned in the article?

Shane
Feeling like,... "a Penguin in Bondage, boy!!!

follow-up

Well - I found it, after checking the squidGuard log file:

wrong type of braces in the definitions of the dest rules.

I had used parentheses () instead of curly braces {}, which, with my eyesight the way it is these days (even with my computer prescription glasses), are so similar at a glance rather than a close inspection that it totally slipped on by.

Caught by the worst of the gotchas for newbs who aren't new to programming (hangs head in shame)

Ah, well, at least if anyone else runs across this there is a solution already (I'd gone looking for the matching braces problem and found the bigger one)

Shane
bonds loosened but not released, yet!
