Paranoid Penguin - Building a Secure Squid Web Proxy, Part IV
Once you've obtained and installed squidGuard, you need a set of blacklists. There's a decent list of links to these at squidguard.org/blacklists.html, and of these, I think you could do far worse than Shalla's Blacklists (see Resources), a free-for-noncommercial-use set that includes more than 1.6 million entries organized into 65 categories. It's also free for commercial use; you just have to register and promise to provide feedback and list updates. Shalla's Blacklists are the set I use for the configuration examples through the rest of this article.
Once you've got a blacklist archive, unpack it. It doesn't necessarily matter where, so long as the entire directory hierarchy is owned by the same user and group under which Squid runs (proxy:proxy on Ubuntu systems). A common default location for blacklists is /var/lib/squidguard/db.
To extract Shalla's Blacklists to that directory, I move the archive file there:
bash-$ sudo mv shallalist.tar.gz /var/lib/squidguard/db
Then, I unpack it like this:
bash-$ sudo -s
bash-# cd /var/lib/squidguard/db
bash-# tar --strip 1 -xvzf shallalist.tar.gz
bash-# rm shallalist.tar.gz
In the above tar command, the --strip 1 option strips the leading BL/ from the paths of everything extracted from the shallalist tar archive. Without this option, there would be an additional directory (BL/) under /var/lib/squidguard/db containing the blacklist categories, for example, /var/lib/squidguard/db/BL/realestate rather than /var/lib/squidguard/db/realestate. Note that you definitely want to delete the shallalist.tar.gz file as shown above; otherwise, squidGuard will include it in the database into which the contents of /var/lib/squidguard/db will later be imported.
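If you'd like to see the effect of stripping one leading path component before touching the real archive, the following sketch reproduces it on a throwaway tarball. It assumes GNU tar and spells the option out in full as --strip-components=1; the scratch directory and the example.com entry are made up for the demonstration.

```shell
#!/bin/sh
# Build a small archive laid out like the blacklist tarball (BL/<category>/...),
# then extract it with and without stripping the leading path component.
work=$(mktemp -d)
mkdir -p "$work/src/BL/realestate" "$work/plain" "$work/stripped"
echo "example.com" > "$work/src/BL/realestate/domains"
tar -czf "$work/demo.tar.gz" -C "$work/src" BL

# Without stripping: the file lands under plain/BL/realestate/
tar -xzf "$work/demo.tar.gz" -C "$work/plain"

# With one component stripped: BL/ is dropped, leaving stripped/realestate/
tar --strip-components=1 -xzf "$work/demo.tar.gz" -C "$work/stripped"

# Show where the extracted files ended up
find "$work/plain" "$work/stripped" -type f
```

The two paths printed at the end differ only by the extra BL/ directory, which is exactly what the option is there to avoid.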
Note also that at this point you're still in a root shell; you need to stay there for just a few more commands. To set appropriate ownership and permissions for your blacklists, use these commands:
bash-# chown -R proxy:proxy /var/lib/squidguard/db/
bash-# find /var/lib/squidguard/db -type f | xargs chmod 644
bash-# find /var/lib/squidguard/db -type d | xargs chmod 755
bash-# exit
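One caveat with piping find into xargs is that file or directory names containing whitespace get split into separate arguments. The blacklist archive is unlikely to contain such names, but if you want to be safe, a variant using find's own -exec does the same job. This is only a sketch: set_blacklist_perms is a hypothetical helper name, and the demonstration runs against a scratch directory rather than the real database path.

```shell
#!/bin/sh
# set_blacklist_perms is a hypothetical helper, not part of squidGuard.
# $1 = top of the blacklist tree
set_blacklist_perms() {
    find "$1" -type f -exec chmod 644 {} +
    find "$1" -type d -exec chmod 755 {} +
}

# Demonstrate on a scratch tree; one directory name contains a space on purpose.
demo=$(mktemp -d)
mkdir -p "$demo/spyware" "$demo/odd name"
touch "$demo/spyware/domains" "$demo/odd name/urls"
chmod 600 "$demo/spyware/domains" "$demo/odd name/urls"
set_blacklist_perms "$demo"

# List anything that still has the wrong mode (no output means success).
find "$demo" -type f ! -perm 644
find "$demo" -type d ! -perm 755
```

On a real system, you would run the function as root against /var/lib/squidguard/db after the chown step.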
And with that, your blacklists are ready for squidGuard to start blocking. After, that is, you configure squidGuard to do so.
On Ubuntu and OpenSUSE systems (and probably others), squidGuard's configuration file, squidGuard.conf, is kept in /etc/squid/, and squidGuard automatically looks there when it starts. Use the text editor of your choice to open /etc/squid/squidGuard.conf as root. If you're using a command-line editor like vi on Ubuntu systems, don't forget to use sudo: as with practically everything else under /etc/, you need superuser privileges to change squidGuard.conf.
squidGuard.conf's basic structure is:
Options (mostly paths)
Time Rules
Rewrite Rules
Source Addresses
Destination Classes
Access Control Lists
In this article, my goal is quite modest: to help you get started with a simple blacklist that applies to all users, regardless of time of day, and without any fancier URL-rewriting than redirecting all blocked transactions to the same page. Accordingly, let's focus on examples of Options, Destination Classes and ACLs. Before you change squidGuard.conf, it's a good idea to make a backup copy, like this:
bash-$ sudo cp /etc/squid/squidGuard.conf /etc/squid/squidGuard.conf.def
Now you can edit squidGuard.conf. First, at the top, leave what are probably the default values of dbhome and logdir, which specify the paths of squidGuard's databases of blacklists (or whitelists—you also can write ACLs that explicitly allow access to certain sites and domains) and its log files, respectively. These are the defaults in Ubuntu:
dbhome /var/lib/squidguard/db
logdir /var/log/squid
These paths are easy enough to understand, especially considering that you just extracted Shalla's Blacklists to /var/lib/squidguard/db. Whatever you do, do not leave a completely blank line at the very top of the file; doing so prevents squidGuard from starting properly.
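Because a blank first line is easy to introduce by accident and hard to spot, a quick check after editing can help. The sketch below is an assumption-laden convenience, not something squidGuard provides: first_line_is_blank is a hypothetical helper, and the two sample files are created in a scratch directory purely to show both outcomes.

```shell
#!/bin/sh
# first_line_is_blank is a hypothetical helper: it succeeds if the file's
# first line is empty or contains only whitespace.
first_line_is_blank() {
    head -n 1 "$1" | grep -q '^[[:space:]]*$'
}

# Two sample files: one starts with an option, the other with a blank line.
tmp=$(mktemp -d)
printf 'dbhome /var/lib/squidguard/db\nlogdir /var/log/squid\n' > "$tmp/good.conf"
printf '\ndbhome /var/lib/squidguard/db\n' > "$tmp/bad.conf"

for f in "$tmp/good.conf" "$tmp/bad.conf"; do
    if first_line_is_blank "$f"; then
        echo "$f: first line is blank"
    else
        echo "$f: first line looks fine"
    fi
done
```

To check the real file, call first_line_is_blank /etc/squid/squidGuard.conf in the same way.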
Next, you need to create a Destination Class. This being a security column, let's focus on blocking access to sites in the spyware and remotecontrol categories. You certainly don't want your users' systems to become infected with spyware, and you probably don't want users to grant outsiders remote control of their systems either.
Destination Classes that describe these two categories from Shalla's Blacklists look like this:
dest remotecontrol {
    domainlist remotecontrol/domains
    urllist remotecontrol/urls
}
dest spyware {
    domainlist spyware/domains
    urllist spyware/urls
}
As you can see, the paths in each domainlist and urllist statement are relative to the top-level database path you specified with the dbhome option. Note also the curly bracket {} placement: left brackets always immediately follow the destination name, on the same line, and right brackets always occupy their own line at the end of the class definition.
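Since every domainlist and urllist path is resolved against dbhome, a typo in either one leaves a Destination Class pointing at a file that doesn't exist. A rough way to catch this is to pull the relative paths out of the configuration and test each one. In this sketch, missing_lists is a hypothetical helper, the base directory is passed in explicitly rather than read from dbhome, and the scratch database deliberately lacks spyware/urls.

```shell
#!/bin/sh
# missing_lists is a hypothetical helper: it prints each domainlist/urllist
# path from a squidGuard.conf-style file that does not exist under the
# given base directory.
# $1 = configuration file, $2 = base directory (what dbhome points to)
missing_lists() {
    awk '$1 == "domainlist" || $1 == "urllist" { print $2 }' "$1" |
    while read -r rel; do
        [ -f "$2/$rel" ] || echo "$rel"
    done
}

# Scratch database with one list file left out on purpose.
demo=$(mktemp -d)
mkdir -p "$demo/db/remotecontrol" "$demo/db/spyware"
touch "$demo/db/remotecontrol/domains" "$demo/db/remotecontrol/urls" \
      "$demo/db/spyware/domains"

cat > "$demo/squidGuard.conf" <<'EOF'
dbhome /var/lib/squidguard/db
logdir /var/log/squid
dest remotecontrol {
    domainlist remotecontrol/domains
    urllist remotecontrol/urls
}
dest spyware {
    domainlist spyware/domains
    urllist spyware/urls
}
EOF

# Prints: spyware/urls
missing_lists "$demo/squidGuard.conf" "$demo/db"
```

Against a real installation, the equivalent call would be missing_lists /etc/squid/squidGuard.conf /var/lib/squidguard/db; no output means every referenced list is present.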
Finally, you need an ACL that references these destinations—specifically, one that blocks access to them. The ACL syntax in squidGuard is actually quite flexible, and it's easy to write both “allow all except...” and “block all except...” ACLs. Like most ACL languages, they're parsed left to right, top to bottom. Once a given transaction matches any element in an ACL, it's either blocked or passed as specified, and not matched against subsequent elements or ACLs.
You can have multiple ACL definitions in your squidGuard.conf file, but in this example scenario, it will suffice to edit the default ACL. A simple default ACL that passes all traffic unless destined for sites in the remotecontrol or spyware blacklists would look like this:
acl {
    default {
        pass !remotecontrol !spyware all
        redirect http://www.google.com
    }
}
In this example, default is the name of the ACL. Your default squidGuard.conf file probably already has an ACL definition named default, so be sure either to edit that one or delete it before entering the above definition; you can't have two different ACLs both named default.
The pass statement says that things matching remotecontrol (as defined in the prior Destination Class of that name) do not get passed, nor does spyware, but all (a wild card that matches anything that makes it that far in the pass statement) does. In other words, if a given destination matches anything in the remotecontrol or spyware blacklists (either by domain or URL), it won't be passed, but rather will be redirected per the subsequent redirect statement, which points to the Google home page.
Just to make sure you understand how this works, let me point out that if the wild card all occurred before !remotecontrol, as in “pass all !remotecontrol !spyware”, squidGuard would not block anything, because matched transactions aren't compared against any elements that follow the element they matched. When constructing ACLs, remember that order matters!
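The first-match behavior described above can be mimicked with a few lines of shell. To be clear, this is a toy model built only from the description in this article, not squidGuard's actual code: evaluate_pass is a hypothetical function, and the choice to block when nothing matches is an assumption made to keep the model complete.

```shell
#!/bin/sh
# evaluate_pass is a toy model of first-match evaluation, not squidGuard code.
# $1 = pass rule, for example "!remotecontrol !spyware all"
# $2 = space-separated blacklist categories the requested URL belongs to
evaluate_pass() {
    rule=$1
    matched=" $2 "
    for element in $rule; do
        case $element in
            all)
                # The wild card matches anything that gets this far.
                echo pass; return 0 ;;
            \!*)
                # A negated category: block if the URL is in that category.
                name=${element#\!}
                case $matched in
                    *" $name "*) echo block; return 0 ;;
                esac ;;
            *)
                # A plain category: pass if the URL is in that category.
                case $matched in
                    *" $element "*) echo pass; return 0 ;;
                esac ;;
        esac
    done
    echo block   # assumption of this toy model: block if nothing matched
}

evaluate_pass "!remotecontrol !spyware all" "spyware"   # prints: block
evaluate_pass "!remotecontrol !spyware all" ""          # prints: pass
evaluate_pass "all !remotecontrol !spyware" "spyware"   # prints: pass
```

The third call shows the mistake in action: with all placed first, a URL on the spyware list is passed before the negated elements are ever consulted.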
I freely admit I'm being very lazy in specifying that as my redirect page. More professional system administrators would want to put a customized “You've been redirected here because...” message onto a Web server under their control and list that URL instead. Alternatively, squidGuard comes with a handy CGI script that displays pertinent transaction data back to the user. On Ubuntu systems, the script's full path is /usr/share/doc/squidguard/examples/squidGuard.cgi.gz.
This brings me to my only complaint about squidGuard: if you want to display a custom message to redirected clients, you either need to run Apache on your Squid server and specify an http://localhost/ URL, or specify a URL pointing to some other Web server you control. This is in contrast to Squid itself, which has no problem displaying its own custom messages to clients without requiring a dedicated HTTP dæmon (either local or external).
To review, your complete sample squidGuard.conf file (not counting any commented-out lines from the default file) should look like this:
dbhome /var/lib/squidguard/db
logdir /var/log/squid
dest remotecontrol {
    domainlist remotecontrol/domains
    urllist remotecontrol/urls
}
dest spyware {
    domainlist spyware/domains
    urllist spyware/urls
}
acl {
    default {
        pass !remotecontrol !spyware all
        redirect http://www.google.com
    }
}
Now that squidGuard is configured and, among other things, knows where to look for its databases, you need to create actual database files for the files and directories under /var/lib/squidguard/db, using this command:
bash-$ sudo -u proxy squidGuard -C all
This imports all files specified in active Destination Class definitions in squidGuard.conf, specifically in this example, the files remotecontrol/domains, remotecontrol/urls, spyware/domains and spyware/urls, into Berkeley DB-format databases. Obviously, squidGuard can access the blacklists much faster using database files than by parsing flat text files.
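After the import, it's worth confirming that a database file exists for each list. The sketch below assumes squidGuard writes each compiled database next to its source list with a .db suffix (domains becomes domains.db, urls becomes urls.db); report_db_files is a hypothetical helper, and the scratch directory stands in for the real database path, with one file left out to show a failure.

```shell
#!/bin/sh
# report_db_files is a hypothetical helper. It assumes each compiled database
# sits next to its source list with a .db suffix (domains -> domains.db).
# $1 = base directory, remaining arguments = category names
report_db_files() {
    base=$1
    shift
    for category in "$@"; do
        for list in domains urls; do
            if [ -f "$base/$category/$list.db" ]; then
                echo "ok      $category/$list.db"
            else
                echo "missing $category/$list.db"
            fi
        done
    done
}

# Scratch tree standing in for the real database directory.
scratch=$(mktemp -d)
mkdir -p "$scratch/remotecontrol" "$scratch/spyware"
touch "$scratch/remotecontrol/domains.db" "$scratch/remotecontrol/urls.db" \
      "$scratch/spyware/domains.db"

report_db_files "$scratch" remotecontrol spyware
```

On a real system, the call would be report_db_files /var/lib/squidguard/db remotecontrol spyware.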
Comments
getting squidguard to work!
First, I am using ubuntu 9.0.4. My squid is 2.7stable3. My squidguard is 1.2.
Squid has been working fine for several days. I have a fairly complex set of acls and http_access rules because I am trying to dole out computer time to my kids during the holidays. I am also trying to stop access to certain sites during my "peak time" allocated by my ISP. After working through the obvious errors that a relative newb introduces without meaning to, it is stable, and predictable in behaviour and performance. Suffice to say that I have stripped the squid.conf of unnecessary clutter (comments and unused settings) and have added some structure to it that makes sense to me when going in to tweak it. I do have the original file in two places for referencing when I get into trouble, so can always reinstall and add my tweaks if needed.
Next step was to add squidguard for a deeper level of filtering...
So, I have assiduously followed the instructions here even to the point of copying the errors which reveal themselves on re-reading, e.g. "bash-$ /etc/init.d/squid reload" is missing sudo at the start of the line (it is referenced in the preceding paragraph). After correcting the obvious errors
However, the moment I reload squid or restart squid it fails to load
I actually rebuilt a server because this happened the first time (over a week ago now), thinking that I had damaged some system files (of course I hadn't, but it was worth the practice of installing a new version of the server anyway)
So what can I be doing wrong? The only thing that makes sense is that I am adding the squidguard lines in the wrong place, but after having reviewed the original squid.conf my original placement was correct. So, are there any hidden traps for beginners that aren't mentioned in the article?
Shane
Feeling like,... "a Penguin in Bondage, boy!!!
follow-up
Well - I found it, after checking the squidguard log file
wrong type of braces in the definitions of dest rules
I had used parentheses () instead of curly braces {}, which with my eyesight the way it is these days (even with my computer prescription glasses) are so similar at a glance rather than a close inspection, that it totally slipped on by
Caught by the worst of the gotchas for newbs who aren't new to programming (hangs head in shame)
Ah, well, at least if anyone else runs across this there is a solution already (I'd gone looking for the matching braces problem and found the bigger one)
Shane
bonds loosened but not released, yet!