Necessary Censorship: Web Filtering with Open Source

In some cases and for some audiences, relying on the human safeguard isn't facing reality. Here are some tools you can use in those cases.

You're the administrator of a cash-strapped school system, and you've received a note saying you'll finally be able to get the school connected to the Net--as soon as you have a plan to comply with CIPA(1). Or you're out in Corporate America, and when the boss typo-ed a URL, she saw some very interesting pictures on her screen. Or you're simply Joe Penguinhead at home, having had the talk with the spousal unit, and you've decided it's time for Junior to have a computer of his very own. In short, you're now stuck with committing censorship.

In the course of doing research for this article, I ran across pieces from EFF and Peacefire (plus one e-mailed lecture) saying that all censorship is bad and that we should simply educate our children and coworkers on responsible cyber-surfing, blah, blah, blah.... It's true, and in an ideal world, we could do that. Unfortunately, the world is not populated only by responsible adults and well-educated children. And everyone makes a typo once in a while. Thus, we are forced to do something about it.

The Children's Internet Protection Act mandates that a school or library must have an Internet safety policy, must hold a public review of that policy and must use a "technology protection measure" on all computers connected to the Internet. Whether and how that software can be disabled on certain computers is a local decision. It does not mandate that software be perfect; indeed, many web pages, both administrative and commercial, emphasized that filtering would not be perfect. More on that in a minute.

Corporate firewalls often are quite a bit more aggressive. One colleague I spoke to said his employer blocked not only the naughty bits but also sports, third-party e-mail providers and web comics; in short, almost anything that wasn't work-related.

Then, there are those of us stuck on slow links who simply would like to not have third parties like Doubleclick fouling our bandwidth. This group probably contains a lot more of us than people might think.

The problem with most censorware--aside from the cost and the fact that it more than likely is written for platforms that people reading this would rather not be running--is one of control. Because the software is proprietary, not only do you not have control over what it is you're blocking, you don't even know what's on the blacklist. As I write this article, there is an ongoing lawsuit in Pennsylvania regarding free-speech advocates' ability to access the state's blacklist.

Some situations may call for more blocking, while others require less, but normally there is no provision for getting the list changed. Two vendors do allow you to submit a URL for review: N2H2 and Dan's Guardian. While N2H2 does not publish its entire list, it does have a URL checker. Dan's is even more open, but I'm getting ahead of myself.

So, I said to myself, "Self, if you can't beat them, perhaps it's time to join them." Maybe we need open-source censorware, strange as that may sound, with a publicly available list. It would offer the ability to tinker with both the code and the list to suit the needs of folks who have to do this type of work.

I was stunned by the answer I found: two such animals already are available. One is Dan's Guardian, which I mentioned above; the other is squidGuard, a plug-in for the Squid web proxy. Squid and squidGuard are offered under the GPL, and they are free as in beer as well. I'm getting some funny looks, I know; you'll see why in a minute. Both are apt-gettable for Debian fans. Mandrake folks can get them from the club site, or they can do as Red Hat folks have to do and compile them from source.

One of the items in squidGuard's contrib directory is squidGuardRobot, a spider that goes out, analyzes newly accessed web sites for content and then refreshes the blacklist. Because squidGuard is under the GPL and you can load whatever blacklist you want, you have complete control over how filtering works, at the usual cost of maintenance. The Open Directory Project (DMOZ) offers a plethora of free blacklists that work with squidGuard, arranged according to any category in its directory structure and by content rating (roughly equivalent to G, PG, PG-13 and so on).
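The lists themselves are nothing exotic. A squidGuard blacklist category is essentially a text file of domains plus a text file of URLs. Below is a minimal Python sketch of the kind of lookup involved. It rests on two assumptions:

* a listed domain also blocks all of its subdomains;
* URL entries are matched as prefixes, with the scheme and a leading "www." removed.

The function names and sample entries are hypothetical. squidGuard itself is written in C and compiles its lists into database files rather than Python sets.

```python
from urllib.parse import urlsplit


def load_list(lines):
    """Parse a plain-text list: one entry per line, '#' starts a comment."""
    entries = set()
    for line in lines:
        entry = line.split("#", 1)[0].strip().lower()
        if entry:
            entries.add(entry)
    return entries


def domain_blocked(host, domains):
    """True if host or any parent domain of it is on the domain list."""
    labels = host.lower().rstrip(".").split(".")
    # For "a.b.example.com" check "a.b.example.com", "b.example.com", ...
    return any(".".join(labels[i:]) in domains for i in range(len(labels)))


def url_blocked(url, domains, urls):
    """Check a URL against a domain list and a URL-prefix list."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if domain_blocked(host, domains):
        return True
    # URL entries are compared without the scheme or a leading "www."
    if host.startswith("www."):
        host = host[4:]
    path = host + parts.path
    return any(path == entry or path.startswith(entry.rstrip("/") + "/")
               for entry in urls)


# Hypothetical sample lists.
domains = load_list(["adult-example.com", "# a comment", "ads.example.net"])
urls = load_list(["example.org/private"])
print(url_blocked("http://www.adult-example.com/index.html", domains, urls))  # True
print(url_blocked("http://example.org/private/page.html", domains, urls))    # True
print(url_blocked("http://example.org/public/", domains, urls))              # False
```

Using a set keeps each lookup cheap no matter how long the list grows. That is the same reason squidGuard compiles its lists rather than scanning them line by line.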

Now, we come to Dan's Guardian. Dan's Guardian comes with an interesting licensing setup. It's GPL, which means the Debian folks have zero issues with putting it in their distributions, but it uses the clause in the GPL that allows a vendor to charge for GPL software. The web site says the scheme has been vetted personally by RMS as a legitimate use of the GPL; it also passes this author's understanding thereof, for what it's worth. The blacklist is subscription-based, but free for trial use. A form on the blacklist download page lets users submit URLs for both the whitelist and the blacklist, and further down, the page shows feedback on recently submitted URLs.

Whereas squidGuard and other censorware, except Symantec's i-Gear, work from a simple URL list or URL regular expressions, Dan's Guardian actually looks at the content of the web page on the fly, scanning for words and phrases that meet the criteria for blacklisting or whitelisting. You also can use your squidGuard blacklists with Dan's Guardian, which means all that DMOZ stuff works here as well. Dan's Guardian works alongside Squid, though not quite the way squidGuard does: rather than plugging in as a redirector, it runs as a separate filtering proxy in front of Squid. It also works with Oops, another, lighter-weight web proxy.
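Scanning page content is a different job from looking up a URL. Below is a minimal Python sketch of the simplest form of it, assuming just two hypothetical lists:

* banned phrases, which block a page;
* exception phrases, which let a page through even if it also contains a banned phrase.

This is not Dan's Guardian's actual algorithm, only the general idea. Matching whole words keeps "sex" from tripping on "Essex".

```python
import re


def normalize(text):
    """Drop HTML tags, lowercase and collapse runs of whitespace."""
    return " ".join(re.sub(r"<[^>]*>", " ", text).lower().split())


def contains_phrase(text, phrase):
    """Whole-word match, so "sex" does not fire on "Essex"."""
    pattern = r"(?<!\w)" + re.escape(normalize(phrase)) + r"(?!\w)"
    return re.search(pattern, text) is not None


def scan_page(html, banned, exceptions):
    """Return (blocked, reason); exception phrases win over banned ones."""
    text = normalize(html)
    for phrase in exceptions:
        if contains_phrase(text, phrase):
            return False, "exception phrase: " + phrase
    for phrase in banned:
        if contains_phrase(text, phrase):
            return True, "banned phrase: " + phrase
    return False, "no match"


print(scan_page("<p>Buy CHEAP pills now</p>", ["cheap pills"], ["medical research"]))
```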

Dan's Guardian is a cheap way to be CIPA compliant without having to worry about it a whole lot. The software is free for non-commercial use, and an educational-rate subscription for a once-monthly download of the actively maintained blacklist is $5/month or $60/year. I understand free software, but I'm far more interested in having it be free as in speech--the blacklist comes down in readable format--than free as in beer. I'm also not opposed to paying reasonable prices for good software. If you want to do both kinds of free, squidGuard is there. Be my guest; I'll likely join you. But there's something to be said for a cheap way for a busy librarian to take care of the computers and not have to worry about what Johnny's going to see or what his mom is going to say about it.

Some of you probably will point me at Privoxy, the SourceForge project that grew out of Junkbuster. While it's a great way to get rid of ads, cookies and pop-ups, you'd have to convert the squidGuard-type blacklists over to Privoxy's format every time a new list came out, a less-than-efficient use of time and resources. The bandwidth suck on new lists alone is considerable--6MB for a typical one. Although corporate OC-class folk might think this is trivial, believe me, on 56K (at least 14 minutes of downloading, even at a full 56Kbps), it's decidedly painful.
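The conversion itself is not hard to script; the pain is in re-running it, and re-downloading the list, every time. Below is a minimal Python sketch. It makes two assumptions about Privoxy:

* a pattern with a leading dot (".example.com") matches a domain and all of its subdomains;
* the `+block` action takes a reason string, as recent Privoxy versions expect.

The sketch handles only squidGuard's domain files. URL files would need their paths escaped as regular expressions, which is left out here.

```python
def squidguard_domains_to_privoxy(domain_lines,
                                  reason="Listed in squidGuard blacklist"):
    """Convert squidGuard-style domain lines into one Privoxy action section.

    Comments ('#') and blank lines are skipped; duplicates are dropped.
    Each domain is emitted with a leading dot so it also covers subdomains.
    """
    out = ["{ +block{%s} }" % reason]
    seen = set()
    for line in domain_lines:
        domain = line.split("#", 1)[0].strip().lower().lstrip(".")
        if domain and domain not in seen:
            seen.add(domain)
            out.append("." + domain)
    return "\n".join(out) + "\n"


print(squidguard_domains_to_privoxy(["example.com", "# comment", "ads.example.net"]))
```

For a real list, pass an open file object instead of a Python list. Write the result to an actions file (the name is up to you), then point a corresponding `actionsfile` line in Privoxy's main config at it.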

So, there you go. It is possible to commit censorship in a totally GPL fashion, so that not only do you know what you're censoring, but you also can control it to your heart's content. Although open-source advocates generally consider arbitrary, proprietary censorship to be a bad thing--its alternative being one of the reasons behind the Open Source Movement--controlling what comes into your computer and network wisely and with an open mind is a good thing. After all, the big reason this author runs Linux is so that he, himself, controls what does and does not happen on his computers.

Glenn Stone is a Red Hat Certified Engineer, sysadmin, technical writer, cover model and general Linux flunkie. He has been hand-building computers for fun and profit since 1999, and he is a happy denizen of the Pacific Northwest.

email: liawol.org!gs

______________________

Comments

No Mention of Untangle??

Anonymous

It should also be mentioned that Untangle offers a very comprehensive content filter that is free, along with extra apps for purchase. My school has been using it for the past 8 months, and we have been very happy with the results. The basic web filter is free and meets CIPA requirements. It also filters spyware, viruses and phishing, and it can block MIME types or protocols such as online gaming or file-sharing apps like LimeWire or BitTorrent. Users can also be granted access to specific sites at specific times. For example, Internet access can be turned off altogether at a set time, or Facebook and MySpace can be made available only during certain hours.
I was surprised the author did not mention Untangle. I hope this helps someone; it might be worth looking into.
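Time-based rules like these boil down to a check of the site against the clock. The Python sketch below is a hypothetical illustration of that kind of policy, not Untangle's configuration format or API. The site names, time windows and 10 p.m. cutoff are made up for the example.

```python
from datetime import time

# Hypothetical policy: sites listed here are allowed only inside their windows.
SCHEDULE = {
    "facebook.com": [(time(12, 0), time(13, 0)), (time(15, 30), time(17, 0))],
    "myspace.com": [(time(12, 0), time(13, 0))],
}
INTERNET_CUTOFF = time(22, 0)  # hypothetical: all access ends at 10 p.m.


def allowed(site, now):
    """Return True if `site` may be visited at wall-clock time `now`."""
    if now >= INTERNET_CUTOFF:
        return False  # Internet turned off altogether after the cutoff
    windows = SCHEDULE.get(site)
    if windows is None:
        return True  # sites without a schedule are not time-restricted
    return any(start <= now < end for start, end in windows)


print(allowed("facebook.com", time(12, 30)))
```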

Looking for specific open source software

Sarah

Hello! Thanks so much for the suggestions! I'm fairly computer savvy, but I'm not familiar with all the technical terminology, so I sometimes find it difficult to come up with the right search terms to find exactly what I'm looking for. I'm hoping you have some suggestions.

Let me explain the situation:

My mother- and father-in-law are raising their granddaughter, and she has frequently abused her computer privileges. So they have grounded her from things like Facebook and Windows Live. She will then be all sweet and loving, and my mother-in-law will let her on the computer to "do her homework" or "play Neopets," figuring she will just check in on her granddaughter to make sure she's doing what she says she's doing. I find out later (by searching the Internet, or by checking their computer the next time I'm over there) that my niece has gone on Facebook anyway, chatted with some friends, made a website and so on. And then she lies about it, digging her grave even deeper and getting grounded even longer. She watches her grandmother type in the password to get her on the computer in the first place. Since my mother-in-law is a slow typist and my niece is sneaky, she figures out the password and then gets up late at night (like 3am) to go on the computer. It's quite the problem. I need a program so that when my mother-in-law lets my niece on the computer, she doesn't have to worry about her chatting or making web pages. My niece is falling behind in school because she gets so distracted by things on the Internet.

So is there a program that can block her from using Windows Live by requiring a password, while still letting my mother-in-law into Windows Live to talk to her relatives? They don't want to uninstall Windows Live because they like to use it too. Also, is there a program that lets me manually block sites (like Facebook)? Currently I use Internet Explorer's Content Advisor, but because my in-laws are so forgetful, I set its password to the same one as the Windows login. Plus, my in-laws always forget to log out, so my niece sneaks on when they go to bed. I set it so that a password is required as soon as the screen saver comes on, but she simply turns that off as soon as she gets the chance.

I hope this explains what I'm looking for! My in-laws know NOTHING about computers, and it makes me so mad that my niece uses that to her advantage!

Excellent Virus protection

Anonymous

There is a patch that, when applied to the DansGuardian filter, adds virus protection. So for those who don't want to censor words but just want to browse the Internet without the threat of some site or webmail message downloading a virus or spyware, DansGuardian is still a "must-get".

To date, the only alternatives to DansGuardian with the antivirus patch are commercial and can cost thousands of dollars for a "license".

Re: Necessary Censorship: Web Filtering with Open Source

Anonymous

CensorNet (http://www.intrago.co.uk/products/censornet.php) is an interesting extension of DansGuardian and Squid.

I've been playing with CensorNet and like a few of its features (user/workstation profiles to tailor access scheduling, levels of filtering, etc.), even though it's slightly limited in its flexibility. Port blocking has four basic modes: allow all, allow web only, allow everything else only, allow web and everything else.

Re: Necessary Censorship: Web Filtering with Open Source

Anonymous

By far the best variant of DansGuardian (other than DansGuardian itself) is SmoothGuardian: http://www.smoothwall.net/products/smoothguardian/ - I am pretty sure that the DansGuardian author wrote that too.

Re: Necessary Censorship: Web Filtering with Open Source

Anonymous

I work at a Christian charity serving the homeless. We use DansGuardian and bannerfilter to block inappropriate content and advertising, respectively. Both work very well and quickly, with little configuration or maintenance. We are glad to use free, open-source solutions.

While some censorship is bad, individuals should be able to choose their own level of restriction.

Re: Necessary Censorship: Web Filtering with Open Source

Anonymous

The last time I used DansGuardian was a year ago. I installed it on a state-of-the-art server with a blacklist of about 50k lines (just plain URL filtering). With the horrifying number of 60 workstations and rarely more than 30 users logged in at the same time, the server ground to a halt. I had to switch to SquidGuard, and there was no loss of performance at all.

I recently installed SquidGuard with a blacklist of nearly 700k lines on a similar server with up to 4,000 users, producing more than 10 million hits a week. The performance penalty is only about 2% CPU. (BTW, I love the 'squidGuard -u' update with simple diff files. The 700k lines with updates are reloaded in a fraction of a second!)
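The fast reload makes sense once you see that an update only has to touch the entries that changed. Below is a minimal Python sketch of applying such an update to an in-memory set. It assumes diff lines of the form `+entry` to add and `-entry` to remove. squidGuard applies its diffs to its compiled database files, so this only illustrates the idea.

```python
def apply_diff(entries, diff_lines):
    """Apply '+entry' / '-entry' lines to a set of blacklist entries in place.

    Returns (added, removed) counts. Lines not starting with '+' or '-' are
    ignored. The cost depends on the size of the diff, not of the list.
    """
    added = removed = 0
    for line in diff_lines:
        line = line.strip()
        if len(line) < 2:
            continue
        op, entry = line[0], line[1:].strip().lower()
        if op == "+" and entry not in entries:
            entries.add(entry)
            added += 1
        elif op == "-" and entry in entries:
            entries.discard(entry)
            removed += 1
    return added, removed


blacklist = {"a.example.com", "b.example.com"}
print(apply_diff(blacklist, ["+c.example.com", "-a.example.com"]))
```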

I have read that DansGuardian has become much more efficient lately, and I would like to hear if somebody has a real life experience on using DansGuardian in something slightly larger than your average home network.

Re: Necessary Censorship: Web Filtering with Open Source

Anonymous

We use DansGuardian with about 100 employees doing some extremely heavy Internet usage. We also had problems a year or so ago when we first implemented DansGuardian. However, it has greatly improved, and it now uses barely perceptible resources for what it's doing: about 10% to 15% of CPU on a single 2.4GHz Xeon, with about 200MB of memory.

It used to floor a dual 500MHz Xeon server, but now it's been heavily tweaked to handle much more traffic. For that matter, computers have become quite a bit cheaper, so it's really not that hard to give it enough CPU and RAM to do its job properly.

Since we filter for all sorts of legal-compliance-related material, we have a very hefty keyword list, etc. It does the job admirably, with very few false hits. It blocks a lot of stuff, but most people don't complain, and I can only assume that's because they shouldn't be looking up most of it at work!

Re: Necessary Censorship: Web Filtering with Open Source

Anonymous

We just changed over to DansGuardian from squidGuard on our school proxy/Internet gateway. The filtering is much better and less restrictive, while being more effective on stuff that needs blocking. A favorite way of getting porn is to type dirty words into Google and click on the links until something comes through. squidGuard allows this (unless you block Google); with DansGuardian you don't get the results list, but Google still works for legitimate searches.

It is much harder on the proxy server than squidGuard and needs more resources. With squidGuard the proxy had 256MB of RAM and rarely went above 10% CPU load. Now we have 512MB, and the proxy frequently runs at 100% load during busy periods. We get far fewer complaints from the teachers and more from the kids ;-) We have around 400 stations, and a busy period means 50-100 concurrent surfers.

A clever touch is that the weighting system will let through pages containing naughty stuff if they also contain 'good phrases' (like 'adult education' or 'research'). It is invariably going to be much harder on the CPU if it is scanning the entire page and 'weighting' it, rather than just looking at the URL and blocking it or not. We don't use any blacklists at all, as we find the standard system works well without them.

Re: Necessary Censorship: Web Filtering with Open Source

Anonymous

I'm using DG 2.4-6.8 with approx. 150 users. No slowdown at all. The only complaints I'm hearing are from people who can no longer get to their My eBay page because .dll files are being blocked. I also discovered several PCs infected with spyware programs, which have since been cleaned up.

I'm not using their blacklist since their content scanning does the trick. Can't wait for the 2.6 stable engine to come out.

Re: Necessary Censorship: Web Filtering with Open Source

Anonymous

Did you try the mailing list? It sounds to me like you hit the old glibc regexp bug. See question 5 in the config/usage section of the FAQ.

Re: Necessary Censorship: Web Filtering with Open Source

Anonymous

My father and mother have a school here in Monterrey, Mexico. The school has approx. 300 students from kindergarten to secondary (3 to 15 years old). It has 35 computers for students and 15 computers for staff.

All the computers are connected to the Internet via a Linux gateway.

We have had Internet service for two years, but during the first two months we had problems with Internet content.

The school has an internal verbal policy, and the students respect it. But what happens when a student opens his Hotmail or Yahoo account and there is an e-mail with a subject like "your whatever online order" and porn advertising inside?

So, since I had installed the Linux box, I went looking for a content filter. There are many commercial products out there, like WatchGuard as hardware and i-Gear as software, but they are expensive. For now we prefer to invest money in good hardware and choose open-source software. And I forgot to mention that they want to charge you monthly or annual fees for their service.

While I was looking, I found squidGuard. This software works with the Squid web cache proxy. But first of all, I am not a geek, and I didn't know anything about web cache proxies. I found it difficult to configure squidGuard with Squid, because I am an impatient person.

So I looked for another alternative, and there was DansGuardian. This is the winner.

You just have to download the Squid web cache and install it, then download DansGuardian, install it and follow the how-to on the DansGuardian web site.

This setup works really great. I had some basic problems with configuration because of my background, but wow, it was really worth the time I put in.

So at that point it was time to replace our generic PII Linux gateway/file/etc. server with a real server. As I told my father, "The school has saved $$$ on server licenses (thinking of Windows servers), and now on the content filter." So why not take advantage of Squid's caching and make the ADSL connection more efficient?

Then the school invested in a Xeon Dell server with two SCSI hard drives.

I can only say that Squid and DansGuardian are a good option, and both we and the students' parents are very happy.

At home I am testing a Squid/DansGuardian setup with PICS and weighted phrases enabled and the blacklist disabled, and it filters almost everything. You just have to play a little with the conf files, and you will have complete control.

And of course, if you want an easy install and configuration method, there is SmoothWall (I haven't used it), which uses the DansGuardian engine, for a reasonable charge.

In the future we are thinking of building a cheap PIV box with SmoothWall, just as a firewall.

And for last, sorry for my english.

Jorge Carrillo

chono@hotmail.com

Re: Necessary Censorship: Web Filtering with Open Source

Anonymous

I think your English is fine... I appreciate that you went to the trouble of switching to another language, because I would have been lost otherwise. I think I caught the gist of what you were saying.

Re: Necessary Censorship: Web Filtering with Open Source

Anonymous

You need to speak better English.

You're an asshole... his

Anonymous

You're an a**hole... his English was fine... you need to not be such a troll... maybe a meteor will fall out of the sky and smash your car to pieces. Of course, while you are not in it.

Re: Necessary Censorship: Web Filtering with Open Source

Anonymous

There is some commercial software, written by the same author, that uses the DansGuardian engine and adds a UI and an easy install:
http://store.smoothwall.ltd.uk/products/corporateguardian/

It has all the great filtering of DG but comes with commercial support and an easy install. It is also the only 'official' version of DG, commercial or otherwise, other than DG itself. Money made from it goes back into DG development as well!

DansGuardian

Anonymous

DG has two basic ideas behind it: blacklists of URLs and active content filtering.

The blacklist filtering in DG will outperform squidGuard. Dan has written the code to be very fast.

DG's newest release in alpha has fork pooling and is scalable to hundreds of users.

DG also has a webmin module and a module has been written to provide virus scanning of all downloaded http file types.

DG is simply the best.

Re: Necessary Censorship: Web Filtering with Open Source

Anonymous

Another nice filter is Privoxy. It gets rid of those nasty banner ads, cookies and pop-ups. It runs under Linux and Windows too, and it has an easy-to-use web page configurator.

It allows one to selectively block sites, and in fact to rewrite web pages on the fly. My favorite feature is replacing banner ads with a fully transparent image (the default is a checkered image), so the page layout stays the same and the ad magically vanishes.

Already written: GUILT

Anonymous

An Australian hacker called Zem anticipated this need in 2000 and wrote a tool called GUILT, a free-software censoring proxy that uses a plain-text ban list.

Sadly, the Australian government has failed to respond with details of how one can get a free software censorware filter approved for use in implementing their mandatory filtering policies.

Perhaps someone else can put it to use in another country, preventing a non-free stranglehold on one's internet traffic.

Re: Necessary Censorship: Web Filtering with Open Source

Anonymous

Self-controlled censorship that actually works may be a protection for free speech and the Internet. If parents and individual organizations could apply effective censorship, there would be less support for legally mandated Net censorship.

Re: Necessary Censorship: Web Filtering with Open Source

Anonymous

Yet another reason for censorship: you don't want to see it. Make a list of all the things you don't want to see, and put them on the blacklist. Or, if you are really aggressive, put only certain things on the whitelist and block everything else. Why? Maybe for personal moral or religious reasons, maybe because you still don't want pills to enhance your manhood or the world's smallest digital camera. Maybe you are just sick of DoubleClick. Sometimes censorship is a GOOD thing, when you are in control and censoring content for yourself.

Re: Necessary Censorship: Web Filtering with Open Source

Anonymous

WebCleaner is the thing I use. It even filters JavaScript pop-ups and ads (heh, I can't even tell if this website is using ads; I don't see any ;)

squidGuard in schools

eharrison

We've been using squidGuard in Portland-area schools for the last two years and are very happy with the results. Many of our peer agencies in the state of Oregon have implemented squidGuard as well. SquidGuard covers over half the students in the state.

We make our blacklist available via rsync, which dramatically reduces the bandwidth required to keep the lists updated:

rsync squidguard.mesd.k12.or.us::filtering
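rsync helps so much because, after the first download, only the parts of the list that changed have to cross the wire. The receiving side sends checksums of fixed-size blocks of its old copy. The sender then slides a window over the new file looking for matching blocks. It does this with a weak "rolling" checksum that can be updated in constant time as the window moves one byte. The Python sketch below shows that weak checksum as described in the published rsync algorithm, with sums taken modulo 2^16. Real rsync also confirms each match with a strong hash. The function names here are hypothetical.

```python
M = 1 << 16  # modulus for both halves of the weak checksum


def weak_checksum(block):
    """Compute (a, b) for a block of bytes from scratch.

    a = sum of the bytes; b = sum of each byte weighted by its distance
    from the end of the block (last byte weight 1), both modulo M.
    """
    n = len(block)
    a = sum(block) % M
    b = sum((n - i) * x for i, x in enumerate(block)) % M
    return a, b


def roll(a, b, out_byte, in_byte, block_len):
    """Slide the window one byte in O(1): drop out_byte, append in_byte."""
    a = (a - out_byte + in_byte) % M
    b = (b - block_len * out_byte + a) % M
    return a, b


def find_block(data, target, block_len):
    """Return every offset in data whose window has weak checksum `target`."""
    if len(data) < block_len:
        return []
    a, b = weak_checksum(data[:block_len])
    hits = [0] if (a, b) == target else []
    for k in range(1, len(data) - block_len + 1):
        a, b = roll(a, b, data[k - 1], data[k + block_len - 1], block_len)
        if (a, b) == target:
            hits.append(k)
    return hits


new_list = b"example.com\nads.example.net\nnew-entry.example.org\n"
print(find_block(new_list, weak_checksum(b"ads.exam"), 8))
```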

Plus we have Red Hat packages available:

  • RH7.3: ftp://k12linux.mesd.k12.or.us/apt/7.3/RPMS.k12ltsp/squidGuard-1.2.0-3.i3...
  • RH8.0: ftp://k12linux.mesd.k12.or.us/apt/8.0/RPMS.k12ltsp/squidGuard-1.2.0-4.k1...
  • RH9(beta): ftp://k12linux.mesd.k12.or.us/apt/9/RPMS.k12ltsp/squidGuard-1.2.0-5.k12l...
Re: Necessary Censorship: Web Filtering with Open Source

camoa

Well, there is an interesting Webmin module around for squidGuard. I'm using it and I'm pretty happy, although its list management could be improved.

Re: Necessary Censorship: Web Filtering with Open Source

Anonymous

Squid only handles traffic on the ports it is set up to proxy, typically HTTP on port 80.

An alternative is to use TCP Wrappers by Wietse Venema and put the censor list in the /etc/hosts.deny file. That will block a malignant host from connecting to any TCP-wrapped service on the machine.