Necessary Censorship: Web Filtering with Open Source
You're the administrator of a
cash-strapped school system and received a note saying you'll
finally be able to get the school connected to the Net--as soon as
you have a plan to comply with CIPA(1). Or you're out in Corporate
America, and when the boss typo-ed a URL, she saw some very
interesting pictures on her screen. Or you're simply Joe
Penguinhead at home, having had the talk with the spousal unit, and
you've decided it's time for Junior to have a computer of his very
own. In short, you're now stuck with committing censorship.

In the course of doing research for this article, I ran
across pieces from EFF and Peacefire (plus one e-mailed lecture)
saying that all censorship is bad and that we should simply educate our
children and coworkers on responsible cyber-surfing, blah, blah,
blah.... It's true, and in an ideal world, we could do that.
Unfortunately, the world is not populated by only responsible
adults and well-educated children. And everyone makes a typo once
in a while. Thus, we are forced to do something about it.

The Children's Internet Protection Act mandates that a school
or library must have an Internet safety policy, must hold a public
review of that policy and must use a "technology protection
measure" on all computers connected to the Internet. Whether and
how that software can be disabled on certain computers is a local
decision. It does not mandate that software be perfect; indeed,
many web pages, both administrative and commercial, emphasized that
filtering would not be perfect. More on that in a minute.

Corporate firewalls often are quite a bit more aggressive.
One colleague I spoke to said his employer didn't block only the
naughty bits, but sports, third-party e-mail providers, web comics,
in short, almost anything that wasn't
work-related.

Then, there are those of us stuck on slow links who simply
would like to not have third parties like Doubleclick fouling our
bandwidth. This group probably contains a lot more of us than
people might think.

The problem with most censorware--aside from the cost and the
fact that it more than likely is written for platforms that people
reading this would rather not be running--is one of control.
Because the software is proprietary, not only do you not have
control over what it is you're blocking, you don't even know what's
on the blacklist. As I write this article, there is an ongoing
lawsuit in Pennsylvania regarding free-speech advocates' ability to
access the state's blacklist.

Some situations may call for more blocking, while others
require less, but normally, no provision is in place for
getting the list changed. Two vendors do allow you to submit a URL for
review: N2H2 and Dan's Guardian. While N2H2 does not publicize
its entire list, it does have a URL checker. Dan's is even more
open, but I'm getting ahead of myself.

So, I said to myself, "Self, if you can't beat them, perhaps
it's time to join them." Maybe we need open-source censorware,
strange as that may sound, with a publicly available list. It would
offer the ability to tinker with both the code and the list to suit
the needs of folks who have to do this type of work.

I was stunned by the answer I found: two such animals already
are available. One is Dan's Guardian, which I mentioned above; the
other is squidGuard, a plug-in for the Squid web proxy. Squid and
squidGuard are offered under the GPL, and they are free as in beer
as well. I'm getting some funny looks, I know; you'll see why in a
minute. Both are apt-gettable for Debian fans. Mandrake folks can
get them from the club site; or, do as Red Hat folks have to do,
and compile them from source.

One of the items in squidGuard's contrib directory is
squidGuardRobot, a spider that goes out and analyzes newly accessed
web sites for content and then refreshes the blacklist. Because
squidGuard is under the GPL, and you can put whatever blacklist in
you want, you have complete control over how filtering works at the
usual cost of maintenance. The Open Source Directory
offers a plethora of free blacklists that work with squidGuard,
arranged according to any category in their directory structure and
by content rating (roughly equivalent to G, PG, PG-13 and so
on).

Now, we come to Dan's Guardian. Dan's Guardian comes with an
interesting licensing setup. It's GPL, which means the Debian folks
have zero issues with putting it in their distributions, but it
uses the clause in the GPL that allows a vendor to charge for GPL
software. The web site says the scheme has been vetted personally
by RMS as legitimate use of the GPL; it also passes this author's
understanding thereof, for what it's worth. The blacklist is
subscription-based, but free for trial use. A form on the blacklist
download page allows users to add URLs to both the whitelist and
blacklist, and, further down, feedback on recently submitted
URLs.

Whereas squidGuard and other censorware, except Symantec's
i-Gear, work on a simple URL list or URL regular expressions, Dan's
Guardian actually looks at the content of the web page on the fly,
scanning for words and phrases that meet the criteria for
blacklisting or whitelisting. You also can use your squidGuard
blacklists with Dan's Guardian, which means all that DMOZ stuff
works here as well. Dan's Guardian works as a proxy plug-in with
Squid the same way squidGuard does. It also works as a plug-in for
Oops, another, lighter-weight web proxy.

Dan's Guardian is a cheap way to be CIPA compliant without
having to worry about it a whole lot. The software is free for
non-commercial use, and an educational-rate subscription for a
once-monthly download of the actively maintained blacklist is
$5/month or $60/year. I understand free software, but I'm far more
interested in having it be free as in speech--the blacklist comes
down in readable format--than free as in beer. I'm also not opposed
to paying reasonable prices for good software.
If you want to do both kinds of free, squidGuard is there. Be my
guest; I'll likely join you. But there's something to be said for a
cheap way that a busy librarian can take care of the
computers and not have to worry about what Johnny's going to see or
what his mom is going to say about it.

Some of you probably will point me at Privoxy, the
Sourceforge project that grew out of Junkbuster. While it's a great
way to get rid of the ads and the cookies and the pop-ups, you'd
have to convert the squidGuard-type blacklists over to Privoxy's
format every time a new list came out, a less-than-efficient use of
time and resources. The bandwidth suck on new lists alone is
considerable--6MB for a typical one. Although corporate OC-class
folk might think this is trivial, believe me, on 56K, it's
decidedly painful.

So, there you go. It is possible to commit censorship in a
totally GPL fashion, so that you not only know what it is you're
censoring but also can control it to your heart's content.
Although open-source advocates generally consider arbitrary,
proprietary censorship to be a bad thing--its alternative being one
of the reasons behind the Open Source Movement--controlling what
comes into your computer and network wisely and with an open mind
is a good thing. After all, the big reason this author runs Linux
is so that he, himself, controls what does and does not happen on
his computers.

Glenn Stone is a Red Hat
Certified Engineer, sysadmin, technical writer, cover model and
general Linux flunkie. He has been hand-building computers for fun
and profit since 1999, and he is a happy denizen of the Pacific
Northwest.
email: liawol.org!gs

Comments
No Mention of Untangle??
It should also be mentioned that Untangle offers a very comprehensive content filter that is free and also offers extra apps for purchase. My school has been using it for the past 8 months, and we have been very happy with the results. The basic web filter is free and meets CIPA requirements. It also filters for spyware, viruses and phishing, and it can block MIME types or protocols such as online gaming, or download apps such as Limewire or BitTorrent. Users can also be granted access to specific sites at specific times; for example, the Internet can be turned off altogether at a set time, or Facebook and MySpace can be made available only at specific times.
I was surprised the author did not mention Untangle. I hope this helps someone; it might be worth looking into.
Looking for specific open source software
Hello! Thanks so much for the suggestions! I'm fairly computer savvy, but I'm not familiar with all the different types of languages, so sometimes I find it difficult to come up with the terms and words to find exactly what I'm looking for; I'm hoping you have some suggestions.
Let me explain the situation:
My mother and father-in-law have a granddaughter that they're raising, and she has frequently abused computer privileges. So they have grounded her off things like Facebook, Windows Live, etc. She will then be all sweet and loving, and my mother-in-law will let her on the computer to "do her homework" or "play neopets", figuring she will just check in on her granddaughter to make sure she's doing what she's supposed to be doing. I will find out later (by searching the internet, or by checking their computer next time I'm over there) that my niece has gone on Facebook anyway, or chatted it up with some friends, made a website, etc. And then she will lie about it, digging her grave even deeper by getting grounded off it even longer. She will watch her grandmother put in the password to get her on the computer in the first place, and since her grandmother is a slow typer and my niece is sneaky, she will find the password and then get up late at night (like 3am) to go on the computer. It's quite the problem, and I need a program where, if my mother-in-law lets my niece on the computer, she doesn't have to worry about her chatting or making webpages, etc. My niece is falling behind in school because she gets so distracted by things on the internet.
So is there a program that can block her from using Windows Live by requiring a password, and at the same time allow my mother-in-law into Windows Live to talk to her relatives? They don't want to uninstall Windows Live because they like to use it too. And also, is there a program with which I can manually block sites (like Facebook)? Currently I use the Content Advisor that Internet Explorer has, but because my in-laws are so forgetful, I make it the same password as the login for Windows. Plus, my in-laws always forget to log out, so my niece will sneak on there when they go to bed. I will set it so that as soon as the screen saver comes on, a password is required to get back in, but she will simply take that off as soon as she gets the chance.
I hope this kind of explains what I'm looking for! My in-laws just know NOTHING about computers, and it makes me so mad that my niece uses that to her advantage!
Excellent Virus protection
There is a patch that, when applied to the DansGuardian filter, performs virus protection. So, for those who don't want to censor words, but just want to be capable of browsing the internet without the threat of some site or webmail message downloading a virus or spyware, DansGuardian is still a "must get".
To date, the only alternatives to DansGuardian with the antivirus patch are commercial and can cost thousands of dollars for a "license".
Re: Necessary Censorship: Web Filtering with Open Source
CensorNet (http://www.intrago.co.uk/products/censornet.php) is an interesting extrapolation of DansGuardian and Squid.
I've been playing with CensorNet and like a few of its features (user/workstation profiles to tailor access scheduling, levels of filtering, etc.) even though it's slightly limited in its flexibility (port-blocking has four basic modes: allow all, allow web only, allow everything else only, allow web and everything else).
Re: Necessary Censorship: Web Filtering with Open Source
By far the best variant of DansGuardian (other than DansGuardian itself) is SmoothGuardian: http://www.smoothwall.net/products/smoothguardian/ - I am pretty sure that the DansGuardian author wrote that too.
Re: Necessary Censorship: Web Filtering with Open Source
I work at a Christian charity serving the homeless. We use dansguardian and bannerfilter to block inappropriate content and advertising (respectively). Both work very well and quickly with little configuration or maintenance. We are glad to use free, open-source solutions.
While some censorship is bad, individuals should be able to choose their own level of restriction.
Re: Necessary Censorship: Web Filtering with Open Source
The last time I used DansGuardian was a year ago. I installed it on a state of the art server with a blacklist of about 50k lines (just plain URL-filtering). With the horrifying amount of 60 workstations and rarely more than 30 users logged in at the same time, the server ground to a halt. I had to convert to SquidGuard, and there was no loss of performance at all.
I recently installed SquidGuard with a blacklist of nearly 700k lines on a similar server with up to 4000 users, producing more than 10 mill. hits a week. The performance penalty is just about 2% CPU power. (BTW, I love the 'squidGuard -u' update with simple diff-files. The 700k lines with updates are reloaded in fractions of a second!)
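The diff-file update mentioned above can be pictured with a short Python sketch. It assumes a diff file in which each line starts with `+` (add an entry) or `-` (remove an entry), which is my understanding of squidGuard's `.diff` files rather than something stated in this thread. The function name `apply_diff`, the comment-skipping rule and the sample domains are made up for illustration, and the real tool updates its on-disk database rather than a Python set.

```python
def apply_diff(blacklist, diff_lines):
    """Return a new set with '+entry' lines added and '-entry' lines removed."""
    updated = set(blacklist)
    for raw in diff_lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skipping blanks and '#' comments is an assumption
        op, entry = line[0], line[1:].strip().lower()
        if op == "+":
            updated.add(entry)
        elif op == "-":
            updated.discard(entry)  # removing a missing entry is not an error
        else:
            raise ValueError(f"unrecognized diff line: {raw!r}")
    return updated


# Hypothetical data: only the two changed entries travel over the wire,
# not the whole list.
blacklist = {"ads.example.com", "badsite.example.net"}
diff = ["+tracker.example.org", "-badsite.example.net"]
updated = apply_diff(blacklist, diff)
print(sorted(updated))
```

The point of the design is that the cost of an update scales with the number of changed entries, not with the size of the full list, which fits the quick reloads described above.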
I have read that DansGuardian has become much more efficient lately, and I would like to hear if somebody has a real life experience on using DansGuardian in something slightly larger than your average home network.
Re: Necessary Censorship: Web Filtering with Open Source
We use dansguardian with about 100 employees doing some extremely heavy internet usage. we also had problems a year or so ago when we first implemented dansguardian; however, it's greatly improved, and now uses barely perceptible resources for what it's doing (about 10% to 15% of CPU on a single 2.4ghz Xeon, with about 200MB mem usage).
it used to floor a dual 500mhz xeon server, but now it's been heavily tweaked to handle much more traffic. for that matter, computers have become quite a bit cheaper, so it's really not that hard to give it enough CPU and ram to do its job properly.
since we filter for all sorts of legal compliance related materials, we have a very hefty keyword list, etc. it does the job admirably, with very few false hits. it blocks a lot of stuff, but most people don't complain, and I can only assume that's because they shouldn't be looking up most of it at work!
Re: Necessary Censorship: Web Filtering with Open Source
Just changed over to Dansguardian from Squidguard on our school Proxy / internet gateway. The filtering is much better and less restrictive whilst being more effective on stuff that needs blocking. Favorite way of getting porn is to type dirty words into google and click on the links until something comes through. Squidguard allows this (unless you block google), with Dansguardian you don't get the list. But google still works for legitimate searches. It is much more arduous on the proxy server than squidguard and needs more resources. With squidguard the proxy had 256 Meg of Ram and rarely went above 10% CPU load. Now we have 512Meg and the proxy frequently runs at 100% load during busy periods. We get far fewer complaints from the teachers and more from the kids ;-) We have around 400 stations and a busy period constitutes 50 - 100 concurrent surfers. A clever touch is that the loading system will allow stuff through that contains naughty stuff if it contains 'good phrases' (like 'adult education' or 'research' within the web page). It is invariably going to be much harder on the cpu if it is scanning the entire page and 'weighting' it rather than just looking at the url and blocking it or not. We don't use any blacklists at all as we find the standard system works well without them.
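The 'weighting' idea described above can be sketched in a few lines of Python. This is only an illustration of the general technique, not DansGuardian's actual algorithm: the phrases, the weights, the threshold and the names `PHRASE_WEIGHTS`, `NAUGHTINESS_LIMIT`, `page_score` and `is_blocked` are all invented for the example.

```python
# Hypothetical phrase weights: positive values count against a page,
# negative values ("good phrases") count in its favour.
PHRASE_WEIGHTS = {
    "xxx pics": 60,
    "adult": 30,
    "adult education": -40,
    "research": -20,
}
NAUGHTINESS_LIMIT = 50  # hypothetical threshold; pages at or above it are blocked


def page_score(text, weights=PHRASE_WEIGHTS):
    """Sum weight * occurrences for every phrase found in the page text."""
    lowered = text.lower()
    return sum(weight * lowered.count(phrase) for phrase, weight in weights.items())


def is_blocked(text, limit=NAUGHTINESS_LIMIT):
    """Block the page when its total score reaches the limit."""
    return page_score(text) >= limit


# "adult education" matches both "adult" (+30) and "adult education" (-40),
# so each occurrence nets -10 instead of counting against the page.
print(is_blocked("An adult education research programme"))
print(is_blocked("adult xxx pics and more adult content"))
```

Because every page has to be scanned in full before a decision is made, this approach costs far more CPU than a URL lookup, which is consistent with the load figures reported above.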
Re: Necessary Censorship: Web Filtering with Open Source
I'm using DG 2.4-6.8. with approx 150 users. No slowdown at all. The only complaining I'm hearing is from people that can no longer get to their My-EBay page because of .dll's being blocked.. Also discovered several PC's infected with Spyware programs that have since been cleaned up.
I'm not using their blacklist since their content scanning does the trick. Can't wait for the 2.6 stable engine to come out.
Re: Necessary Censorship: Web Filtering with Open Source
Did you try the mailing list? Sounds to me like you had the old glibc regexp bug. See the FAQ question 5 in the config/usage section.
Re: Necessary Censorship: Web Filtering with Open Source
My father and mother have a school, here in Monterrey Mexico. the school has aprox 300 students from kindergarden to secondary,(3 years old to 15 years old), the school has 35 computers for student and 15 computers for the personal.
All the computers are conected to internet via a linux gateway.
The internet services has been for two years, but the first two months , we have problem with the internet content.
The school has a internal verbal policy, and the students respect it, but what happen when a student open his hotmail account or yahoo, and there is a a subject email who said "your whatever online order", and there is porn advertise inside?.
So because I installed the linux box , I was looking for a filter content. There are many comercial products outside, like watchguard as harware, and igear as software. but them are expensive, and for now we prefer to invest money in good hardware and choose open source software.And forgot to mencion that they want to charge you monthly or anual fees for their service.
While I was looking, I found squidguard. This software works with squid web cache proxy. But, first Iam not a geek, and I even didnt know anything about a web cache proxy and I found dificult to configure squidguard with squid web cache, because I am some desperate person.
So I looked for another alternative, and there was Dansgurdian. This is the winner.
Just you have to download the squid web cache , install it, then download the dans version, install it and follow the how to in dansguardian website.
This setup works really great. I had some basic problem with configuration becasue my background. But wow this really worth the time I dedicated .
So in that case it was time to change our PII generic linux gateway,file, etc ,server for a really server. Because I told to my father , " the school has saved $$$ dollars in liscense server (thinking in windows servers), and now in the filter content", why not take advantage of the squid cache feature and thus take more efficient the adsl conection.
Then the school invest in a xeon dell server with 2 scsi hard drives.
I can just said that squid and dansguardian is a good option and we and students parents are very happy.
I have in my house testing a squid , dans setup working with pics, wigthed phrase enable and without blacklist enable and filter almost all. just you have to play a little with the conf files, and you will have all the control.
And of course if you want an easy install and configure metod , there is smoothwall (I havent used it) who use dansguardian engine, for a reasonable charge.
In the future we are thinking in having a cheap PIV just firewall box with smoothwall.
And for last, sorry for my english.
Jorge Carrillo
chono@hotmail.com
Re: Necessary Censorship: Web Filtering with Open Source
i think your english is fine... i appreciate that you went to the trouble of switching to another language because i would have been lost otherwise. i think i caught the gist of what you were saying.
Re: Necessary Censorship: Web Filtering with Open Source
you need to speak better english
your an asshole....his
your an a**hole....his english was fine.....you need to not be such a troll....maybe a meteor will fall outof the sky and smash your car to pieces. of course while you are not in it.
Re: Necessary Censorship: Web Filtering with Open Source
There is some commercial software that uses the DansGuardian engine and has a UI and easy install written by the same author:
http://store.smoothwall.ltd.uk/products/corporateguardian/
It has all the great filtering of DG but comes with commercial support and an easy install. It is also the only other 'official' version of DG, commercial or otherwise, other than DG itself. Money made from that goes back into DG development as well!
DansGuardian
DG has two basic ideas behind it. Black lists of urls and active content filtering.
The black list filtering in DG will out perform Squidguard. Dan has written the code to be very fast.
DG's newest release in alpha has fork pooling and is scalable to hundreds of users.
DG also has a webmin module and a module has been written to provide virus scanning of all downloaded http file types.
DG is simply the best.
Re: Necessary Censorship: Web Filtering with Open Source
Another nice filter is Privoxy. Gets rid of those nasty banner ads, cookies and pop-ups. Runs under Linux and Windows too. Has an easy-to-use web page configurator.
Allows one to selectively block sites, and in fact re-write web pages on-the-fly. My favorite feature is replacing banner ads with a fully transparent image (the default is a checkered image) so the page layout stays the same and the ad magically vanishes.
Already written: GUILT
An Australian hacker called Zem anticipated this need in 2000 and wrote a tool called GUILT, a free-software censoring proxy that uses a plain-text ban list.
Sadly, the Australian government has failed to respond with details of how one can get a free software censorware filter approved for use in implementing their mandatory filtering policies.
Perhaps someone else can put it to use in another country, preventing a non-free stranglehold on one's internet traffic.
Re: Necessary Censorship: Web Filtering with Open Source
Self-controlled censorship that actually works may be a protection for free speech and the internet: if parents and individual organizations could apply effective censorship, there would be less support for legalizing net censorship.
Re: Necessary Censorship: Web Filtering with Open Source
Yet another reason for censorship: You don't want to see it. Make a list of all of the things you don't want to see, and put it in the list. Or if you are really aggressive, only put certain things in the whitelist, and block everything else. Why? Maybe for personal moral or religious reasons, maybe because you still don't want pills to enhance your manhood or the world's smallest digital camera. Maybe you are just sick of doubleclick. However, sometimes censorship is a GOOD thing, when you are in control, and censoring content for yourself.
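The difference between a blacklist and the more aggressive whitelist-only setup described above comes down to what happens to a site that is on neither list. A minimal Python sketch, assuming domain-based lists where an entry also covers its subdomains; the function names, the `whitelist_only` flag and the sample domains are hypothetical and not taken from any of the filters discussed here.

```python
from urllib.parse import urlsplit


def host_matches(host, domains):
    """True if host equals, or is a subdomain of, any entry in domains."""
    parts = host.lower().rstrip(".").split(".")
    # Check "a.b.c", then "b.c", then "c" against the set.
    return any(".".join(parts[i:]) in domains for i in range(len(parts)))


def allowed(url, whitelist, blacklist, whitelist_only=False):
    """Decide whether a URL (including its scheme) may be fetched."""
    host = urlsplit(url).hostname or ""
    if host_matches(host, whitelist):
        return True          # the whitelist always wins
    if whitelist_only:
        return False         # default deny: unknown sites are blocked
    return not host_matches(host, blacklist)  # default allow


blacklist = {"doubleclick.net"}
whitelist = {"example.org"}
print(allowed("http://ad.doubleclick.net/banner", whitelist, blacklist))
print(allowed("http://www.example.org/", whitelist, blacklist, whitelist_only=True))
```

Matching on whole domain labels rather than on substrings keeps an entry such as `doubleclick.net` from accidentally blocking an unrelated host like `notdoubleclick.net`.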
Re: Necessary Censorship: Web Filtering with Open Source
WebCleaner is the thing I use. Filters even JavaScript popups and ads (heh, I can't even tell if this website is using ads - I don't see any ;)
squidGuard in schools
We've been using squidGuard in Portland-area schools for the last two years and are very happy with the results. Many of our peer agencies in the state of Oregon have implemented squidGuard as well. SquidGuard covers over half the students in the state.
We make our blacklist available via rsync, which dramatically reduces the bandwidth required to keep the lists updated:
rsync squidguard.mesd.k12.or.us::filtering
Plus we have Red Hat packages available:
Re: Necessary Censorship: Web Filtering with Open Source
Well, squidGuard has an interesting webmin module around. I'm using it and I'm pretty happy; of course, the list management needs to improve.
Re: Necessary Censorship: Web Filtering with Open Source
Squid will only work on the ports assigned to it, typically http port 80.
An alternative way is to use TCP Wrappers by Wietse Venema and put the censor list in the /etc/hosts.deny file. That will block everything to/from a malignant host.