# Bottrap - mod_perl bot trap

#  Copyright (C) 2002  Andrew Moore [email protected]
#  This program is free software; you can redistribute it and/or
#  modify it under the terms of the GNU General Public License
#  as published by the Free Software Foundation; either version 2
#  of the License, or (at your option) any later version.

#  This program is distributed in the hope that it will be useful,
#  but WITHOUT ANY WARRANTY; without even the implied warranty of
#  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
#  GNU General Public License for more details.

package Bottrap;

use 5.006;
use strict;
use warnings;

use Apache::Constants ':common';
use Apache::Log;
use vars qw( %banlist );  # "$ip.$ua" => time banned; note: each Apache child keeps its own copy.

our $VERSION = '0.06';

sub handler {
  my $r = shift;
  return DECLINED unless $r->is_initial_req;

  my $btd = $r->dir_config( 'BotTrapDir' );
  return DECLINED unless $btd;

  my $period = $r->dir_config( 'BotTrapTimeout' ) || 600; # number of seconds to keep someone banned.
  # Identify clients by IP address plus User-Agent; default a missing
  # User-Agent header to '' so we never warn on an undefined value.
  my $id = join( '.', $r->connection->remote_ip(), $r->header_in('User-Agent') || '' );
  my $time = time();

  # if the request is for the honeypot, ban the client and deny.
  if ( $r->uri() =~ /^\Q$btd\E/ ) {   # \Q...\E: match the configured path literally
    $banlist{ $id } = $time;
    $r->log_reason( 'client accessing bottrap.' );
    return FORBIDDEN;
  } elsif ( $banlist{ $id } ) {
    # if banned already, check whether the ban has expired before allowing.
    if ( $time - $banlist{ $id } > $period ) {
      delete $banlist{ $id };   # drop the expired entry to save memory.
      $r->warn( 'bottrap: unbanning and allowing.' );
      return DECLINED;          # let any later access handlers run.
    } else {
      $r->log_reason( 'client banned by bottrap.' );
      return FORBIDDEN;         # still banned.
    }
  }

  return DECLINED;  # not the honeypot and not banned: let the request through.
}

1;

__END__

=head1 NAME

Bottrap - mod_perl module to trap and deny pesky web spiders.

=head1 SYNOPSIS

    PerlModule Bottrap

    PerlSetVar BotTrapDir /bottrap/honeypot
    PerlAccessHandler Bottrap
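
The ban period defaults to 600 seconds; it can be changed with the
BotTrapTimeout variable (the one-hour value below is only an example):

    PerlSetVar BotTrapTimeout 3600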

=head1 DESCRIPTION

This mod_perl module can be used to trap web spiders and robots. It is particularly useful against spiders that disobey your robots.txt, such as e-mail harvesters.

You designate a honeypot directory ("BotTrapDir") in your web server filesystem. When a file in that directory is accessed, the requesting bot is denied access to every file on your web server for a period of time (600 seconds by default, or the BotTrapTimeout value). A bot is identified by the combination of IP address and User-Agent, so the module avoids punishing everyone behind a shared proxy while still catching bots that spoof their User-Agent to look like an ordinary browser.

There are two good ways to use this module. One is to put an entry in your robots.txt file refusing all user-agents access to some directory. Don't link to this directory from anywhere; just make it your bottrap honeypot. The worst kinds of bots will decide that a forbidden directory is exactly where e-mail addresses should be harvested, and get banned.
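
A robots.txt entry for this scheme (using the honeypot path from the
SYNOPSIS; yours may differ) might look like:

    User-agent: *
    Disallow: /bottrap/honeypot

Polite spiders will skip the directory entirely, so only robots that
ignore robots.txt ever request anything under it.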

Another method is to put an invisible link to your honeypot on some pages: a link whose "href" attribute points at the honeypot but which contains no text or image to click on. A normal browser displays nothing for such a link, so human visitors never follow it, but a spider probably will and gets caught in the honeypot.
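
Such a link (again using the example honeypot path from the SYNOPSIS)
can be as simple as:

    <a href="/bottrap/honeypot/"></a>

Human visitors see nothing to click, but many spiders request every
href they find in the page source.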

=head2 EXPORT

None by default.

=head1 AUTHOR

Andrew Moore, <[email protected]>

=head1 SEE ALSO

Idea from an article, and from a hosting service (directnic, I think).

=cut