Automating Firewall Log Scanning

Techniques and scripts for automating scanning of log files produced by ipchains.

Data Display Loop

First we display a nice-looking web page header, as shown in Listing 2.

Listing 2. Web Page Header
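The body of the listing is not reproduced here. As a stand-in, the following is only a minimal sketch of what such a header could look like, not the original listing. The helper name page_header, the page title and the exact markup are assumptions; the column headings follow the fields printed by the display loop, and the opening tags match the ones closed by the HTML tail.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original script): returns the CGI
# header line followed by the opening HTML, up to the table heading row.
sub page_header {
    my ($title) = @_;
    return join "",
        "Content-type: text/html\n\n",
        "<HTML><HEAD><TITLE>$title</TITLE></HEAD>\n",
        "<BODY>\n<center>\n",
        "<TABLE BORDER=1>\n",
        "<TR><TH>Source IP</TH><TH>Packets</TH>",
        "<TH>Traffic</TH><TH>Destinations</TH></TR>\n";
}

# The title is a made-up example.
print page_header("Firewall log summary");
```

Returning the header as a string rather than printing it directly makes it easy to suppress the HTML when the script is run from the command line.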

Loop over the sorted source IP addresses and print the source IP address, the number of packets coming from that IP and the traffic (in bytes) generated from that IP:

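# One table row per source IP address, in sorted order.
# The loop body continues in the next fragment.
for (sort keys %source) {
  print "<TR><TD>$_</TD> ";                    # source IP address
  print "<TD>$source{$_} </TD>\n";             # number of packets
  print "<TD>$traffichost{$_} bytes</TD>\n";   # traffic in bytes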

Now we are able to print the string containing the destination IP addresses contacted by the current source IP address:

  $tmp1 = $sourcedest{$_};
  if (defined $tmp1 && length($tmp1) > 0) {
    print "<TD>\n";
    @lt1 = split " ", $tmp1;
    # inside this inner loop, $_ is a destination address
    for (sort @lt1) {
      print "$_ <br>\n";
    }
    print " </TD>\n";
  }
  print " </TR>\n";
}   # end of the loop over the source IP addresses

Finally, we print the HTML tail:

print "</TABLE>\n";
print "</center>\n";
print "</BODY></HTML>\n";
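For readers who want to try the display loop in isolation, here is a self-contained sketch that assembles the fragments above into one function. It is an illustration under assumptions, not the code of the real script: the names render_rows and esc are made up, the hashes are passed by reference, HTML escaping has been added, and the sample data is invented.

```perl
use strict;
use warnings;

# Escape the characters that are special in HTML.
sub esc {
    my ($s) = @_;
    $s =~ s/&/&amp;/g;
    $s =~ s/</&lt;/g;
    $s =~ s/>/&gt;/g;
    return $s;
}

# Build one table row per source IP. The three hash references play the
# roles of %source, %traffichost and %sourcedest from the text.
sub render_rows {
    my ($source, $traffichost, $sourcedest) = @_;
    my $html = "";
    for my $ip (sort keys %$source) {
        $html .= "<TR><TD>" . esc($ip) . "</TD> ";
        $html .= "<TD>$source->{$ip} </TD>\n";
        $html .= "<TD>" . ($traffichost->{$ip} // 0) . " bytes</TD>\n";
        my $dests = $sourcedest->{$ip} // "";
        if (length $dests) {
            my @dests = split " ", $dests;
            $html .= "<TD>\n";
            $html .= esc($_) . " <br>\n" for sort @dests;
            $html .= " </TD>\n";
        }
        $html .= " </TR>\n";
    }
    return $html;
}

# Invented sample data, for illustration only.
my %source      = ("192.168.1.10" => 3, "192.168.1.2" => 1);
my %traffichost = ("192.168.1.10" => 1500, "192.168.1.2" => 40);
my %sourcedest  = ("192.168.1.10" => "10.0.0.2 10.0.0.1");
print render_rows(\%source, \%traffichost, \%sourcedest);
```

Note that sort compares strings by default, so 192.168.1.10 is listed before 192.168.1.2. A numeric sort on the four octets would need a custom comparison function.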

The Downloadable inside-control Script

The version of inside-control I actually implemented is richer in functionality than the one presented here, and it is available for download. One of the main added features is the ability to display arbitrary names instead of IP addresses in the “Source IP” column. This is done with a very simple text database that maps IP addresses to names. The format is the same as that of the /etc/hosts file, and you can use that file directly if it is meaningfully configured for your internal LAN. The location of the “IP to names” database file is set by the $useripdb variable at the beginning of the script.
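To make the idea of the “IP to names” database concrete, here is a minimal sketch of how a hosts-format file could be read into a lookup table. The function names load_ip_names and display_name are hypothetical, and the real script may well do this differently.

```perl
use strict;
use warnings;

# Read a hosts-style file ("IP name [aliases...]", "#" starts a comment)
# and return a hash reference mapping each IP to its first name.
sub load_ip_names {
    my ($path) = @_;
    my %name_of;
    open(my $fh, "<", $path) or die "Cannot open $path: $!\n";
    while (my $line = <$fh>) {
        $line =~ s/#.*//;                    # strip comments
        my ($ip, $name) = split " ", $line;
        next unless defined $name;           # blank or incomplete line
        $name_of{$ip} //= $name;             # first entry for an IP wins
    }
    close $fh;
    return \%name_of;
}

# Fall back to the raw IP address when no name is known.
sub display_name {
    my ($name_of, $ip) = @_;
    return $name_of->{$ip} // $ip;
}
```

In the display loop, one would then print display_name($names, $_) in the first column instead of the bare address, where $names is the result of calling load_ip_names on the file named by $useripdb.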

There is also a search facility that looks for a particular source IP address (or the corresponding name found in the “IP to names” database) in the logs. The search form is displayed whenever the CGI is called from the browser without arguments. Arguments are passed with the GET method.
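With the GET method, the web server hands the arguments to the CGI in the QUERY_STRING environment variable. The sketch below shows one way to decode it by hand; the function name parse_query and the parameter name search are assumptions, not names taken from the real script.

```perl
use strict;
use warnings;

# Decode a URL-encoded query string such as "search=192.168.1.5"
# into a hash reference of parameter names and values.
sub parse_query {
    my ($qs) = @_;
    $qs = "" unless defined $qs;
    my %param;
    for my $pair (split /[&;]/, $qs) {
        my ($key, $value) = split /=/, $pair, 2;
        next unless defined $key && length $key;
        for ($key, $value) {
            next unless defined $_;
            tr/+/ /;                                  # "+" encodes a space
            s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;      # %XX escapes
        }
        $param{$key} = defined $value ? $value : "";
    }
    return \%param;
}

# "search" is a hypothetical parameter name.
my $param = parse_query($ENV{QUERY_STRING});
if (!defined $param->{search} || $param->{search} eq "") {
    print "No arguments: the search form would be displayed here.\n";
} else {
    print "Searching the logs for: $param->{search}\n";
}
```

Anything decoded this way comes straight from the browser, so it should be escaped before being echoed back into the HTML page.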

Additionally, the main loop includes some data validation (the kernel cannot always log properly, especially on machines with little RAM or a slow CPU) and stores some port-dependent information.
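As an illustration of the kind of validation meant here, the following sketch rejects log lines that are truncated or contain impossible values. It assumes the usual ipchains log layout ("Packet log: chain action interface PROTO=n source:port destination:port L=length ..."); the function names and the returned field names are hypothetical.

```perl
use strict;
use warnings;

# True if the string is a dotted quad with every octet in 0..255.
sub valid_ip {
    my ($ip) = @_;
    my @octets = $ip =~ /\A(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})\z/;
    return 0 unless @octets;
    return (grep { $_ > 255 } @octets) ? 0 : 1;
}

# Parse one log line; returns a hash reference, or undef when the line
# is truncated or malformed and should be skipped.
sub parse_log_line {
    my ($line) = @_;
    my @f = $line =~ m{
        Packet\ log:\s+ (\S+)\s+ (\S+)\s+ (\S+)\s+  # chain, action, interface
        PROTO=(\d+)\s+
        ([\d.]+):(\d+)\s+                           # source address and port
        ([\d.]+):(\d+)\s+                           # destination address, port
        L=(\d+)                                     # packet length in bytes
    }x;
    return unless @f;
    my ($chain, $action, $iface, $proto, $src, $sport, $dst, $dport, $len) = @f;
    return unless valid_ip($src) && valid_ip($dst);
    return if $sport > 65535 || $dport > 65535;
    return { src => $src, dst => $dst, sport => $sport, dport => $dport,
             proto => $proto, action => $action, bytes => $len };
}
```

A main loop built on this would simply skip any line for which parse_log_line returns undef, so that a garbled entry never ends up in the packet or traffic counters.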

Finally, the script can also be called without the web interface. Just pass any argument to inside-control, and all HTML output will be suppressed in favor of plain-text output. A search string for a source IP address (or the corresponding name found in the “IP to names” database) can be passed to the program via the -t option.
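The switch between web and command-line mode can be sketched as follows. This is only an illustration of the behavior described above, using the standard Getopt::Long module; the function name parse_cli and the usage message are assumptions.

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# Decide the output mode from the argument list. Any argument at all
# selects plain-text mode; -t carries an optional search string.
sub parse_cli {
    my @args = @_;
    my $text_mode = @args ? 1 : 0;
    my $search;
    GetOptionsFromArray(\@args, "t=s" => \$search)
        or die "Usage: inside-control [-t SEARCH]\n";
    return ($text_mode, $search);
}

my ($text_mode, $search) = parse_cli(@ARGV);
if ($text_mode) {
    print "Plain-text report",
          (defined $search ? " for $search" : ""), "\n";
} else {
    print "Content-type: text/html\n\n";   # web mode: HTML follows
}
```

When the script runs as a CGI with the GET method, the argument list is normally empty, so the same test cleanly separates the two modes.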

Notes and Caveats

The purpose of this article is to explain some design principles and give some hints, not to give a prepackaged solution to log scanning problems. There are many areas where the inside-control script can be made better, such as performance and security. The following are some notes about inside-control, mostly related to security issues.

In order for a CGI to read the computer log files /var/log/syslog or /var/log/messages, these have to be made readable by all. This can be accomplished with the command chmod +r /var/log/syslog. This, however, is not very secure as it gives anybody on the system permission to read the computer log files. It would be much better to get the web server to run inside-control with a particular group permission, and then make the log files belong to that group.
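The group-based approach can be sketched with Perl's built-in stat, chown and chmod. The function names are hypothetical, the group passed to restrict_to_group is whatever group your web server runs the CGI under, and changing the group of a system log file normally requires root privileges.

```perl
use strict;
use warnings;

# Report whether "other" users can read the file (mode bit 0004).
sub world_readable {
    my ($path) = @_;
    my @st = stat($path) or die "Cannot stat $path: $!\n";
    return ($st[2] & 0004) ? 1 : 0;
}

# Give the file to the named group and make it readable by owner and
# group only (mode 0640). The owner is left unchanged (-1).
sub restrict_to_group {
    my ($path, $group) = @_;
    my $gid = getgrnam($group);
    die "Unknown group $group\n" unless defined $gid;
    chown(-1, $gid, $path) == 1 or die "Cannot chown $path: $!\n";
    chmod(0640, $path)     == 1 or die "Cannot chmod $path: $!\n";
    return 1;
}

# Example use, run as root; "logread" is a made-up group name:
#   restrict_to_group("/var/log/messages", "logread");
#   print "still world-readable\n" if world_readable("/var/log/messages");
```

Keep in mind that log rotation may recreate the file with its default owner and mode, so the rotation configuration has to agree with whatever permissions are chosen here.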

After reading the article, one could conclude it is essential that a firewall also runs a web server, as inside-control needs to read the firewall log files. In fact, putting a web server on a firewall is very insecure: ideally a firewall should run no dæmon service, and all maintenance should be done at the console. When there is a need for remote administration, the only service that may be installed on the firewall is ssh, the secure shell. Running inside-control is still possible by setting up a separate web server within the internal network that also acts as a syslog server for the firewall.

Firewall logs can fill up a partition pretty quickly. In order to avoid having a clogged hard disk on the firewall (which could lead to a malfunctioning internet connection), depending on the amount of traffic you want to log, you have to allow for a large log file space. For high data volume services (typically HTTP, FTP, SMTP, NetBIOS, LPD and database services) I would advise setting up a second hard disk of at least 20GB in size, with just one partition mounted on /var/log. Also keep in mind that the script needs some error-checking code on critical steps like opening a file.
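As an example of the error checking mentioned above, the following sketch opens a log file and reports a readable error instead of failing silently. The function name open_log is an assumption; the log path is one of those named earlier in this article.

```perl
use strict;
use warnings;

# Open a log file for reading. Returns (filehandle, undef) on success
# and (undef, error message) on failure.
sub open_log {
    my ($path) = @_;
    open(my $fh, "<", $path)
        or return (undef, "Cannot open $path: $!");
    return ($fh, undef);
}

my ($log, $err) = open_log("/var/log/messages");
if (defined $err) {
    print "Error: $err\n";
} else {
    my $count = 0;
    while (my $line = <$log>) {
        $count++ if $line =~ /Packet log:/;   # lines written by ipchains
    }
    close $log;
    print "$count firewall log lines found\n";
}
```

In a CGI, the error message would normally be wrapped in a small HTML page so that the user sees why no data is displayed.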

Finally, there is a lot of room for improvement everywhere in the script, especially in the main loop. One can use much more data from each log line than is discussed here. However, it is always a good idea not to show too many details; otherwise, the whole point of having an automated log scanner is defeated. If you display all available details, you end up having to look for suspicious entries in an unmanageably large volume of logged traffic.

Leo Liberti is technical director at IrisTech in Como, Italy, a firm that supplies its customers with web-based applications and all kinds of electronic services. His free time is dedicated to eating in as many restaurants as possible.