Automating Firewall Log Scanning

Techniques and scripts for automating the scanning of log files produced by ipchains.
ipchains Log Format

Let us now examine a sample log entry from our firewall's /var/log/syslog:

Jun 12 16:15:54 myfirewall kernel: Packet log: input DENY eth1 PROTO=6 L=52 S=0x10 I=24016 F=0x4000 T=53 SYN (#38)

This means that at a quarter past four in the afternoon on 12 June, the firewall (called, rather boringly, myfirewall) denied and logged a packet arriving on its network interface eth1 (the external interface, connected to the Internet). The packet used the TCP protocol (PROTO=6), came from (from port 34251), was directed to (on port 23, i.e., the Telnet port) and was 52 bytes long. We shall skip most of the other details, apart from one: “SYN” means that the packet has the SYN flag set, i.e., it is the first packet of a TCP connection. In practice, this information is very useful for distinguishing packets that belong to a pre-existing connection (which might have been initiated from the internal LAN) from packets that attempt to establish a connection from the Internet to the internal LAN. Usually one allows “reply” packets (which do not have the SYN flag set) but denies “SYN” packets, because a SYN packet means somebody out there is trying to open a connection to a computer on the internal LAN.
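The SYN test described above can be sketched in a few lines of Perl. This is a minimal illustration, assuming the 2.2.x ipchains format shown earlier, where the flag is logged as a separate SYN token near the end of the line. The function name is_connection_attempt is hypothetical, and the two addresses in the sample are placeholders taken from the documentation ranges, not real log data.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns 1 if a 2.2.x "Packet log" line carries the
# SYN flag (a connection attempt), 0 otherwise (e.g., a reply packet).
sub is_connection_attempt {
    my ($line) = @_;
    # The flag is logged as a standalone whitespace-separated token;
    # grep in scalar context returns the number of matching tokens.
    my $has_syn = grep { $_ eq 'SYN' } split ' ', $line;
    return $has_syn ? 1 : 0;
}

# Sample entry; the addresses are hypothetical placeholders.
my $entry = 'Jun 12 16:15:54 myfirewall kernel: Packet log: input DENY eth1 '
          . 'PROTO=6 192.0.2.10:34251 198.51.100.1:23 L=52 S=0x10 I=24016 '
          . 'F=0x4000 T=53 SYN (#38)';

print is_connection_attempt($entry) ? "connection attempt\n" : "reply packet\n";
```

Matching the whole token, rather than the substring "SYN" anywhere in the line, avoids false hits from hostnames or log prefixes that happen to contain those letters.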

Of course, it is possible to check the status of a firewall by inspecting all relevant entries in the log file by hand, but this is feasible only if one logs just a few strange-looking packets. For example, on some firewalls I set up, I decided to log all packets coming from the Internet towards port 31337 on computers on the internal LAN, as 31337 is the default port used by Back Orifice. As soon as one wants to gather statistics from the firewall, however, many more packets have to be logged, and the log file is likely to grow by more than 5MB per day. At that point, inspecting the log file by hand is no longer an option. This is where automated log scanning comes in.

Kernels 2.4.x use iptables instead of ipchains, and their firewall log entries have a different format:

Jun 12 16:15:54 myfirewall kernel: Packet log: IN=eth1 OUT= MAC=00:00:00:00:00:00:00:00:00:00:00:00:08:00 SRC= DST= LEN=52 TOS=0x10 PREC=0x00 TTL=64 ID=0 DF PROTO=TCP SPT=34251 DPT=23 WINDOW=11592 RES=0x00 SYN URGP=0

The fields we are interested in are SRC (source IP address), DST (destination IP address), SPT (source port), DPT (destination port) and the presence or absence of the SYN flag.
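Because the 2.4.x format labels every field as KEY=VALUE, it can be parsed by name rather than by position. The sketch below is one possible approach, not the article's code. The helper name parse_iptables_line is hypothetical, and the sample addresses are documentation-range placeholders standing in for the real SRC and DST values.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: pull the fields of interest out of a 2.4.x
# (iptables) "Packet log" line. Returns a hash reference.
sub parse_iptables_line {
    my ($line) = @_;
    my %f;
    # Collect every KEY=VALUE token; VALUE may be empty (e.g., "OUT=").
    while ($line =~ /\b([A-Z]+)=(\S*)/g) {
        $f{$1} = $2;
    }
    return {
        src => $f{SRC},
        dst => $f{DST},
        spt => $f{SPT},
        dpt => $f{DPT},
        len => $f{LEN},
        # SYN is a bare flag token, not a KEY=VALUE pair.
        syn => ($line =~ /(?:^|\s)SYN(?:\s|$)/) ? 1 : 0,
    };
}

# Sample entry; SRC and DST are hypothetical placeholders.
my $sample = 'Jun 12 16:15:54 myfirewall kernel: Packet log: IN=eth1 OUT= '
    . 'MAC=00:00:00:00:00:00:00:00:00:00:00:00:08:00 SRC=192.0.2.10 '
    . 'DST=198.51.100.1 LEN=52 TOS=0x10 PREC=0x00 TTL=64 ID=0 DF PROTO=TCP '
    . 'SPT=34251 DPT=23 WINDOW=11592 RES=0x00 SYN URGP=0';

my $p = parse_iptables_line($sample);
printf "%s:%s -> %s:%s len=%s syn=%d\n", @$p{qw(src spt dst dpt len)}, $p->{syn};
```

Parsing by name keeps working if a kernel or logging option adds or drops fields, which a fixed-index split would not survive.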

The inside-control Script Structure

I am going to use Perl to build the log scanner. It is not the only option; in fact, for top performance one should use a compiled language. When I rewrote this script in C++, it ran about twice as fast (a 100% speed gain).

The inside-control script consists of a main parsing loop and an HTML data display loop. Since the script is a CGI program, it must reside on a web server configured to run CGI programs.
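The two-part structure can be outlined as follows. This is a minimal sketch, not the inside-control script itself. The sub names parse_log and render_html are hypothetical, only per-host traffic is collected, and the demo reads from an in-memory string instead of the real log so that it runs anywhere. The sample addresses are placeholders.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Parsing loop: fill a hash of bytes per source address (2.2.x format).
sub parse_log {
    my ($fh) = @_;
    my %traffichost;
    while (my $line = <$fh>) {
        next unless $line =~ /Packet log/;
        my @log = split ' ', $line;
        my ($ips)          = split /:/, $log[11];   # source address
        my (undef, $bytes) = split /=/, $log[13];   # "L=52" -> 52
        $traffichost{$ips} += $bytes;
    }
    return \%traffichost;
}

# Display loop: turn the hash into an HTML page.
sub render_html {
    my ($traffichost) = @_;
    # A CGI program must print its header, then a blank line, first.
    my $out = "Content-Type: text/html\n\n";
    $out .= "<html><body><table>\n";
    for my $ip (sort keys %$traffichost) {
        $out .= "<tr><td>$ip</td><td>$traffichost->{$ip}</td></tr>\n";
    }
    $out .= "</table></body></html>\n";
    return $out;
}

# Demo on an in-memory "log"; a real CGI would open /var/log/syslog.
my $log = "Jun 12 16:15:54 myfirewall kernel: Packet log: input DENY eth1 "
        . "PROTO=6 192.0.2.10:34251 198.51.100.1:23 L=52 S=0x10 I=24016 "
        . "F=0x4000 T=53 SYN (#38)\n"
        . "Jun 12 16:15:55 myfirewall sshd[42]: unrelated line\n";
open(my $fh, '<', \$log) or die "in-memory open failed: $!";
my $page = render_html(parse_log($fh));
print $page;
```

Keeping parsing and rendering in separate subs makes it easy to test the parser on a small sample before pointing the CGI at a multi-megabyte log.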

Note that the code described below sacrifices functionality and useful details, such as error checking, for the sake of clarity. For example, there is no check that opening the log file succeeded before reading from it. Note also that the code below is written for the packet-logging format of 2.2.x kernels. Adapting it to the 2.4.x logging format, based on the sample entry shown above, should be straightforward.
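If you do want the missing error check, one common Perl idiom is the three-argument open followed by "or die". The sketch below wraps it in a hypothetical helper called open_log. The missing path used in the demo is also just an illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: open the log for reading, or die with a message
# naming the file and the system error ($!).
sub open_log {
    my ($path) = @_;
    open(my $fh, '<', $path)
        or die "Cannot open $path: $!\n";
    return $fh;
}

# Demo: a missing file produces an error instead of an empty report.
my $fh  = eval { open_log('/nonexistent/firewall.log') };
my $err = $@;
print defined $fh ? "log opened\n" : "error: $err";
```

Under CGI, a message passed to die typically ends up in the web server's error log, and the browser sees a server error. Printing a short HTML error page instead is a friendlier alternative.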

Main Parsing Loop

First, we open the log file and initialize some variables (Red Hat users should read /var/log/messages instead of /var/log/syslog):

open(LOGFILE, "/var/log/syslog");
$firstdate = "";
$date = "";
$total_traffic = 0;

Now we loop over each line in the log file:

while ( <LOGFILE> ) {
Skip all log entries which do not belong to the firewall:
next unless /Packet log/;
We then split the line into whitespace-separated fields and pick out the ones we need:
@log = split;
($month, $day, $time, $policy, $proto, $ipsource, $ipdest,
 $tot_len) = @log[0,1,2,8,10,11,12,13];
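To see which field each index selects, the following standalone sketch applies the same split and slice to a 2.2.x entry and prints the chosen fields. The source and destination addresses are hypothetical placeholders, since the real ones vary from entry to entry.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sample 2.2.x entry; the two addresses are hypothetical placeholders.
my $line = 'Jun 12 16:15:54 myfirewall kernel: Packet log: input DENY eth1 '
         . 'PROTO=6 192.0.2.10:34251 198.51.100.1:23 L=52 S=0x10 I=24016 '
         . 'F=0x4000 T=53 SYN (#38)';

# split ' ' splits on runs of whitespace, like a bare split on $_.
my @log = split ' ', $line;
my ($month, $day, $time, $policy, $proto, $ipsource, $ipdest, $tot_len)
    = @log[0, 1, 2, 8, 10, 11, 12, 13];

# Show which whitespace-separated field each index picks out.
for my $i (0, 1, 2, 8, 10, 11, 12, 13) {
    print "$i: $log[$i]\n";
}
```

Index 9 (the interface, eth1) is skipped by the slice, and fields 3 to 7 are the host name and the fixed "kernel: Packet log: input" text.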
We then build the date string and store the first date in the log. As we go on, we store the current date as the last date, so that after the last line the variable $lastdate contains the last date in the log:
$date = $day . " " . $month . " " . $time;
if (length($firstdate) == 0) {
  $firstdate = $date;
}
$lastdate = $date;
Read the protocol type, the source IP address, the source port, the destination IP address, the destination port and the packet length:
# "PROTO=6" -> "6"; keeps every digit, so "PROTO=17" (UDP) gives "17"
($flush, $proto) = split "=", $proto;
($ips, $ports) = split ":", $ipsource;
($ipd, $portd) = split ":", $ipdest;
($flush, $packetlen) = split "=", $tot_len;
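Outside the loop, the protocol number can be turned into a readable name for the report. This is a hedged sketch: the helper proto_label and the three-entry table are assumptions, and the table lists only the standard numbers for ICMP (1), TCP (6) and UDP (17). The whole number after "=" has to be kept, because protocol numbers can have more than one digit.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical lookup from the number in "PROTO=n" to a name. Only the
# three most common IP protocol numbers are listed.
my %proto_name = (1 => 'ICMP', 6 => 'TCP', 17 => 'UDP');

sub proto_label {
    my ($field) = @_;              # e.g. "PROTO=17"
    my (undef, $num) = split /=/, $field, 2;
    return exists $proto_name{$num} ? $proto_name{$num} : "proto $num";
}

my @labels = map { proto_label($_) } qw(PROTO=6 PROTO=17 PROTO=47);
print "$_\n" for @labels;
```

Perl's built-in getprotobynumber can supply names from the system's /etc/protocols instead of a hard-coded table, at the cost of depending on that file.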
Now record the destination IP address in a string, and associate that string with the source IP address, so that in the data display loop we will be able to loop over source IP addresses and retrieve the hosts they connected to:
# match whole addresses only: \Q...\E escapes the dots, and the
# surrounding spaces stop 10.0.0.1 from matching inside 10.0.0.12
unless ( " $sourcedest{$ips}" =~ / \Q$ipd\E / ) {
  $sourcedest{$ips} = $sourcedest{$ips} . $ipd . " ";
}
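An alternative data structure, offered as an assumption rather than the script's own method, is a hash of hashes. Each source and destination pair is stored once as a key, so no string searching is needed, and the value doubles as a hit counter for that pair. The %seen name and the placeholder addresses below are illustrative only.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# $seen{$source_ip}{$dest_ip} = number of log entries for that pair
my %seen;

# Hypothetical (source, destination) pairs as the loop would extract them.
my @pairs = (
    ['192.0.2.10',  '198.51.100.1'],
    ['192.0.2.10',  '198.51.100.12'],
    ['192.0.2.10',  '198.51.100.1'],
    ['203.0.113.5', '198.51.100.1'],
);

for my $p (@pairs) {
    my ($ips, $ipd) = @$p;
    $seen{$ips}{$ipd}++;       # autovivifies the inner hash
}

for my $ips (sort keys %seen) {
    print "$ips -> ", join(' ', sort keys %{ $seen{$ips} }), "\n";
}
```

The keys of the inner hash give the same list of contacted hosts that the space-separated string holds, while 198.51.100.1 and 198.51.100.12 can never be confused with each other.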
We count the log entries for the source IP address:
and sum up the total traffic volume:
$total_traffic += $packetlen;
Finally, we sum up the per-host traffic volume:
$traffichost{$ips} += $packetlen;
}
Notice that not all the information gathered is put to use (the source and destination ports, for example, are extracted but then ignored), so there is plenty of room for expansion here.
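As one example of such an expansion, the destination port that the main loop already extracts into $portd could be tallied per port. The sketch below is hypothetical: the %hits_per_port name and the sample destination fields are illustrative, and in the real loop the tally would go next to the other per-entry counters.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical extension: count log entries per destination port.
my %hits_per_port;

# Destination fields as they appear in 2.2.x entries (placeholders).
my @dest_fields = ('198.51.100.1:23', '198.51.100.1:31337',
                   '198.51.100.7:23', '198.51.100.1:80');

for my $ipdest (@dest_fields) {
    my (undef, $portd) = split /:/, $ipdest;
    $hits_per_port{$portd}++;
}

# Busiest ports first; ties broken by ascending port number.
for my $port (sort { $hits_per_port{$b} <=> $hits_per_port{$a} || $a <=> $b }
              keys %hits_per_port) {
    print "port $port: $hits_per_port{$port}\n";
}
```

A per-port table like this makes probes such as the port 31337 scans mentioned earlier stand out without reading the raw log.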