A System Monitoring Dashboard

This simple set of shell scripts keeps you informed about disks that are filling up, CPU-hog processes and problems with the Web and mail servers.

For about a year, my company had been struggling to roll out a monitoring solution. False positives and inaccurate after-hours pages were hurting morale and wasting system administrators' time. After speaking with some colleagues about what we really needed to monitor, the list came down to a few things:

  • Web servers—by way of HTTP, not only physical servers.

  • Disk space.

  • SMTP servers' availability—by way of SMTP, not only physical servers.

  • A history of these events to diagnose and pinpoint problems.

This article explains the process I developed and how I set up disk, Web and SMTP monitoring both quickly and simply. Keeping the monitoring process simple meant that all the tools used should be available on a recent Linux distribution and should not use advanced protocols, such as SNMP or database technology. As a result, all of my scripts use the Bash shell, basic HTML, some modest Perl and the wget utility. All of these monitoring scripts share the same general skeleton and installation steps, and they are available from the Linux Journal FTP site (see the on-line Resources).

Installing the scripts involves several steps. Start by copying each script to a Web server and making it world-executable with chmod. Then, create a directory under the root of your Web server where the script can write its logs and history. I used a directory named webmon for monitor_web.sh; the other scripts are similar: smtpmon for monitor_smtp.sh and stats for monitor_stats.pl. monitor_disk.sh is the exception, because it is the only script installed locally on each server you want to monitor.
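A typical installation looks something like this; the paths are only examples, not requirements of the scripts:


cp monitor_web.sh /usr/local/bin/
chmod 755 /usr/local/bin/monitor_web.sh   # world-executable
mkdir /var/www/html/webmon                # log/history directory under the Web root
chown monitor /var/www/html/webmon        # writable by the user that will run it (see below)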

Next, schedule the scripts in cron. You can run each script as any user capable of running wget, df -k and top; the user also must be able to write to the directory you created for the script. I suggest creating a local user called monitor and scheduling the scripts through that user's crontab. Finally, install wget if it is not already present on your Linux distribution.
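For example, the monitor user's crontab might look like the following; the paths and the run frequency are only suggestions:


# Sample crontab for the monitor user (paths and schedule are examples)
*/15 * * * * /usr/local/bin/monitor_web.sh
*/15 * * * * /usr/local/bin/monitor_disk.sh
0 * * * *    /usr/local/bin/monitor_smtp.sh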

My first challenge was to monitor the Web servers by way of HTTP, so I chose wget as the engine and scripted around it. The resulting script is monitor_web.sh. For those unfamiliar with wget, its author describes it as “a free software package for retrieving files using HTTP, HTTPS and FTP, the most widely used Internet protocols” (see Resources).

After installation, monitor_web.sh requires only two choices from the user, the e-mail recipient and the URLs to monitor, both labeled clearly in the script. The URLs must conform to HTTP standards and return a valid HTTP 200 OK response to work. They can be HTTP or HTTPS, as wget and monitor_web.sh support both. Once the script is installed and has run for the first time, you can browse to http://localhost/webmon/webmon.html and view the URLs, the last result and the history, all presented as links.
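The two settings might end up looking something like this; the variable names and values below are illustrative, not the exact names used in the script:


# Illustrative settings only; use the variables labeled in monitor_web.sh itself
RECIPIENT="oncall@example.com"
URLS="http://www.example.com https://intranet.example.com/login"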

Now, let's break down the script; see monitor_web.sh, available on the LJ FTP site. First, I set all the variables for the system utilities and the wget program; these may differ on your system. Next, we make sure we are on the network. This ensures that if the server doing the monitoring goes off-line, a massive number of alerts is not queued up by Sendmail until the server comes back on-line.
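The check itself can be as simple as pinging a known host before entering the URL loop. Here is a minimal sketch, assuming a GATEWAY address and the $WLOG log file used later in the script:


# Hedged sketch of an "are we on the network?" check; GATEWAY is an assumption
GATEWAY=192.168.1.1
if ! ping -c 1 -W 5 $GATEWAY > /dev/null 2>&1; then
    echo "Network unreachable - skipping this run" >> $WLOG
    exit 0    # exit quietly so no false alerts are generated
fi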

As I loop through all the URLs, I have wget connect twice, with a timeout of five seconds, to reduce false positives. If the Web site is down, the script generates an e-mail message for the recipient and updates the Web page. Mail also is sent when the site comes back up. The script sends only one message per outage, so we don't overwhelm the recipient. This is achieved with the following code:


wget $URL  2>> $WLOG

if (( $? != 0 ));then

  # Record the outage on the status page
  echo \<A HREF\="$URL"\>$URL\<\/A\> is down \
          $RTAG $EF. \
          $LINK Last Result $LTAG  >> $WPAGE

  # The flag file /tmp/down.$ENV marks that an alert
  # has already been sent for this outage
  if [[ ! -a /tmp/down.$ENV ]];then
        touch /tmp/down.$ENV
        mail_alert down
  else
         echo Alert already sent for $ENV \
         - waiting | tee -a $WLOG
  fi

fi

I have included the HTML for green and red text in the script, if you choose not to use graphics. Again, the full script is available from the Linux Journal FTP site.
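The mail_alert function referenced above lives in the full script. A minimal sketch of what such a function might look like, assuming a RECIPIENT variable holding the configured e-mail address (the variable name is my assumption):


# Hedged sketch only; the real mail_alert in monitor_web.sh may differ
mail_alert ()
{
    # $1 is "down" or "up"; $ENV identifies the site being checked
    echo "$ENV is $1 at $(date)" | \
        mail -s "monitor_web: $ENV is $1" $RECIPIENT
}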

Figure 1. monitor_web.sh in action. Run the script from cron to regenerate this page as often as needed.

With the Web servers taken care of, it was time to tackle disk monitoring. True to our keep-it-simple philosophy, I chose to create a script that would run from cron and alert my team based on the output of df -k. The result was monitor_disk.sh. The first real block of code in the script sets up the filesystems list:


FILESYSTEMS=$(mount | grep -iv proc |\
grep -iv cdrom | awk '{print $3}')

I ignore proc and am careful not to report on the CD-ROM, should my teammates put a disk in the drive. The script then compares the value of Use% to two values, THRESHOLD_MAX and THRESHOLD_WARN. If Use% exceeds either one, the script generates an e-mail to the appropriate recipient, RECIPIENT_MAX or RECIPIENT_WARN. Notice that I made sure the Use% value for each filesystem is interpreted as an integer with this line:


typeset -i  UTILIZED=$(df -k $FS | tail -1 | \
awk '{print $5}' | cut -d"%" -f1)
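The threshold comparison then amounts to a pair of integer tests. Here is a minimal sketch using the variable names from the article; the alert text and the mail invocation are my own assumptions, not the script's:


# Hedged sketch of the threshold checks in monitor_disk.sh
if (( UTILIZED >= THRESHOLD_MAX )); then
    echo "$FS on $(hostname) is at ${UTILIZED}%" | \
        mail -s "CRITICAL: filesystem nearly full" $RECIPIENT_MAX
elif (( UTILIZED >= THRESHOLD_WARN )); then
    echo "$FS on $(hostname) is at ${UTILIZED}%" | \
        mail -s "WARNING: filesystem filling up" $RECIPIENT_WARN
fi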

I set up a mailing list containing my team members' e-mail addresses and the e-mail address of the on-call person, so it receives both the critical e-mails and the pages. You may need to do the same with your mail server software, or you can simply use your group alias or pager address for both recipients.

Because our filesystems tend to be large, about 72GB–140GB, I have set critical alerts to 95%, so we still have some time to address issues when alerted. You can set your own threshold with the THRESHOLD_MAX and THRESHOLD_WARN variables. Also, our database servers run some disk-intensive jobs and can generate large amounts of archive log files, so I figured every 15 minutes is a good frequency at which to monitor. For servers with less active filesystems, once an hour is enough.

Our third script, monitor_smtp.sh, monitors our SMTP servers' ability to send mail. It is similar to the first two scripts; the only new requirement was a way to connect directly to a user-defined SMTP server, so I could loop through a server list and send a piece of mail. This is where smtp.pl comes in. It is a Perl script (Listing 1) that uses the Net::SMTP module to send mail through a given SMTP server. Most recent distributions have this module installed already (see the Do I Have That Perl Module Installed sidebar). monitor_smtp.sh updates the defined Web page based on the success of the transmission carried out by smtp.pl. No attempt is made to alert our group, because this is a troubleshooting tool and, ironically, it cannot rely on SMTP to send mail if a server is down. Future versions of monitor_smtp.sh may include a round-robin feature, so an alert could be sent through a known working SMTP server.
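The shell side of monitor_smtp.sh is essentially a loop over the server list. The sketch below shows the general shape, assuming smtp.pl exits non-zero on failure and takes the server and a test recipient as arguments; that interface and the variable names are assumptions, not taken from the listing:


# Hedged sketch; smtp.pl's argument order and the variable names are assumptions
for SERVER in $SMTP_SERVERS; do
    if ./smtp.pl $SERVER $TEST_RECIPIENT >> $SLOG 2>&1; then
        echo "<FONT COLOR=green>$SERVER OK</FONT><BR>" >> $SPAGE
    else
        echo "<FONT COLOR=red>$SERVER FAILED</FONT><BR>" >> $SPAGE
    fi
done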


Comments


Nagios

Posted by Will_

I believe Nagios already does this.

Also checking the page ...

Posted by ecazarez

We also have scripts for monitoring a web server. Besides knowing that the web service is up and running, we were interested in knowing that the page hasn't changed, so we did this:

- Precalculate and store the md5 of the page we want to check.
- Every few minutes (a crontab line), fetch the page, calculate its md5 and compare it with the precalculated one. If they differ, there is a problem.

The calculation is a one-liner:


fetch -q -o - http://www.thepage.tocheck | md5
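The comment shows only the checksum step; the compare step might look something like this on a Linux box, using wget and md5sum in place of the BSD fetch and md5 (the file name and mail recipient are made up for the example):


# Hypothetical compare step; file name and recipient are examples only
STORED=$(cat /var/lib/pagecheck/stored.md5)
CURRENT=$(wget -q -O - http://www.thepage.tocheck | md5sum | awk '{print $1}')
if [ "$CURRENT" != "$STORED" ]; then
    echo "Page changed or unreachable" | mail -s "page check failed" admin@example.com
fi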

Also, to avoid the problem of installing extra Perl modules to send email, we use this:


sub send_mail {
# Send email messages using the sendmail process
# parameters:
# $from_address, address to use in the From field
# $subject, Subject for the message
# $body, Message body
# $to_address, destination address(es) for the message
my ( $from_address, $subject, $body, $to_address ) = @_;
open(SENDMAIL, "|/usr/lib/sendmail -t") or warn "Cannot open sendmail: $!\n";
print SENDMAIL <<MESSAGE;
To: $to_address
From: $from_address
Subject: $subject

$body
.
MESSAGE
close SENDMAIL;
}

So, not having the module is not a problem for sending emails :)

Sendmail

Posted by jcoyle

How would you include your sendmail script in monitor_web.sh?

script errors

Posted by GZILL

I get a ": bad interpreter" error when running monitor_web.sh after configuring the web servers and e-mail address and correcting the path to wget. Please advise.

line endings

Posted by Russ

I had to fix the line endings of the script after downloading it to Windows XP and moving the tgz file to my Debian box. Try

tr -s "\r" "\n" < monitor_web.sh > b && mv b monitor_web.sh;

You'll have to re-enable the executable bit.

bash or ksh ?

Posted by Anonymous

Is the first line pointing to bash or ksh? Make sure it's bash if you're using Linux, as most distros don't have ksh by default.

permissions ok?

Posted by Anonymous

Have you set permissions to 755 on the script?
