Monitoring E-Mail with Nagios
Have you ever felt like you were being ignored? Have you ever felt like you were talking but no one was listening? Well, that's how it feels when your e-mail system is broken and you don't know it.
During the past week, I've had a couple of system problems that prevented people from receiving e-mail messages that my wife or I sent. The sad part was that we didn't know the messages weren't being delivered. We'd receive a message asking a question, and we'd reply to the sender thinking nothing of it. A few days later, we'd get a phone call from the person asking whether we ever were going to respond.
In our case, two situations were conspiring against us: a change in Comcast's firewall policy and a change in Yahoo's mail delivery policy.
It all began when my wife started complaining that something was wrong with the e-mail system because she'd not heard back from a friend to whom she had sent a message the previous day. I sent a quick e-mail to a friend of mine, got a response, informed my wife that “it worked for me,” and chalked it up to her friend not being responsive.
Then, just to demonstrate to her that the mail server was healthy, I asked the server to print out its mail queue. Crap! There were 55 messages in the queue waiting to be delivered. Of course, by this time, even I had noticed that the volume of incoming spam had gone down to none. So, Houston, we had a problem.
After I had spent several years running my own mail server on my home machine, connected to the Internet via Comcast, Comcast decided to implement a new firewall policy and started blocking incoming SMTP (tcp/25) connections on its residential users' networks. Of course, I wasn't informed of the change, because I don't use Comcast's e-mail system! Previously, we would send e-mail from our workstations, and our mail server would forward the message through Comcast's smarthost; incoming messages came directly to our server. This configuration had worked for years. But, with the new firewall policy, something broke. Some of our messages were being delivered, and some weren't. I'm speculating that the ones not delivered were going through servers that performed sender address verification, and because they couldn't connect back to my mail server to validate my e-mail address, they refused delivery.
So, I decided to take the inexpensive way out. I could have spent an extra $20 a month and gotten a business account with Comcast, which I eventually did, but I didn't at first. I created a VPN tunnel from my home machine to one of my servers on the open Internet. Then, I moved my DNS pointers to point to that machine and had it forward incoming messages through the VPN. I configured my home server to use that machine as its smarthost rather than Comcast's server. Aside from the blatant violation of Comcast's Acceptable Use Policy, this seemed like it would work pretty well.
Then, the other shoe dropped.
My wife and I quickly realized that this was working much better, but it still wasn't quite right. People my wife e-mailed on a daily basis weren't receiving her messages. The common denominator was that all of these people were using Yahoo e-mail accounts. So, I manually forced delivery of one e-mail message and saw that Yahoo was deferring delivery due to questionable traffic patterns. And, that made sense; I was trying to deliver 55 deferred messages, probably all at once.
It's important to note that I monitor my e-mail server, and the Exim daemon never sent an alarm, so merely monitoring a service isn't enough. Instead of monitoring the service itself, it's better to monitor the server's function, which is what the rest of this article is about.
I was hesitant to write another article on Nagios, but e-mail is becoming more and more critical, and when it does break, it breaks in strange ways.
Of course, I monitor my Exim daemon as well as my server's route to the Internet. I use a Nagios service check for SMTP, like this:
define service {
    use generic-service
    name smtp
    host_name host.example.com
    notification_options w,c,r
    service_description E-Mail SMTP Server
    check_command check_smtp
}
I use a similar check to monitor my Internet gateway. But, as bad as the e-mail situation became, neither of these alarms would have indicated a problem. So, rather than monitoring to see whether a process is running, I set out to begin monitoring the server's critical functions: e-mail transport and delivery.
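To see why the SMTP check stayed green the whole time, it helps to look at roughly what a check of this kind does. The Perl sketch below is not the check_smtp plugin itself, only an illustration of the idea; the sub names banner_ok and probe_smtp and the 10-second timeout are assumptions made for the example. It connects to port 25 and looks for a 220 greeting, which says nothing about whether queued mail ever leaves the machine.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use IO::Socket::INET;

# True (1) if the server's first line is an SMTP "220" greeting, else 0.
sub banner_ok {
    my ($line) = @_;
    my $ok = defined($line) && $line =~ /^220[ -]/;
    return $ok ? 1 : 0;
}

# Connect to the SMTP port and map the result to a Nagios exit status.
# Returns (status, message): 0 = OK, 2 = CRITICAL.
sub probe_smtp {
    my ($host, $port) = @_;
    my $sock = IO::Socket::INET->new(
        PeerAddr => $host,
        PeerPort => $port || 25,
        Proto    => 'tcp',
        Timeout  => 10,
    ) or return (2, "CRITICAL: cannot connect to $host");
    my $banner = <$sock>;
    close $sock;
    return banner_ok($banner)
        ? (0, "OK: $host answered with a 220 greeting")
        : (2, "CRITICAL: unexpected greeting from $host");
}

# Run the probe only when a host name is given on the command line.
if (@ARGV) {
    my ($status, $message) = probe_smtp($ARGV[0]);
    print "$message\n";
    exit $status;
}
```

A daemon that accepts connections but cannot hand mail to the next hop passes this probe every time, which is exactly the situation described above.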
The first problem I wanted to address was being informed when messages were stuck in Exim's mail queue. I actually thought I'd have to write a custom program to check for this situation. While researching the situation further, I came across a posting from someone with a similar problem. It turns out that Nagios already has a command that performs this check, and I never knew it. Nagios's check commands are in /usr/nagios/libexec/, and let me tell you, there is a lot of gold in that directory.
So, I created an entry in Nagios's checkcommands.cfg file, like this:
define command{
    command_name check_mailq
    command_line $USER1$/check_mailq -w 3 -c 5 -v 9
}
Then, I created an entry in the services.cfg file that looked like this:
define service {
    use generic-service
    name mailq
    host_name dominion
    notification_options w,c,r
    service_description SMTP Mail Queue
    check_command check_mailq
}
Finally, I restarted Nagios and tested the new configuration by shutting down my server's outside network interface and attempting to send an e-mail message. Obviously, the mail transport operation failed and I got my alarm.
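The -w and -c values in the command definition are the queue depths at which Nagios raises a WARNING and a CRITICAL alarm. The logic behind a check like this can be sketched in a few lines of Perl. This is not the source of check_mailq; the sub name queue_status is hypothetical, and the sketch assumes Exim's own "exim -bpc" command, which prints the number of messages in the queue.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Map a queue depth to a Nagios status and message.
# 0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN.
sub queue_status {
    my ($count, $warn, $crit) = @_;
    return (3, "UNKNOWN: could not read the queue depth")
        unless defined($count) && $count =~ /^\d+$/;
    return (2, "CRITICAL: $count messages queued") if $count >= $crit;
    return (1, "WARNING: $count messages queued")  if $count >= $warn;
    return (0, "OK: $count messages queued");
}

# Called with two arguments (warning and critical thresholds), ask Exim
# for its queue depth and report it. With any other argument count,
# the script only defines queue_status and does nothing else.
if (@ARGV == 2) {
    my ($warn, $crit) = @ARGV;
    my $count = `exim -bpc 2>/dev/null`;
    chomp $count if defined $count;
    my ($status, $message) = queue_status($count, $warn, $crit);
    print "$message\n";
    exit $status;
}
```

With the thresholds from the command definition above, a backlog of 55 messages would come back as CRITICAL.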
So at this point, I am pretty sure that if I have another problem with my e-mail system, at least I'll know it in a timely fashion. But, I thought it would be good to put in one more check.
It would be nice to know if my server ever finds itself on a Real-time Blackhole List (RBL). Once again, Nagios has a command to check for this situation, but it comes in C source, which I couldn't get to compile. Anyway, I think I like my solution better.
My program looks up the server's IP address at http://www.anti-abuse.org, which, in turn, checks the IP address against several other RBLs at once. I'm probably going to configure Nagios to perform this check a few times a day, at most.
Here's the program:
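#!/usr/bin/perl
use strict;
use warnings;

my $error = 0;
open CMD, "wget -q 'http://www.anti-abuse.org/rblresults.php?host=192.168.1.1' -O - |"
    or die "wget: $!";
while (<CMD>) {
    next if !/listed in /;           # not a result line
    $error++ if !/NOT listed in /;   # listed on this RBL
}
close CMD;
if (!$error) {
    print "OK\n";
    exit 0;
}
print "CRITICAL: $error\n";
exit 2;   # Nagios reads the exit status: 2 = CRITICAL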
As you can see, it's not that complex. It simply sends a query to Anti-abuse.org and looks for the results. I hard-coded my machine's IP address in this case, but it would be trivial to use one of Nagios's variables and send the IP address as a command-line parameter to this program. Then, the program makes sure that each of the results indicates that my machine is not listed on an RBL. Each result that fails this check increments an error counter, which decides the final output. Finally, I created checkcommands.cfg and services.cfg entries just as I did above.
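Passing the address on the command line could look something like the sketch below. It keeps the same query and the same two patterns as the program above; the sub names count_listings and rbl_status are hypothetical, and the sketch assumes the command would be defined with Nagios's $HOSTADDRESS$ macro as its only argument. It exits with status 2 in the failure case, because Nagios judges a plugin by its exit status rather than by the text it prints.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Count result lines that say "listed in" without a preceding "NOT".
sub count_listings {
    my (@lines) = @_;
    my $listed = 0;
    for my $line (@lines) {
        next unless $line =~ /listed in /;
        $listed++ unless $line =~ /NOT listed in /;
    }
    return $listed;
}

# Turn the number of listings into a Nagios status and message.
sub rbl_status {
    my ($listed) = @_;
    return $listed ? (2, "CRITICAL: $listed") : (0, "OK");
}

# Run the lookup only when an IPv4 address is given as the first argument.
if (@ARGV) {
    my $ip = $ARGV[0];
    if ($ip !~ /^\d{1,3}(?:\.\d{1,3}){3}$/) {
        print "UNKNOWN: '$ip' is not an IPv4 address\n";
        exit 3;
    }
    my $url = "http://www.anti-abuse.org/rblresults.php?host=$ip";
    # List-form open avoids passing the URL through a shell.
    open(my $wget, '-|', 'wget', '-q', $url, '-O', '-')
        or do { print "UNKNOWN: cannot run wget\n"; exit 3 };
    my ($status, $message) = rbl_status(count_listings(<$wget>));
    close $wget;
    print "$message\n";
    exit $status;
}
```

Validating the argument before building the URL also keeps a mistyped host name from silently producing an OK.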
Now I find myself in the awkward predicament of having written a program that I can't test. In order to test this program fully, I'd have to get my server on an RBL, which I'm not about to do. Even so, I believe this program will work.
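Short of getting listed, the matching logic at least can be exercised offline by feeding it canned text. The sketch below does that. The sub name count_listings and the sample lines are made up for illustration, modeled only on the two patterns the program looks for; they are not real output from Anti-abuse.org.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Same matching rules as the check: a line counts as a listing when it
# contains "listed in " but not "NOT listed in ".
sub count_listings {
    my (@lines) = @_;
    return scalar grep { /listed in / && !/NOT listed in / } @lines;
}

# Hypothetical sample lines, not captured from the real service.
my @clean = (
    "192.0.2.10 is NOT listed in rbl-one.example\n",
    "192.0.2.10 is NOT listed in rbl-two.example\n",
);
my @dirty = (
    "192.0.2.10 is NOT listed in rbl-one.example\n",
    "192.0.2.10 is listed in rbl-two.example\n",
);

printf "clean sample: %d listing(s)\n", count_listings(@clean);
printf "dirty sample: %d listing(s)\n", count_listings(@dirty);
```

This only proves that the patterns behave as intended on text of that shape; whether the live page still uses that wording is a separate question.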
I don't know about you, but I live by e-mail, so my e-mail system simply has to work. The problems I had recently demonstrated that my monitoring policy wasn't sufficient. I believe that the new policy would have alerted me to the situation in a timely fashion. But, as is always the case, you can't test for everything, so I'm sure I'm missing something.
Mike Diehl is a freelance Computer Nerd specializing in Linux administration, programming, and VoIP. Mike lives in Albuquerque, NM, with his wife and 3 sons. He can be reached at mdiehl@diehlnet.com.
Comments
Blocked wget
Here is what I had to change to get this to work. It seems anti-abuse is blocking wget, so I had to use the --user-agent option. Also, I only check for "is listed in" and return exit code 2 for Nagios critical. Replace x.x.x.x with a known good and a known bad IP for testing.
*duh*
Your IP will never be blocked because you're checking a non-routable address.... Read RFC1918.
Mail loop test
Checking the daemon running and queue size is nice, but there's more to a correctly running mail server. My Nagios is configured to check a "mail loop" - ie it sends an email with a timestamp to its mailserver and checks later on via IMAP/SSL that it arrived into a mailbox. Such a mail-loop checks the mailsystem in its complexity, including correctly configured DNS, working SMTP, local delivery (including LDAP in my case) and IMAP. Google for "mail loop nagios" for a load of scripts that can be used for this task. And yes, I get alerted via SMS if the loop breaks. Using clickatell.com's http interface for sms alerts.
Umm, this subject is duplicated in Monitoring SMTP damn
Damn this magazine has gone to shit; why don't you just post the Nagios Manual as an article? Damn, sad. Sad shit.
Very interesting
Why don't you just send a cc to yourself?
Very interesting, though.
Plugin sources & error in your script
First, a pointer:
http://nagiosplugins.org/
Second, there's a problem with your RBL check script. You don't set the exit code for the CRIT case so perl will exit with "0" in either case. Nagios actually uses the exit code, not the text, so you'll not get an alert.
http://nagios.sourceforge.net/docs/3_0/pluginapi.html
Also ... while()
Bah. HTML ate my code (which it probably did for the author as well):
while(<CMD>), not while()
but how?
My only concern is: How do you send the alert out if your email server is down? SMS? And if so, through what conduit?
SMS
You can send the alert via a SMS gateway. You can use Kannel software as a gateway and a Siemens M20 or Nokia Premicell devices as a SMSC.
Sure you can test it - pick a banned ip and hard code that in!