SNMP Monitoring with Nagios
Because I did not have a spare Dell PowerEdge server sitting around to test the modified script, I had to test it another way. Reading the man page for snmpd.conf, I found that you can have external programs answer requests for certain OIDs using “pass-through” scripts. The bash script below (dell_open_manager_test.sh) serves as my pass-through script for testing. With this script, I can simulate all of the states that the Dell server could be in:
#!/bin/bash
#
# bash script to replicate a working Dell OpenManage SNMP agent
# works with Net-SNMP daemon. infotek@gmail.com
#
REQUEST_OID="$2"
echo "$REQUEST_OID";
case "$REQUEST_OID" in
.1.3.6.1.4.1.674.10892.1.200.10.1.4.1)
echo "integer"; echo "3"; exit 0 ;;
.1.3.6.1.4.1.674.10892.1.200.10.1.9.1)
echo "integer"; echo "3"; exit 0 ;;
.1.3.6.1.4.1.674.10892.1.200.10.1.12.1)
echo "integer"; echo "3"; exit 0 ;;
.1.3.6.1.4.1.674.10892.1.200.10.1.21.1)
echo "integer"; echo "3"; exit 0 ;;
.1.3.6.1.4.1.674.10892.1.200.10.1.24.1)
echo "integer"; echo "3"; exit 0 ;;
.1.3.6.1.4.1.674.10892.1.200.10.1.27.1)
echo "integer"; echo "3"; exit 0 ;;
.1.3.6.1.4.1.674.10892.1.200.10.1.30.1)
echo "integer"; echo "3"; exit 0 ;;
.1.3.6.1.4.1.674.10892.1.200.10.1.41.1)
echo "integer"; echo "3"; exit 0 ;;
.1.3.6.1.4.1.674.10893.1.20.110.13.0)
echo "integer"; echo "3"; exit 0 ;;
*)
echo "string"; echo "$@"; exit 0 ;;
esac
exit
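Before involving snmpd at all, the script's replies can be checked by hand. For a GET request, snmpd runs a pass program as "program -g OID" and expects three lines on standard output: the OID, the type and the value. The helper below is a minimal sketch of that check. The name check_pass_reply is hypothetical (it is not part of Net-SNMP), and the list of accepted type names is an assumption that should be verified against the snmpd.conf man page on your system.

```shell
# check_pass_reply EXPECTED_OID
# Reads a pass-script reply on stdin and verifies its three-line shape.
# Assumption: a GET reply is exactly OID, type, value on separate lines.
check_pass_reply() {
    expected_oid="$1"
    IFS= read -r reply_oid   || { echo "missing OID line" >&2; return 1; }
    IFS= read -r reply_type  || { echo "missing type line" >&2; return 1; }
    IFS= read -r reply_value || { echo "missing value line" >&2; return 1; }

    if [ "$reply_oid" != "$expected_oid" ]; then
        echo "unexpected OID: $reply_oid" >&2
        return 1
    fi

    # Type names assumed from the snmpd.conf man page.
    case "$reply_type" in
        integer|gauge|counter|timeticks|ipaddress|objectid|string) ;;
        *) echo "unexpected type: $reply_type" >&2; return 1 ;;
    esac

    echo "ok: $reply_oid is $reply_type $reply_value"
}

# Example use against the test script (path as used in this article):
#   oid=.1.3.6.1.4.1.674.10892.1.200.10.1.9.1
#   bash /usr/local/bin/dell_open_manager_test.sh -g "$oid" | check_pass_reply "$oid"
```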
To use the script, I added the following lines to the end of /etc/snmp/snmpd.conf:
### dell open manager test
view systemview included .1.3.6.1.4.1.674
pass .1.3.6.1.4.1.674 /bin/bash /usr/local/bin/dell_open_manager_test.sh
To make the changes in the configuration file take effect, restart the snmpd dæmon. On Slackware, this is done via the following:
# /etc/rc.d/rc.snmpd restart
Shutting down snmpd: . DONE
Starting snmpd: /usr/sbin/snmpd -A -p \
/var/run/snmpd -a -c /etc/snmp/snmpd.conf
To query the SNMP server, we use Net-SNMP's command-line snmpget utility:
# snmpget -v 1 -c public 127.0.0.1 \
.1.3.6.1.4.1.674.10892.1.200.10.1.9.1
SNMPv2-SMI::enterprises.674.10892.1.200.10.1.9.1 = INTEGER: 3
The response is an integer value of 3, which in the DellStatus enumeration (see above) maps to “ok(3) The object's status is OK”. This tells us that the pass-through script is working. Now, we test the check_dell_openmanager.pl Perl script:
# ./check_dell_openmanager.pl -H 127.0.0.1 -C public -T pe2950
OK
To test other values, simply modify the dell_open_manager_test.sh shell script. For example, to simulate an error in the Cooling Device OID (.1.3.6.1.4.1.674.10892.1.200.10.1.21), modify that OID's line in the script to return a code of 4 for nonCritical:
.1.3.6.1.4.1.674.10892.1.200.10.1.21.1) echo "integer"; echo "4"; exit 0 ;;
Now, running the Perl script produces a warning:
# ./check_dell_openmanager.pl -H 127.0.0.1 -C public -T pe2950
WARNING:Cooling Device Status=Non-Critical
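The translation from a DellStatus value to a Nagios state can be sketched in a few lines of shell. This is an illustration, not the code inside check_dell_openmanager.pl. The values 3, 4 and 5 come from this article; treating every other value as UNKNOWN is an assumption, and dell_status_to_nagios is a hypothetical name. The return values follow the standard Nagios plugin exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN).

```shell
# dell_status_to_nagios CODE
# Prints the Nagios state name and returns the matching plugin exit code.
dell_status_to_nagios() {
    case "$1" in
        3) echo "OK";       return 0 ;;  # ok(3)
        4) echo "WARNING";  return 1 ;;  # nonCritical(4)
        5) echo "CRITICAL"; return 2 ;;  # critical(5)
        *) echo "UNKNOWN";  return 3 ;;  # assumption: anything else is unknown
    esac
}
```

Calling dell_status_to_nagios 4 prints WARNING and returns 1, which matches the warning shown above.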
To simulate a critical error, let's modify the Power Supply OID to reply with a 5 and run the plugin again:
.1.3.6.1.4.1.674.10892.1.200.10.1.9.1)
echo "integer"; echo "5"; exit 0 ;;
# ./check_dell_openmanager.pl -H 127.0.0.1 -C public -T pe2950
CRITICAL:Cooling Device Status=Non-Critical, \
Power Supply Status=Critical
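When several subsystems report problems at once, the plugin reports the most severe state overall, as the CRITICAL line above shows. A minimal sketch of that kind of aggregation over Nagios exit codes follows. The function name worst_state and the precedence used (CRITICAL over WARNING over UNKNOWN over OK) are assumptions for illustration, not taken from the plugin's source.

```shell
# worst_state CODE...
# Prints the most severe Nagios exit code among the arguments.
# Assumed precedence: CRITICAL(2) > WARNING(1) > UNKNOWN(3) > OK(0).
worst_state() {
    worst=0
    for code in "$@"; do
        case "$code" in
            2) worst=2 ;;
            1) if [ "$worst" -ne 2 ]; then worst=1; fi ;;
            3) if [ "$worst" -eq 0 ]; then worst=3; fi ;;
        esac
    done
    echo "$worst"
}
```

With the two simulated faults above, the cooling device contributes 1 (WARNING) and the power supply 2 (CRITICAL), so worst_state 1 2 prints 2.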
To test the script on the live production systems, we added the check_dell_openmanager.pl command to a working Nagios server. We opened the case cover on a live system to generate a Chassis Intrusion Status error to test the plugin. Within a few seconds, we had an SMS message on the IT administrator's phone letting us know that there was a problem with the chassis subsystem on the server we just opened.
After writing this plugin, I uploaded it to Nagios Exchange, a Web site that hosts third-party add-ons for Nagios. In short order, I was getting e-mail messages from all over the world concerning the plugin. Some were suggestions, and some were from people in need of help. The volume was never overwhelming: at most two messages a week, and sometimes none. It was just enough to let me know that people other than me actually were using this thing.
I would like to make a few improvements to the plugin. For one, I think there may be a way to reduce the SNMP traffic to a single query that obtains the overall global status of the machine. Only if that state is not “ok(3)” would the plugin go on to query the other OIDs, so that a more specific error can be reported.
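The idea can be sketched in shell as follows. Several things here are assumptions rather than facts from this article: the global status OID (.1.3.6.1.4.1.674.10892.1.200.10.1.2.1) should be checked against the Dell MIB, the function names query_status and check_host are hypothetical, and the detail list is just a subset of the OIDs used by the test script. The -Oqv option asks snmpget to print only the value.

```shell
# Assumed OID for the overall system status; verify against the Dell MIB.
GLOBAL_OID=".1.3.6.1.4.1.674.10892.1.200.10.1.2.1"

# Per-subsystem OIDs, taken from the test script in this article.
DETAIL_OIDS="
.1.3.6.1.4.1.674.10892.1.200.10.1.9.1
.1.3.6.1.4.1.674.10892.1.200.10.1.21.1
.1.3.6.1.4.1.674.10892.1.200.10.1.24.1
"

# query_status HOST COMMUNITY OID: print the bare value of one OID.
query_status() {
    snmpget -v 1 -c "$2" -Oqv "$1" "$3" 2>/dev/null
}

# check_host HOST COMMUNITY
# One query when everything is ok(3); drill down only otherwise.
# Returns 0 when the global status is ok(3), 1 in every other case.
check_host() {
    host="$1"
    community="$2"
    if [ "$(query_status "$host" "$community" "$GLOBAL_OID")" = "3" ]; then
        echo "OK"
        return 0
    fi
    # Global status is not ok (or the query failed): report each bad OID.
    for oid in $DETAIL_OIDS; do
        value=$(query_status "$host" "$community" "$oid")
        [ "$value" = "3" ] || echo "$oid=$value"
    done
    return 1
}
```

On a healthy machine this costs one SNMP request instead of one per subsystem. The extra round trips happen only when there is something to report.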
It also would be nice to be able to detect which subsystems actually exist. That way, for example, if a machine has a RAID array, it is monitored, and if not, the script skips it.
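One possible way to detect a missing subsystem is to look at what the agent returns for its status OID. The sketch below assumes the reply was fetched with snmpget and treats an empty reply, or Net-SNMP's "No Such Object" and "No Such Instance" messages, as meaning the subsystem is absent. The function names are hypothetical. Note that the test script in this article answers every OID under .1.3.6.1.4.1.674, so this check is only meaningful against a real agent.

```shell
# reply_means_present REPLY
# Succeeds when an snmpget reply looks like real data.
reply_means_present() {
    case "$1" in
        ""|*"No Such Object"*|*"No Such Instance"*) return 1 ;;
        *) return 0 ;;
    esac
}

# report_subsystem NAME REPLY
# Prints whether a subsystem would be monitored or skipped.
report_subsystem() {
    if reply_means_present "$2"; then
        echo "$1: monitored (status $2)"
    else
        echo "$1: skipped (not present)"
    fi
}
```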
One of the most common e-mail messages I get is about missing the Net::SNMP Perl module. I would like to test for these common-case scenarios. If the test fails, I would like to print the problem with a solution. In the case of “Net::SNMP”, it should print:
You are missing the Net::SNMP perl module. Please install it using:
perl -MCPAN -e shell
cpan> install "Net::SNMP"
This would improve end-user experience significantly, especially for users new to Linux.
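A check like this could live inside the Perl plugin itself. As a sketch in shell, for example in a small wrapper that runs before the plugin, it might look like the following. The function names are hypothetical, and returning 3 (the Nagios UNKNOWN exit code) on a missing module is an assumption about what would be most useful.

```shell
# have_perl_module MODULE: succeed if perl can load the module.
have_perl_module() {
    perl -M"$1" -e 1 >/dev/null 2>&1
}

# missing_module_hint MODULE: print the problem together with a fix.
missing_module_hint() {
    printf 'You are missing the %s perl module. Please install it using:\n' "$1"
    printf 'perl -MCPAN -e shell\n'
    printf 'cpan> install "%s"\n' "$1"
}

# check_perl_module MODULE
# Returns 0 if the module loads; otherwise prints the hint and returns 3.
check_perl_module() {
    if have_perl_module "$1"; then
        return 0
    fi
    missing_module_hint "$1"
    return 3
}

# Example: check_perl_module Net::SNMP || exit 3
```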
Comments
why reinvent the wheel :-)
http://folk.uio.no/trondham/software/check_openmanage.html
check_openmanage does all you need and is (at this point) a better solution for nagios.
Wheel wasn't reinvented
I feel the need to comment on this. I am the author of the check_openmanage plugin. I know for a fact that Jason's plugin existed long before check_openmanage, so to be precise it was I who reinvented the wheel. Also, our two plugins are different in their focus, and I believe that both are needed. Users can choose whichever plugin they want, among these two and many others. Isn't open source great :)
Besides that, I really enjoyed Jason's article. It explains in a detailed and concise manner how one goes about monitoring something with SNMP, and how to integrate this with Nagios. This is universally useful to many out there.
Thanks for a great article, Jason!
-trond
MonitorSNMP is a free
MonitorSNMP is a free monitoring service, basic but provides notification based on rules. Easy to setup and use. Take a look