SNMP Monitoring with Nagios
Because I did not have a spare Dell Power Edge server sitting around to test the modified script, I had to test it another way. Reading the man page for snmpd.conf, I found that you could have external programs answer certain OIDs using “pass-through” scripts. The bash script (dell_open_manager_test.sh) below serves as my pass-through script for testing. With this script, I can simulate all of the states that the Dell server could be in:
    #!/bin/bash
    #
    # bash script to replicate a working Dell OpenManage SNMP agent
    # works with Net-SNMP daemon. email@example.com
    #
    # snmpd calls a pass script as "<script> -g <OID>" for a GET, so the
    # requested OID arrives in $2. The reply is three lines: OID, type, value.
    REQUEST_OID="$2"
    echo "$REQUEST_OID"
    case "$REQUEST_OID" in
        .1.3.6.1.4.1.674.10892.1.200.10.1.4.1) echo "integer"; echo "3"; exit 0 ;;
        # Power Supply status
        .1.3.6.1.4.1.674.10892.1.200.10.1.9.1) echo "integer"; echo "3"; exit 0 ;;
        .1.3.6.1.4.1.674.10892.1.200.10.1.12.1) echo "integer"; echo "3"; exit 0 ;;
        # Cooling Device status
        .1.3.6.1.4.1.674.10892.1.200.10.1.21.1) echo "integer"; echo "3"; exit 0 ;;
        .1.3.6.1.4.1.674.10892.1.200.10.1.24.1) echo "integer"; echo "3"; exit 0 ;;
        .1.3.6.1.4.1.674.10892.1.200.10.1.27.1) echo "integer"; echo "3"; exit 0 ;;
        .1.3.6.1.4.1.674.10892.1.200.10.1.30.1) echo "integer"; echo "3"; exit 0 ;;
        .1.3.6.1.4.1.674.10892.1.200.10.1.41.1) echo "integer"; echo "3"; exit 0 ;;
        .1.3.6.1.4.1.674.10893.1.20.110.13.0) echo "integer"; echo "3"; exit 0 ;;
        # Anything else: echo the arguments back as a string
        *) echo "string"; echo "$@"; exit 0 ;;
    esac
    exit
To use the script, I added the following lines to the end of /etc/snmp/snmpd.conf:
    ### dell open manager test
    view systemview included .1.3.6.1.4.1.674
    pass .1.3.6.1.4.1.674 /bin/bash /usr/local/bin/dell_open_manager_test.sh
To make the changes in the configuration file take effect, restart the snmpd dæmon. On Slackware, this is done via the following:
    # /etc/rc.d/rc.snmpd restart
    Shutting down snmpd: . DONE
    Starting snmpd: /usr/sbin/snmpd -A -p /var/run/snmpd -a -c /etc/snmp/snmpd.conf
To query the SNMP server, we use Net-SNMP's command-line snmpget utility:
    # snmpget -v 1 -c public 127.0.0.1 \
        .1.3.6.1.4.1.674.10892.1.200.10.1.9.1
    SNMPv2-SMI::enterprises.674.10892.1.200.10.1.9.1 = INTEGER: 3
The response is an integer value of 3. In the DellStatus definition (see above), the value 3 maps to “ok(3) The object's status is OK”. This tells us that the pass-through script is working. Now, we test the check_dell_openmanager.pl Perl script:
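The DellStatus values translate naturally into Nagios plugin states. The following is a minimal shell sketch of that mapping, not the plugin's actual Perl code. The function name dell_status_to_nagios is made up for illustration, and treating nonRecoverable(6) as CRITICAL and other(1) or unknown(2) as UNKNOWN is an assumption.

```shell
#!/bin/bash
# Map a DellStatus integer to a Nagios state name and exit code.
# Hypothetical helper for illustration; the real plugin is written in Perl.
dell_status_to_nagios() {
    case "$1" in
        3)   echo "OK";       return 0 ;;  # ok(3)
        4)   echo "WARNING";  return 1 ;;  # nonCritical(4)
        5|6) echo "CRITICAL"; return 2 ;;  # critical(5), nonRecoverable(6) (assumed)
        *)   echo "UNKNOWN";  return 3 ;;  # other(1), unknown(2) or anything unexpected
    esac
}
```

The return values follow the usual Nagios convention of 0 for OK, 1 for WARNING, 2 for CRITICAL and 3 for UNKNOWN.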
    # ./check_dell_openmanager.pl -H 127.0.0.1 -C public -T pe2950
    OK

To test other values, simply modify the dell_open_manager_test.sh shell script. For example, to simulate an error in the Cooling Device OID (.1.3.6.1.4.1.674.10892.1.200.10.1.21), modify that OID's line in the script to return a code of 4 for nonCritical:

    .1.3.6.1.4.1.674.10892.1.200.10.1.21.1) echo "integer"; echo "4"; exit 0 ;;
Now, running the Perl script produces a warning:
    # ./check_dell_openmanager.pl -H 127.0.0.1 -C public -T pe2950
    WARNING:Cooling Device Status=Non-Critical
To simulate a critical error, let's modify the Power Supply OID to reply with a 5:
    .1.3.6.1.4.1.674.10892.1.200.10.1.9.1) echo "integer"; echo "5"; exit 0 ;;

    # ./check_dell_openmanager.pl -H 127.0.0.1 -C public -T pe2950
    CRITICAL:Cooling Device Status=Non-Critical, Power Supply Status=Critical
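Editing the script for every scenario gets tedious. One possible variation, offered as an assumption rather than something the original test setup did, keeps the simulated values in a small state file that the pass-through script consults. The file path, its "column status" line format and the lookup_status name are all hypothetical.

```shell
#!/bin/bash
# Hypothetical state file: one "<column> <status>" pair per line, e.g. "21 4".
STATE_FILE="${STATE_FILE:-/tmp/dell_sim_state}"

# Print the simulated status for a status-table column.
# Falls back to ok(3) when the file is unreadable or has no entry.
lookup_status() {
    local column="$1" value=""
    if [ -r "$STATE_FILE" ]; then
        value=$(awk -v c="$column" '$1 == c { print $2; exit }' "$STATE_FILE")
    fi
    echo "${value:-3}"
}
```

A case branch in the test script could then read `echo "integer"; lookup_status 21; exit 0 ;;`, and a line such as `21 4` in the state file would switch the Cooling Device to nonCritical without touching the script. The file must be readable by the user that snmpd runs as.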
To test the script on the live production systems, we added the check_dell_openmanager.pl command to a working Nagios server. We opened the case cover on a live system to generate a Chassis Intrusion Status error to test the plugin. Within a few seconds, we had an SMS message on the IT administrator's phone letting us know that there was a problem with the chassis subsystem on the server we just opened.
After writing this plugin, I uploaded it to Nagios Exchange, a Web site that hosts third-party add-ons for Nagios. In short order, I was getting e-mail messages from all over the world concerning the Nagios plugin I had written. Some were suggestions, and some were from people in need of help. It was not an overwhelming number of messages: at most two a week, and sometimes none. It was just enough to let me know that people other than me actually were using this thing.
I would like to make a few improvements to the plugin. For one, I think there may be a way to reduce the SNMP traffic to a single query that obtains the overall global status of the machine. Only if that state is not “ok(3)” would the plugin go on to query the other OIDs, so that a more specific error can be reported.
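A rough shell sketch of that idea follows. It is not the plugin's code: the names query_status and check_server are invented, the table prefix and the use of column 2 for the global status are assumptions to verify against Dell's MIB, and every non-ok result is simply reported with exit code 2.

```shell
#!/bin/bash
# Sketch: ask for one global status first, query the detail OIDs only on failure.
# Assumptions: BASE and column 2 (global status) must be checked against the Dell MIB.
BASE=".1.3.6.1.4.1.674.10892.1.200.10.1"
HOST="${HOST:-127.0.0.1}"
COMMUNITY="${COMMUNITY:-public}"

# Print the bare numeric value of one status-table column.
# -Oqve: quick print, value only, enums shown as numbers.
query_status() {
    snmpget -v 1 -c "$COMMUNITY" -Oqve "$HOST" "$BASE.$1.1"
}

check_server() {
    local global col status problems=""
    global=$(query_status 2) || { echo "UNKNOWN: no SNMP response"; return 3; }
    if [ "$global" = "3" ]; then
        echo "OK"
        return 0
    fi
    # Not ok(3): look at the individual columns to find the culprit.
    for col in 4 9 12 21 24 27 30 41; do
        status=$(query_status "$col") || continue
        [ "$status" = "3" ] || problems="$problems column $col=$status;"
    done
    echo "PROBLEM: global=$global;$problems"
    return 2
}
```

In the healthy case this costs one SNMP request instead of nine; the extra round trips are paid only when something is already wrong.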
It also would be nice to be able to detect which subsystems exist. That way, for example, if a machine has a RAID array, it is monitored, and if not, the script skips it.
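One way to approach this, sketched in shell under stated assumptions, is to treat an empty or "no such object" reply as "subsystem not installed". The function name subsystem_present is hypothetical, and the message strings matched here are what Net-SNMP's snmpget is assumed to print for a missing object; they should be confirmed against a real agent.

```shell
#!/bin/bash
# Sketch: decide from the text of an snmpget reply whether a subsystem exists.
# Assumption: a missing object yields an empty reply, a noSuchName error
# (SNMPv1) or a "No Such Object"/"No Such Instance" message (SNMPv2c).
subsystem_present() {
    local reply="$1"
    case "$reply" in
        ""|*"No Such Object"*|*"No Such Instance"*|*noSuchName*) return 1 ;;
        *) return 0 ;;
    esac
}
```

A caller could capture the reply with something like `reply=$(snmpget -v 2c -c public -Oqv 127.0.0.1 "$oid" 2>&1)` and only evaluate the status when `subsystem_present "$reply"` succeeds.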
One of the most common e-mail messages I get is about missing the Net::SNMP Perl module. I would like to test for these common-case scenarios. If the test fails, I would like to print the problem with a solution. In the case of “Net::SNMP”, it should print:
    You are missing the Net::SNMP perl module.
    Please install it using:
    perl -MCPAN -e shell
    cpan> install "Net::SNMP"
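Such a check could also live in a small wrapper around the plugin. The sketch below is shell rather than Perl, and the function name check_perl_module is invented for illustration; it only tests whether perl can load the named module and prints the message above if it cannot.

```shell
#!/bin/bash
# Sketch: verify that a Perl module loads; print install instructions if not.
# Hypothetical helper, not part of the published plugin.
check_perl_module() {
    local module="$1"
    # "perl -MModule -e 1" exits non-zero when the module cannot be loaded.
    if perl -M"$module" -e 1 2>/dev/null; then
        return 0
    fi
    echo "You are missing the $module perl module."
    echo "Please install it using:"
    echo "  perl -MCPAN -e shell"
    echo "  cpan> install \"$module\""
    return 1
}
```

Calling `check_perl_module Net::SNMP || exit 3` before running the plugin would turn a confusing Perl compile error into an UNKNOWN result with a clear fix.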
This would improve end-user experience significantly, especially for users new to Linux.