EMU—Event Management Utility

The authors present a freely available tool for monitoring enterprise systems through simplicity toward complexity.
Classes

The class option of emsg can be used in many powerful ways. When monitoring systems, it is best used for identifying a class hierarchy to the monitored resource; for example, /LINUX/PRO to designate a process subsystem or /LINUX/FS to designate a file system subsystem. In a way, it is similar to the SNMP OIDs; however, emsg class is much more flexible and can be created immediately as a need arises. Companies should develop a standard document detailing the classes format to be used. It is likely to reflect their business, resource and escalation hierarchies.

In a pure SNMP environment, a message arrives with an OID number that many people find cryptic and impractical. With the use of classes, the information is not only easy to read, but also lends itself to message filtering, forwarding, actioning, etc. For example, database messages may have a class set to /IT/ORACLE. On receipt of such a message, the DBA may be paged to attend to the problem.

Agents

In this section, a simple example agent for file system monitoring is demonstrated. Considerations are made of important aspects of the system along with tradeoffs. To make the example simple, the configuration file used will ignore minimal disk space limits for each file system. The code for this agent is shown in Listing 1.

Listing 1.

Once a resource is selected, we have to determine whether there is a periodicity in the way the resource can be monitored. For periodic monitoring, we need to know how frequently the resource should be monitored. The shorter the interval, the more resource-intensive the agent. However, by selecting too large an interval, we may miss an alarm in its early stage. For our file system agent, we will select a five-minute interval.

Next, time-to-live needs to be established. Given the poll interval of five minutes, we will select a time-to-live of six minutes. Remember, this must always be larger than the poll interval to keep display of events “continuous”. To achieve regular polling, the agent will be running from cron.

Once you have the agent, all that needs to be done is deciding which user it will run under and create a cron job for submissions in five-minute intervals. In fact, the simple code in Listing 1 is a full-blown agent for monitoring file systems with a 10% alarm limit.

Actions

Now, let us put together a simple output action script. We are going to use EMU for monitoring a flow of events. To accomplish this, a directory called events is created. This directory stores files with names reflecting event names. If a file exists, it means the event it describes is active. Once the file is removed, the event has finished. Consider a scenario where a backup of SAP_ORACLE must complete by 6 AM. If a backup event file is found after 6 AM, it indicates the backup is running overtime or the backup script crashed without an opportunity to remove the file.

The SAP_ORACLE backup script reads as follows:

#!/usr/bin/ksh
emsg -n emuserver -p 2345 -t 0 -s 3 -w
icecream\
-c ADD_EVENT -m SAP_ORACLE_BACKUP
# start backup
.
.
.
# backup finished
emsg -n emuserver -p 2345 -t 0 -s 3 -w
icecream\
-c DEL_EVENT -m SAP_ORACLE_BACKUP

The output action script that creates or removes the event file will look as follows:

if [ "$E_CLASS" = "ADD_EVENT" ];then
        touch /usr/local/emu/events/$E_MSG
fi
if [ "$E_CLASS" = "DEL_EVENT" ];then
        rm /usr/local/emu/events/$E_MSG
fi
Another example is an input action script that stops messages from a node called dumbo, even though the EMU password is correct. It is necessary to mention an environment variable called E_RHOST. In order to facilitate forwarding of messages from EMU to EMU, emsg has an -h option for changing the name of the host from which the message arrived. This message attribute is stored in E_HOST. However, E_RHOST stores the true node name from which the message arrived. The input action script is as follows:
if [ "$E_RHOST" = "dumbo" ];then
        exit 1
else
        exit 0
fi

Conclusion

Event management and resource monitoring is a complex subject, so we tried to touch on only the most important aspects of it. We believe by providing a free tool, enterprise event management will become a must on most sites. Linux is the best platform for EMU, since to take full advantage of its capabilities, an open and tools-rich environment is necessary. Check our web site at http://www.jarrix.com.au/ for the latest developments on the EMU front. Through collaboration around the globe, a valuable repository of EMU agents can be built. If you have an idea or have written an agent, let us know and we will post it on the EMU home page. If you have not done so yet, download EMU and delve into the vast and exciting horizons of enterprise management.

Resources

Jarra Voleynik has been involved with UNIX for the past 11 years. He is a graduate of the Technical University of Prague with an MS in Electronics. His first encounter with Linux two years ago got him hooked. He works as a UNIX consultant for Jarrix Systems. He can be reached at jarrix@yahoo.com.

Anna Voleynik (MS degree in Electronics) started being actively “aware” of Linux a year ago. She works as a UNIX Systems Administrator and keeps trying to minimize her and Jarra's “talking UNIX” at home. She mostly spends her spare time with their children, ages 8 and 2. She can be reached at anna_vol@yahoo.com.

______________________

Webinar
One Click, Universal Protection: Implementing Centralized Security Policies on Linux Systems

As Linux continues to play an ever increasing role in corporate data centers and institutions, ensuring the integrity and protection of these systems must be a priority. With 60% of the world's websites and an increasing share of organization's mission-critical workloads running on Linux, failing to stop malware and other advanced threats on Linux can increasingly impact an organization's reputation and bottom line.

Learn More

Sponsored by Bit9

Webinar
Linux Backup and Recovery Webinar

Most companies incorporate backup procedures for critical data, which can be restored quickly if a loss occurs. However, fewer companies are prepared for catastrophic system failures, in which they lose all data, the entire operating system, applications, settings, patches and more, reducing their system(s) to “bare metal.” After all, before data can be restored to a system, there must be a system to restore it to.

In this one hour webinar, learn how to enhance your existing backup strategies for better disaster recovery preparedness using Storix System Backup Administrator (SBAdmin), a highly flexible bare-metal recovery solution for UNIX and Linux systems.

Learn More

Sponsored by Storix