EMU—Event Management Utility

The authors present a freely available tool for monitoring enterprise systems through simplicity toward complexity.
Problems with Enterprise Management Offerings

There are a few major players in the enterprise management arena, most notably Tivoli, BMC and Unicenter TNG. All of these strive to win support from other software and hardware vendors for their APIs. At the end of the day, they try hard to sell all the enterprise management components from their offerings. We find this disadvantageous to the end customer. Enterprise management is about using the best breed of products, from independent vendors, which can be easily integrated. This integration should be accomplished by either a single standard API or a command-line and script interface. Being UNIX types, weprefer a command-line interface and scripts. That way, we can intercept messages passed between enterprise management components. If a new product needs to be integrated, it can literally be plugged in.

EMU Background

EMU is a flexible and integration-ready event management tool developed under Linux. It consists of a manager and agents. In fact, it integrates monitoring and event manager into one program. Agents are very simple scripts invoked by cron. These scripts are run at regular intervals, perhaps every five minutes. Each run scans the resources it monitors, comparing their thresholds against a configuration file. If a threshold is exceeded, a message is sent to the manager.

EMU employs time-to-live, which proved to be a simple way of maintaining resource status across agent polls. Let us suppose an agent that runs every five minutes found a resource problem. At each poll, it will send an alarm message to the manager. In this case, time-to-live will be set to slightly more than five minutes at 330 seconds. The manager will maintain the first message sent and its updates. If no update is received within the 330 seconds time-to-live of a particular message, the message is deleted and the problem is assumed to have been fixed. This simple approach allows us to write simple agents, preferably scripts, that scan a monitored resource and send their findings on each poll. The manager takes care of the rest. In fact, thanks to EMU, agents consisting of a few lines of code can monitor a very complex resource.

ASCII and Tcl/Tk interfaces to EMU are available. They represent a console for viewing events. The console displays all the necessary information to keep IT staff up to date. Each event is uniquely represented by a resource ID, which consists of the monitored system host name and object ID. All updates from the same resource ID are treated as one displayable event, while all the individual updates are stored in an event log file. If no updates for a resource ID have been received within an agent-specific time-to-live, the event message is removed from the console.

Examples of resource IDs are dumbo.company.com.au:/usr/local, tcc2345:sendmail and brk23:tz45. The first field before the colon designates a host name; the second field is a unique resource name. Two resources on a single system must not have the same resource ID, because EMU would treat them as the same resource.

The input interface to the manager is emsg, a small utility that uses TCP sockets to send messages to the manager. While the manager is written in Perl, emsg is written in C to facilitate its easy deployment on any monitored platform. In fact, Jarra is currently contracting to a company to install emsg on Linux, Compaq Tru64 UNIX, Solaris, AIX, IRIX, Sinix, Ultrix and VMS.

The integration interface is taken to the extreme by invoking input, delete and output scripts. Depending on the type of message, these scripts are issued on receipt of the message, on its removal or on its processing. All the message attributes are passed to the script as environment variables. In this way, we have achieved integration with Unicenter TNG event management. The TNG console-held area is, in fact, an exact image of the EMU console, thus making it much more usable and efficient.

EMU was built with distributed processing in mind. Multiple managers can run on a single or several systems, thus forming a hierarchy reflecting a company's need. Through the truly open architecture of EMU, it is easy to synchronize multiple managers, build fail-over configurations or extend their functionality.

Installation and Configuration

EMU consists of a manager (gemu), cleaner (gemucleaner), emsg agent (emsg1) and console/browser (eb, xeb). The manager and cleaner must run on the same node. The cleaner process manages message expirations. In order to provide flexibility, only one option is passed to gemu, gemucleaner and eb—the port number the particular server is running on. Both gemucleaner and eb use emsg to send delete messages to EMU.

A configuration file used by gemu, gemucleaner and eb is stored in /usr/local/emu/conf/port#.cfg. The configuration file describes the location of the EMU database (DBM-based), location of log files, scan interval for gemucleaner, etc. Each server will access its own configuration file based on the port number. If it suits your site, put the database under /usr/local/emu/port#/db. Each port/server will have log files and action scripts stored under /usr/local/emu/port# in sub-directories named logs and actions. The binaries/scripts are shared and stored in /usr/local/emu/bin.

One option in the configuration file is the location of emsg. If emsg, compiled for the individual platforms on your site, is stored in the /usr/local/emu/EMSG directory, you are ready to run eb (EMU browser/console) locally on your workstation. This is accomplished by exporting as read-only the /usr/local/emu directory. This directory will be mounted on the workstation as /usr/local/emu. By creating a symbolic link /usr/local/bin/emsg that points to /usr/local/emu/EMSG/emsg.platform and putting /usr/local/emu/bin in your search path, eb will run locally while displaying event messages from the server.

Depending on what actions EMU is configured to handle, the user ID it is running under can be either “emu” or “root”.

For the input, delete and output scripts, message attributes (e.g., host name, message text) are passed as environment variables. These can easily be used to trigger actions. It is a good idea to have one launcher script that, depending on message attributes, calls other, task-specific scripts. As a result, the workload imposed on the manager system will be reduced. The output script can be used to selectively forward messages to either a higher-level EMU or a third-party system. The input script may be used as a barrier to stop certain messages from processing based on a calendar. If this script returns a value greater than 0, the message is discarded. The delete message can be used for synchronization with a third-party system.

Time to live can be specified as seconds, minutes, hours or a fixed time in the form of HH:MM. A time-to-live of -1 stands for infinity, and the associated message will be displayed in reverse video (by eb) and the cleaner will not expire it. The only way to put the message away is with a delete command on the console. This allows a batch job or backup failures to wait until they are acknowledged. Time-to-live set to 0 is used with so-called pass-through messages. They are not stored in the EMU database (they are recorded in the log file), but are intended to trigger an action.

Figure 1. ASCII-Based EMU Message Browser

The eb console provides a basic display of messages. A new message is displayed in bold to draw our attention. A message can be deleted/acknowledged or annotated. Message annotations appear indented under each message. They serve the purpose of notifying others about details, such as a work request that has been logged. The message time shown on the console was the local time on the system that sent the first message. It helps identify when a problem occurred.

Figure 2. GUI-Based EMU Message Browser

EMU maintains a separate log file for each day. This log file stores all received messages, including their attributes, e.g. host name, message text and class. Message attributes are delimited with a vertical bar to allow for easy processing in scripts or uploading spreadsheets.

______________________

Webcast
How to Build an Optimal Hadoop Cluster to Store and Maintain Unlimited Amounts of Data Using Microservers

Realizing the promise of Apache® Hadoop® requires the effective deployment of compute, memory, storage and networking to achieve optimal results. With its flexibility and multitude of options, it is easy to over or under provision the server infrastructure, resulting in poor performance and high TCO. Join us for an in depth, technical discussion with industry experts from leading Hadoop and server companies who will provide insights into the key considerations for designing and deploying an optimal Hadoop cluster.

Learn More

Sponsored by AMD

White Paper
Red Hat White Paper: Using an Open Source Framework to Catch the Bad Guy

Built-in forensics, incident response, and security with Red Hat Enterprise Linux 6

Every security policy provides guidance and requirements for ensuring adequate protection of information and data, as well as high-level technical and administrative security requirements for a system in a given environment. Traditionally, providing security for a system focuses on the confidentiality of the information on it. However, protecting the data integrity and system and data availability is just as important. For example, when processing United States intelligence information, there are three attributes that require protection: confidentiality, integrity, and availability.

Learn more about catching the bad guy in this free white paper.

Learn More

Sponsored by DLT Solutions