EMU—Event Management Utility

The authors present a freely available tool that tackles the complexity of enterprise system monitoring with simple building blocks.
Problems with Enterprise Management Offerings

There are a few major players in the enterprise management arena, most notably Tivoli, BMC and Unicenter TNG. All of them strive to win support from other software and hardware vendors for their APIs, but at the end of the day, each tries hard to sell every enterprise management component in its own offering. We find this disadvantageous to the end customer. Enterprise management is about using best-of-breed products from independent vendors that can be easily integrated. This integration should be accomplished through either a single standard API or a command-line and script interface. Being UNIX types, we prefer a command-line interface and scripts: that way, we can intercept messages passed between enterprise management components, and a new product that needs to be integrated can literally be plugged in.

EMU Background

EMU is a flexible, integration-ready event management tool developed under Linux. It consists of a manager and agents; in fact, it integrates monitoring and event management into one program. Agents are very simple scripts invoked by cron at regular intervals, perhaps every five minutes. On each run, an agent scans the resources it monitors and compares their current values against thresholds defined in a configuration file. If a threshold is exceeded, the agent sends a message to the manager.
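To make the agent's job concrete, here is a minimal Python sketch of one polling pass that checks file system usage. The threshold table, the message text and the function names are hypothetical; a real EMU agent would hand each finding to emsg rather than print it, and the emsg command-line syntax is not shown here.

```python
import os
import shutil
import socket

# Hypothetical thresholds: mount point -> maximum percent of space used.
THRESHOLDS = {"/": 90, "/usr/local": 85}

def percent_used(path):
    """Percentage of the file system holding `path` that is in use."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total

def check(thresholds, host, usage_fn=percent_used):
    """Return (resource ID, message text) for every exceeded threshold."""
    alarms = []
    for mount, limit in sorted(thresholds.items()):
        used = usage_fn(mount)
        if used > limit:
            alarms.append((f"{host}:{mount}",
                           f"{mount} is {used:.0f}% full (limit {limit}%)"))
    return alarms

if __name__ == "__main__":
    # Skip mount points that do not exist on this host.
    present = {m: l for m, l in THRESHOLDS.items() if os.path.exists(m)}
    for resource_id, text in check(present, socket.gethostname()):
        # A real agent would pass these to emsg; this sketch only prints them.
        print(resource_id, text)
```

Note that a healthy resource produces no message at all; clearing the alarm is left to the manager's time-to-live handling.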

EMU employs time-to-live, which proved to be a simple way of maintaining resource status across agent polls. Suppose an agent that runs every five minutes finds a resource problem. At each poll, it sends an alarm message to the manager, with the time-to-live set slightly longer than the polling interval, for example, 330 seconds. The manager keeps the first message sent along with its updates. If no update arrives within the 330-second time-to-live of a particular message, the message is deleted and the problem is assumed to have been fixed. This approach lets us write simple agents, preferably scripts, that scan a monitored resource and report their findings on each poll; the manager takes care of the rest. In fact, thanks to EMU, agents consisting of a few lines of code can monitor a very complex resource.
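The time-to-live bookkeeping described above can be sketched as a small table keyed by resource ID. This is a Python illustration of the idea, not EMU's own (Perl) implementation; the class and field names are hypothetical, and the clock is injectable so the expiry logic can be exercised without waiting.

```python
import time

class EventTable:
    """Minimal sketch of time-to-live bookkeeping keyed by resource ID."""

    def __init__(self, clock=time.time):
        self.clock = clock   # injectable so the logic can be tested
        self.events = {}     # resource ID -> event record

    def update(self, resource_id, text, ttl):
        """Record a message; repeat messages refresh the existing event."""
        now = self.clock()
        # setdefault keeps the time of the first message for this resource.
        event = self.events.setdefault(resource_id, {"first": now})
        event.update(last=now, ttl=ttl, text=text)

    def expire(self):
        """Remove events not refreshed within their TTL; return their IDs.

        A negative TTL (EMU uses -1 for infinity) never expires.
        """
        now = self.clock()
        stale = [rid for rid, ev in self.events.items()
                 if ev["ttl"] >= 0 and now - ev["last"] > ev["ttl"]]
        for rid in stale:
            del self.events[rid]
        return stale
```

With a five-minute poll and a 330-second time-to-live, an event survives as long as the agent keeps reporting it and disappears one poll after the agent falls silent.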

ASCII and Tcl/Tk interfaces to EMU are available; both serve as consoles for viewing events. The console displays all the information IT staff need to stay up to date. Each event is uniquely identified by a resource ID, which consists of the monitored system's host name and an object ID. All updates with the same resource ID are treated as one displayable event, while every individual update is stored in an event log file. If no update for a resource ID arrives within an agent-specific time-to-live, the event message is removed from the console.

Examples of resource IDs are dumbo.company.com.au:/usr/local, tcc2345:sendmail and brk23:tz45. The first field before the colon designates a host name; the second field is a unique resource name. Two resources on a single system must not have the same resource ID, because EMU would treat them as the same resource.
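A resource ID can be split into its two fields with a few lines of Python. This sketch assumes, as the examples suggest, that the host name never contains a colon, so it splits at the first colon only and leaves any later colons in the resource name; the function name is hypothetical.

```python
def parse_resource_id(resource_id):
    """Split 'host:resource' at the first colon into (host, resource)."""
    host, sep, resource = resource_id.partition(":")
    if not sep or not host or not resource:
        raise ValueError(f"not a host:resource ID: {resource_id!r}")
    return host, resource
```

Because the full string is the key, two resources on one host that share a resource name collapse into a single event, which is exactly the collision the text warns against.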

The input interface to the manager is emsg, a small utility that uses TCP sockets to send messages to the manager. While the manager is written in Perl, emsg is written in C so that it can be deployed easily on any monitored platform. In fact, Jarra is currently contracting to a company to install emsg on Linux, Compaq Tru64 UNIX, Solaris, AIX, IRIX, Sinix, Ultrix and VMS.
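The client side of this arrangement is small. The following Python sketch opens a TCP connection to a manager and sends one message. The real emsg is a C program whose wire format is not described here, so the one-line `key=value|key=value` encoding and the function names are purely hypothetical; only the connect, send and close pattern is the point.

```python
import socket

def format_message(fields):
    """Hypothetical wire format: key=value pairs joined by '|', newline-terminated."""
    return ("|".join(f"{k}={v}" for k, v in fields.items()) + "\n").encode()

def send_message(fields, host, port, timeout=10):
    """Open a TCP connection to the manager, send one message and close."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(format_message(fields))
```

Keeping the client this simple is what makes it practical to port to every monitored platform.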

EMU takes the integration interface to the extreme by invoking input, delete and output scripts. Depending on the message type, these scripts run when a message is received, removed or processed. All the message attributes are passed to the script as environment variables. In this way, we have achieved integration with Unicenter TNG event management: the TNG console-held area is an exact image of the EMU console, which makes it much more usable and efficient.
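Passing attributes as environment variables keeps the script interface language-neutral. Here is a minimal Python sketch of how a manager might launch an action script this way; the `EMU_` prefix and the attribute names are hypothetical, not EMU's actual variable names.

```python
import os
import subprocess

def run_action(command, attributes):
    """Run an action script with message attributes exported as environment variables.

    The EMU_ prefix and upper-cased attribute names are hypothetical.
    """
    env = dict(os.environ)
    for name, value in attributes.items():
        env["EMU_" + name.upper()] = str(value)
    result = subprocess.run(command, env=env, capture_output=True, text=True)
    return result.returncode, result.stdout
```

Any program that can read its environment, whether a shell script, Perl or C, can then act on the message without linking against an API.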

EMU was built with distributed processing in mind. Multiple managers can run on one or several systems, forming a hierarchy that reflects a company's needs. Thanks to EMU's truly open architecture, it is easy to synchronize multiple managers, build fail-over configurations or extend their functionality.

Installation and Configuration

EMU consists of a manager (gemu), a cleaner (gemucleaner), the emsg agent and a console/browser (eb, xeb). The manager and cleaner must run on the same node; the cleaner process manages message expiration. For flexibility, gemu, gemucleaner and eb each take only one option: the port number the particular server is running on. Both gemucleaner and eb use emsg to send delete messages to EMU.

A configuration file used by gemu, gemucleaner and eb is stored in /usr/local/emu/conf/port#.cfg. It describes the location of the EMU database (DBM-based), the location of log files, the scan interval for gemucleaner and so on. Each server accesses its own configuration file based on its port number. If it suits your site, put the database under /usr/local/emu/port#/db. Each port/server keeps its log files and action scripts under /usr/local/emu/port#, in subdirectories named logs and actions. The binaries and scripts are shared and stored in /usr/local/emu/bin.
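Because every server takes just a port number, all of its per-port locations can be derived from that one value. The Python sketch below encodes the layout described above; the function names and the use of argparse for a wrapper are assumptions for illustration, and the database path is only the suggested location.

```python
import argparse
import posixpath

EMU_ROOT = "/usr/local/emu"

def emu_paths(port, root=EMU_ROOT):
    """Map a server's port number to its per-port files and directories."""
    port = int(port)
    if not 1 <= port <= 65535:
        raise ValueError(f"invalid TCP port: {port}")
    port_dir = posixpath.join(root, str(port))
    return {
        "config": posixpath.join(root, "conf", f"{port}.cfg"),
        "db": posixpath.join(port_dir, "db"),        # suggested location only
        "logs": posixpath.join(port_dir, "logs"),
        "actions": posixpath.join(port_dir, "actions"),
        "bin": posixpath.join(root, "bin"),          # shared by all servers
    }

def parse_port(argv=None):
    """Hypothetical wrapper command line: a single positional port argument."""
    parser = argparse.ArgumentParser(description="EMU server wrapper (sketch)")
    parser.add_argument("port", type=int)
    return parser.parse_args(argv).port
```

Running several managers on one host then only requires choosing a distinct port for each.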

One option in the configuration file is the location of emsg. If emsg, compiled for each platform at your site, is stored in the /usr/local/emu/EMSG directory, you are ready to run eb (the EMU browser/console) locally on your workstation. To do this, export the /usr/local/emu directory read-only and mount it on the workstation as /usr/local/emu. Then create a symbolic link /usr/local/bin/emsg that points to /usr/local/emu/EMSG/emsg.platform and put /usr/local/emu/bin in your search path; eb will run locally while displaying event messages from the server.
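The symbolic-link step can be scripted so the same command works on every workstation. This Python sketch assumes binaries are named emsg.<platform>, with <platform> taken from `platform.system()` in lower case (for example emsg.linux); that naming is hypothetical, so adjust it to whatever your site uses.

```python
import os
import platform

def link_emsg(emsg_dir="/usr/local/emu/EMSG", link="/usr/local/bin/emsg", system=None):
    """Point `link` at the emsg binary built for this platform.

    Assumes binaries are named emsg.<platform> (hypothetical naming scheme).
    """
    system = (system or platform.system()).lower()
    target = os.path.join(emsg_dir, "emsg." + system)
    if not os.path.isfile(target):
        raise FileNotFoundError(target)
    # Replace an existing link so re-running the script is harmless.
    if os.path.islink(link):
        os.remove(link)
    os.symlink(target, link)
    return target
```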

Depending on the actions EMU is configured to handle, it runs under either the “emu” or the “root” user ID.

For the input, delete and output scripts, message attributes (e.g., host name and message text) are passed as environment variables, which makes them easy to use for triggering actions. It is a good idea to have one launcher script that, depending on the message attributes, calls other, task-specific scripts. This reduces the workload imposed on the manager system. The output script can be used to selectively forward messages to either a higher-level EMU or a third-party system. The input script can act as a barrier that stops certain messages from being processed based on a calendar: if this script exits with a value greater than 0, the message is discarded. The delete script can be used for synchronization with a third-party system.
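As an example of the calendar barrier, here is a Python sketch of an input script that discards messages from hosts inside a scheduled maintenance window. The calendar format and the `EMU_HOST` variable name are hypothetical; the only behavior taken from EMU is that a nonzero exit status causes the message to be discarded.

```python
import datetime

# Hypothetical calendar: host -> (weekday, start hour, end hour); Monday is 0.
MAINTENANCE = {"brk23": (6, 1, 5)}  # Sundays 01:00-05:00

def should_discard(host, now, calendar=MAINTENANCE):
    """True if `host` is inside its scheduled maintenance window at `now`."""
    window = calendar.get(host)
    if window is None:
        return False
    weekday, start, end = window
    return now.weekday() == weekday and start <= now.hour < end

def input_filter(environ, now):
    """Exit status for the input script: >0 tells EMU to discard the message.

    EMU_HOST is a hypothetical variable name; use the one your EMU exports.
    """
    return 1 if should_discard(environ.get("EMU_HOST", ""), now) else 0
```

Installed as the input script, it would end with `sys.exit(input_filter(os.environ, datetime.datetime.now()))`.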

Time-to-live can be specified in seconds, minutes or hours, or as a fixed time in the form HH:MM. A time-to-live of -1 stands for infinity: eb displays the associated message in reverse video, and the cleaner never expires it. The only way to remove such a message is with a delete command on the console. This lets messages such as batch job or backup failures wait until someone acknowledges them. A time-to-live of 0 is used for so-called pass-through messages, which are recorded in the log file but not stored in the EMU database; their purpose is to trigger an action.
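The different time-to-live forms can all be normalized to seconds. In this Python sketch the unit suffixes (`s`, `m`, `h`) are a hypothetical notation, not EMU's documented syntax, and an HH:MM value is assumed to mean "expire at the next occurrence of that clock time". `None` represents the infinite -1 case and 0 marks a pass-through message.

```python
import datetime

def ttl_seconds(spec, now):
    """Normalize a time-to-live spec to seconds.

    Returns None for -1 (never expire); 0 marks a pass-through message.
    Suffixes s/m/h and the HH:MM meaning are assumptions for illustration.
    """
    spec = spec.strip()
    if spec == "-1":
        return None
    if ":" in spec:
        hh, mm = (int(x) for x in spec.split(":"))
        target = now.replace(hour=hh, minute=mm, second=0, microsecond=0)
        if target <= now:
            # The time has already passed today, so use tomorrow's occurrence.
            target += datetime.timedelta(days=1)
        return int((target - now).total_seconds())
    units = {"s": 1, "m": 60, "h": 3600}
    if spec[-1] in units:
        return int(spec[:-1]) * units[spec[-1]]
    return int(spec)  # plain number: seconds
```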

Figure 1. ASCII-Based EMU Message Browser

The eb console provides a basic display of messages. A new message is displayed in bold to draw the operator's attention. A message can be deleted/acknowledged or annotated. Message annotations appear indented under each message; they notify others of details, such as a work request that has been logged. The message time shown on the console is the local time on the system that sent the first message, which helps identify when a problem occurred.

Figure 2. GUI-Based EMU Message Browser

EMU maintains a separate log file for each day. It stores all received messages together with their attributes, e.g., host name, message text and class. Message attributes are delimited with a vertical bar, which makes the log easy to process in scripts or load into spreadsheets.
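Because the fields are bar-delimited, a daily log can be summarized with a few lines of Python. The field order below is a guess made for illustration; check a real EMU log file for the actual layout. Limiting the split count keeps any vertical bars inside the final message-text field intact.

```python
from collections import Counter

# Hypothetical field order; check a real EMU log file for the actual layout.
FIELDS = ("time", "host", "resource", "class", "ttl", "text")

def parse_log_line(line):
    """Split one '|'-delimited log record into a dict.

    maxsplit keeps any '|' inside the final message-text field intact.
    """
    parts = line.rstrip("\n").split("|", len(FIELDS) - 1)
    if len(parts) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields: {line!r}")
    return dict(zip(FIELDS, parts))

def messages_per_host(lines):
    """Count logged messages per host, skipping blank lines."""
    return Counter(parse_log_line(l)["host"] for l in lines if l.strip())
```

Calling `messages_per_host(open(path))` on one day's log shows at a glance which systems generated the most events.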