Puppet and Nagios: a Roadmap to Advanced Configuration

Puppet has provided baked-in Nagios support for a long time now. When combined with Exported Resources, Puppet is well suited to manage an intelligent Nagios configuration where nodes are automatically inventoried and monitored. The excellent Pro Puppet, written by James Turnbull, provides a fairly complete rundown of the installation and configuration steps needed in order to progress in this direction, so I won't repeat the information here. Instead, this article highlights some less-than-optimal default behavior of the Nagios types and details my solution that results in a cleaner filesystem and improved performance.

Not All Resources Should Be Exported!

This took me an embarrassingly long time to figure out. Just like resources that are defined in a manifest, Exported Resources must be unique. For example, suppose we have nodes foo and bar, which we'd like to categorize into a Nagios hostgroup named "PMF". At first glance, adding the following code to foo's manifest might seem like the way to go:


@@nagios_hostgroup { 'PMF':
  ensure            => present,
  hostgroup_members => [ $::hostname ],
}

In theory, the resource will be exported to the database when the first node compiles its manifest, but the next node's compilation will complain with a duplicate resource error. For this reason, we will avoid exporting resources created by this particular type. Instead, we will manage our hostgroup memberships via the hostgroup parameter of the nagios_host type.
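The safer pattern looks like this sketch: each node exports only a nagios_host resource (whose title, the hostname, is already unique) and joins the hostgroup through a parameter. This assumes the "PMF" hostgroup object itself is declared exactly once, outside of any exported resource:

```puppet
# Sketch: every agent exports a uniquely titled nagios_host and
# names its hostgroup via the hostgroups parameter, so no two
# nodes ever export a resource with the same title. The hostgroup
# object itself must be defined once, e.g. on the Nagios server.
@@nagios_host { $::hostname:
  address    => $::ipaddress,
  hostgroups => 'PMF',
}
```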

Had it not been for Pieter Barrezeele's blog (http://pieter.barrezeele.be/2009/05/11/puppet-and-nagios), I may have ended up settling for Puppet's fairly inefficient approach to storing resources managed via its Nagios types. By default, these bits are maintained in hard-coded file paths according to the type used. For example, all resources based on the nagios_service type are collected and stored in /etc/nagios/nagios_service.cfg and so on. For performance reasons, I want each collected resource to be stored in its own file path based on the following naming convention:


<base_directory>/<type>_<hostname>.cfg

Furthermore, I want my filenames to be composed of all lowercase letters and spaces replaced with underscores. For starters, let's add the bare minimum snippets of code into our manifests in order to export and collect resources using the nagios_host type (Listings 1 and 2).

Listing 1. modules/nagios/manifests/init.pp


# This class will be used by the nagios server
class nagios {

  service { 'nagios':
    ensure => running,
    enable => true,
  }

  # Be sure to include this directory in your nagios.cfg
  # with the cfg_dir directive

  file { 'resource-d':
    ensure => directory,
    path   => '/etc/nagios/resource.d',
    owner  => 'nagios',
  }

  # Collect the nagios_host resources
  Nagios_host <<||>> {
    require => File['resource-d'],
    notify  => Service['nagios'],
  }
}

Listing 2. /modules/nagios/manifests/export.pp


# All agents (including the nagios server) will use this
class nagios::export {

  @@nagios_host { $::hostname:
    address => $::ipaddress,
    check_command => 'check_host_alive!3000.0,80%!5000.0,100%!10',
    target => "/etc/nagios/resource.d/host_${::hostname}.cfg",
  }
}
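For illustration, once the Nagios server collects node foo's export, the generated /etc/nagios/resource.d/host_foo.cfg would contain an ordinary Nagios host definition along these lines (the address shown is a placeholder):

```
# Illustrative only: the address below is a placeholder, and the
# exact formatting Puppet emits may differ slightly.
define host {
        host_name       foo
        address         192.168.0.10
        check_command   check_host_alive!3000.0,80%!5000.0,100%!10
}
```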

Note:

Due to the inherent space limitations of published articles, all code will be kept as minimal as possible while conforming to the structure of Puppet Modules. However, no attempt will be made to reproduce a complete module capable of managing a Nagios instance. Instead, I focus on the concepts that have been defined in this article's introduction. Please see http://docs.puppetlabs.com if you need an introduction to Puppet modules.

Let's examine the good and the not-so-good aspects of what we've defined up to this point. On the positive side, all agents will export a nagios_host resource. The Nagios server, upon compiling its manifest, will collect each resource, store it in a unique file, and refresh the Nagios service. At first glance, it may seem like our work is done. Unfortunately, our solution is littered with the following issues and shortcomings:

  1. Nagios will not be able to read the newly created .cfg files since the Puppet Agent will create them while running as the root user.

  2. There is too much "coordination" needed with the target parameter of the nagios_host type. We should not have to work so hard to ensure our target points to the correct file and is free of unpleasant things like spaces and/or mixed case.

  3. The address parameter is hard-coded with the value of the ipaddress fact. Although this may be acceptable in some environments, we really should allow for greater flexibility.

  4. No ability exists to leverage Nagios hostgroups.

  5. Puppet will be unable to purge our exported resources, because we are not using the default behavior of the target parameter.
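One way out of issues 2 and 3 is to stop declaring nagios_host directly and instead export it through a thin wrapper that derives a safe target path on its own. Here is a minimal sketch, assuming the puppetlabs-stdlib module for its downcase() function; the define name nagios::export::host is hypothetical:

```puppet
# Hypothetical wrapper: callers never set target themselves.
# regsubst() (built in) replaces spaces with underscores, and
# downcase() (from puppetlabs-stdlib) lowercases the title,
# satisfying the lowercase-with-underscores naming convention.
define nagios::export::host ( $address = $::ipaddress ) {
  $safe_name = downcase(regsubst($name, ' ', '_', 'G'))

  @@nagios_host { $name:
    address => $address,
    target  => "/etc/nagios/resource.d/host_${safe_name}.cfg",
  }
}
```

An agent would then simply declare `nagios::export::host { $::hostname: }`, optionally overriding the address parameter where the ipaddress fact is not appropriate.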

______________________

Comments

Re:

alexnew's picture

purge logic is cumbersome... set recurse, purge, force on resource dir, and expire your nodes in puppet properly (puppet node clean/deactivate foo) and you'll achieve the same.

file ownership is not a problem

Anonymous's picture

"Before we begin, let's make sure we understand the most important problem—the issue of file ownership and permissions for the newly generated .cfg files. Because these files are created via the target parameter of each associated Nagios type, they'll be written to disk by the user Puppet runs as. This means they will be owned by the root user/group, and Nagios will not have permission to read them (because I know you are not running Nagios as root, correct?)."

I don't get this.

On a default Ubuntu 12.04 machine, Nagios3 runs as user nagios. All .cfg files are owned by root. Nagios is just fine with that. Doesn't even complain in the logs about it.

Even if it weren't, couldn't I just chown nagios:nagios /etc/nagios3/conf.d and chmod g+s /etc/nagios3/conf.d? This would ensure all newly created files in /etc/nagios3/conf.d/ were owned by the nagios group, of which user nagios is a member.

I don't understand how the file permissions are the 'most important problem' in this.

Turns out it's not so much

Anonymous's picture

Turns out it's not so much the ownership that is the problem, but permissions. Puppet creates the files with a 0600 mode, making it unreadable for Nagios.
