Puppet and Nagios: a Roadmap to Advanced Configuration

Expire, Collect and Purge Exported Resources

Up until this point, the job of our Nagios server has simply been to collect exported resources. In the real world, the nodes it monitors are routinely retired for one reason or another. When a node is retired, I want to be sure the relevant Nagios objects are removed and the corresponding database records are deleted. According to Puppet's documentation, these resources can be purged from the collector only when default target locations are leveraged (http://docs.puppetlabs.com/references/stable/type.html#nagioshost). Even so, I wasn't happy to see orphaned database records left behind and decided to address this issue with a few Puppet functions and some basic class ordering. Before we dive in, some workflow and terminology must be understood:

  • Expire: a Nagios resource is "expired" by setting the value of its "ensure" parameter to "absent".

  • Collect: the resource is removed from the collector due to the value of its "ensure" parameter.

  • Purge: all database records associated with the expired host are deleted.

Ordering is obviously a big deal here. To ensure each task executes in the proper order, we break each unit of work into its own class and use a mix of the "include" and "require" functions. Using Puppet terminology, we can now express this "expire, collect, then purge" workflow as follows (a condensed sketch appears after the list):

  • The nagios class requires the nagios::expire_resources class.

  • The nagios class includes the nagios::purge_resources class.

  • The nagios::purge_resources class requires the nagios::collect_resources class.
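
Condensed into a skeleton, that ordering looks roughly like the sketch below. This is orientation only; the class bodies here are placeholders, and the complete classes appear in Listings 11–14.


# Sketch only: the real class bodies are shown in Listings 11-14.
class nagios {
  require nagios::expire_resources   # expire runs first
  include nagios::purge_resources    # purge runs last (it requires collection first)
  # ... service, resource directory and local resources ...
}

class nagios::purge_resources {
  require nagios::collect_resources  # collect the expired resources before purging
  # ... purge_exported($my_nagios_purge_hosts) would go here ...
}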

Now, let's look at the two custom functions, expire_exported and purge_exported. These functions (written for PostgreSQL) perform the database operations required to expire hosts and their resources. Both operate on a node-scoped variable named $my_nagios_purge_hosts, which should contain an array of hostnames. If used, this variable should be placed somewhere in your Nagios server's node definition. For example:


node corona {
  $my_nagios_purge_hosts = [ 'foo', 'bar', 'baz' ]
  include nagios
}

With this node-scoped variable defined, your (affectionately named) Nagios server will reconfigure itself after dropping all resources for the three hosts mentioned above (Listings 9 and 10).

Listing 9. nagios/lib/puppet/parser/functions/expire_exported.rb


Puppet::Parser::Functions::newfunction(
  :expire_exported,
  :doc => "Sets a host's resources to ensure => absent as part of a purge workflow.") do |args|

  require 'rubygems'
  require 'pg'
  require 'puppet'

  raise Puppet::ParseError, "Missing hostname." if args.empty?
  hosts = args.flatten

  begin
    conn = PGconn.open(:dbname => 'puppet', :user => 'postgres')

    hosts.each do |host|
      Puppet.notice("Expiring resources for host: #{host}")
      conn.exec("SELECT id FROM hosts WHERE name = '#{host}'") do |host_id|
        raise "Too many hosts" if host_id.ntuples > 1
        conn.exec("SELECT id FROM param_names WHERE name = 'ensure'") do |param_id|

          conn.exec("SELECT id FROM resources WHERE host_id = #{host_id.values.flatten[0].to_i}") do |results|

            resource_ids = []
            results.each do |row|
              resource_ids << Hash[*row.to_a.flatten]
            end

            # Flip each of the host's resources to ensure => absent
            resource_ids.each do |resource|
              conn.exec("UPDATE param_values SET value = 'absent' WHERE resource_id = #{resource['id']} AND param_name_id = #{param_id.values.flatten[0].to_i}")
            end
          end
        end
      end
    end
  rescue => e
    Puppet.notice(e.message)
  ensure
    conn.close if conn
  end
end

Listing 10. nagios/lib/puppet/parser/functions/purge_exported.rb


# This function will be used by the exported
# resources collector (the nagios box)
Puppet::Parser::Functions::newfunction(:purge_exported,
  :doc => "Deletes expired resources.") do |args|

  require 'rubygems'
  require 'pg'
  require 'puppet'

  raise Puppet::ParseError, "Missing hostname." if args.empty?
  hosts = args.flatten

  begin
    conn = PGconn.open(:dbname => 'puppet', :user => 'postgres')

    hosts.each do |host|

      Puppet.notice("Purging expired resources for host: #{host}")
      conn.exec("SELECT id FROM hosts WHERE name = '#{host}'") do |host_id|

        raise "Too many hosts" if host_id.ntuples > 1
        conn.exec("SELECT id FROM resources WHERE host_id = #{host_id.values.flatten[0].to_i}") do |results|

          resource_ids = []
          results.each do |row|
            resource_ids << Hash[*row.to_a.flatten]
          end

          # Remove each resource along with its parameter values
          resource_ids.each do |resource|
            conn.exec("DELETE FROM param_values WHERE resource_id = #{resource['id']}")
            conn.exec("DELETE FROM resources WHERE id = #{resource['id']}")
          end
        end

        # Finally, remove the host record itself
        conn.exec("DELETE FROM hosts WHERE id = #{host_id.values.flatten[0].to_i}")
      end
    end
  rescue => e
    Puppet.notice(e.message)
  ensure
    conn.close if conn
  end
end

And now for the refactored nagios class and related code (Listings 11–14).

Listing 11. modules/nagios/manifests/init.pp


# This class will be used by the nagios server
class nagios {

  include nagios::params
  require nagios::expire_resources
  include nagios::purge_resources

  service { $nagios::params::service:
    ensure => running,
    enable => true,
  }

  # nagios.cfg needs this specified via the cfg_dir directive
  file { $nagios::params::resource_dir:
    ensure => directory,
    owner => $nagios::params::user,
  }

  # Local Nagios resources
  nagios::resource { [ 'Nagios Servers', 'Puppet Servers', 'Other' ]:
    type => hostgroup,
    export => false;
  }
}

Listing 12. modules/nagios/manifests/expire_resources.pp


class nagios::expire_resources {

  if $my_nagios_purge_hosts {
    expire_exported($my_nagios_purge_hosts)
  }
}

Listing 13. modules/nagios/manifests/purge_resources.pp


class nagios::purge_resources {

  require nagios::collect_resources

  if $my_nagios_purge_hosts {
    purge_exported($my_nagios_purge_hosts)
  }
}

Listing 14. modules/nagios/manifests/collect_resources.pp


class nagios::collect_resources {

  include nagios::params

  Nagios_host <<||>> {
    require => File[$nagios::params::resource_dir],
    notify => Service[$nagios::params::service],
  }

  File <<| tag == 'nagios_host' |>> {
    notify => Service[$nagios::params::service],
  }
}
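
For context, the collectors in Listing 14 only realize what the monitored nodes have already exported. As a rough illustration of the exporting side (a sketch only; in the article this is wrapped up by the nagios::resource define, and the exact parameters shown here are assumptions), a client might export a host object like so:


# Illustrative sketch of the exporting side; parameter choices are assumptions,
# and it presumes the client also includes nagios::params for $resource_dir.
@@nagios_host { $::fqdn:
  ensure => present,
  address => $::ipaddress,
  use => 'generic-host',
  target => "${nagios::params::resource_dir}/${::fqdn}_host.cfg",
  tag => 'nagios_host',
}

When the Nagios server runs, Listing 14 collects these objects, writes them beneath the resource directory and notifies the Nagios service so the new configuration is picked up.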

The basic building blocks are now in place. Extend nagios::resource, plug the classes into your nagios module and kick back. If a node goes MIA and needs to be purged, toss it into your $my_nagios_purge_hosts array and be done with it. Until next time, may your Nagios dashboards be green and your alerts be few.

______________________

Comments


file ownership is not a problem (Anonymous)

"Before we begin, let's make sure we understand the most important problem—the issue of file ownership and permissions for the newly generated .cfg files. Because these files are created via the target parameter of each associated Nagios type, they'll be written to disk by the user Puppet runs as. This means they will be owned by the root user/group, and Nagios will not have permission to read them (because I know you are not running Nagios as root, correct?)."

I don't get this.

On a default Ubuntu 12.04 machine, Nagios3 runs as user nagios. All .cfg files are owned by root. Nagios is just fine with that. Doesn't even complain in the logs about it.

Even if it weren't, couldn't I just chown nagios:nagios /etc/nagios3/conf.d and chmod g+s /etc/nagios3/conf.d? This would ensure all newly created files in /etc/nagios3/conf.d/ were owned by the nagios group, of which user nagios is a member.

I don't understand how the file permissions are the 'most important problem' in this.

Turns out it's not so much (Anonymous)

Turns out it's not so much the ownership that is the problem, but permissions. Puppet creates the files with a 0600 mode, making it unreadable for Nagios.

purge logic is cumbersome... (asq)

purge logic is cumbersome... set recurse, purge, force on resource dir, and expire your nodes in puppet properly (puppet node clean/deactivate foo) and you'll achieve the same.
