Puppet and Nagios: a Roadmap to Advanced Configuration
Expire, Collect and Purge Exported Resources
Up to this point, our Nagios server's only job has been to collect exported resources. In the real world, the nodes it monitors are routinely retired for one reason or another. When a node is retired, I want to be sure the relevant Nagios objects are removed and the corresponding database records are deleted. According to Puppet's documentation, these resources can be purged from the collector only when default target locations are used (http://docs.puppetlabs.com/references/stable/type.html#nagioshost). Even so, I wasn't happy to see orphaned database records left behind, so I decided to address the issue with a few Puppet functions and some basic class ordering. Before we dive in, some workflow and terminology must be understood:
- Expire: a Nagios resource is "expired" by setting the value of its "ensure" parameter to "absent".
- Collect: the resource is removed from the collector due to the value of its "ensure" parameter.
- Purge: all database records associated with the expired host are deleted.
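To make the three terms concrete, here is a minimal Ruby sketch that plays them out against in-memory stand-ins for the database tables. It is only an illustration: the Store struct, the helper names and the sample data (including the host web01) are hypothetical, and no real database or Puppet code is involved.

```ruby
# In-memory stand-ins for the stored-configuration tables (hypothetical data).
Store = Struct.new(:hosts, :resources, :param_values)

def build_store
  Store.new(
    { 1 => 'foo', 2 => 'web01' },                           # hosts: id => name
    { 10 => 1, 11 => 1, 20 => 2 },                          # resources: id => host_id
    { 10 => 'present', 11 => 'present', 20 => 'present' }   # "ensure" value per resource id
  )
end

# Expire: set "ensure" to "absent" for every resource that belongs to the host.
def expire(store, hostname)
  host_id = store.hosts.key(hostname)
  store.resources.each do |res_id, owner_id|
    store.param_values[res_id] = 'absent' if owner_id == host_id
  end
end

# Collect: the collector sees each resource together with its current "ensure" value.
def collect(store)
  store.resources.keys.map { |res_id| [res_id, store.param_values[res_id]] }
end

# Purge: delete the host and every record that hangs off it.
def purge(store, hostname)
  host_id = store.hosts.key(hostname)
  store.resources.select { |_, owner_id| owner_id == host_id }.keys.each do |res_id|
    store.param_values.delete(res_id)
    store.resources.delete(res_id)
  end
  store.hosts.delete(host_id)
end

store = build_store
expire(store, 'foo')
collected = collect(store)   # foo's resources now carry "absent"
purge(store, 'foo')          # only web01 and its resource remain
```

In this toy model, collecting between the expire and purge steps is what lets the collector see the "absent" state before the records disappear.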
Ordering is obviously a big deal here. To ensure each task executes properly, we will break each unit of work out into its own class and use a mix of the "include" and "require" functions. In Puppet terminology, this "expire, collect, then purge" workflow can be expressed as follows:
- The nagios class requires the nagios::expire_resources class.
- The nagios class includes the nagios::purge_resources class.
- The nagios::purge_resources class requires the nagios::collect_resources class.
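The three relationships above can be traced with a toy evaluator in Ruby. This is only a sketch under simplifying assumptions: it treats both "include" and "require" as "evaluate that class now, unless it was already evaluated", ignores the extra dependency that "require" adds, and says nothing about how the Puppet compiler is actually implemented or when it realizes collections. CLASS_BODIES and evaluate are hypothetical names.

```ruby
# Each class body is an ordered list of steps (a toy model, not Puppet internals).
# :declare stands in for both include and require; :run marks the unit of work.
CLASS_BODIES = {
  'nagios' => [
    [:declare, 'nagios::expire_resources'],   # require
    [:declare, 'nagios::purge_resources']     # include
  ],
  'nagios::expire_resources' => [[:run, 'expire']],
  'nagios::purge_resources' => [
    [:declare, 'nagios::collect_resources'],  # require
    [:run, 'purge']
  ],
  'nagios::collect_resources' => [[:run, 'collect']]
}

# Walk a class body top to bottom, entering each declared class at most once,
# and record the order in which the units of work are reached.
def evaluate(name, seen = {}, log = [])
  return log if seen[name]
  seen[name] = true
  CLASS_BODIES.fetch(name).each do |kind, target|
    if kind == :declare
      evaluate(target, seen, log)
    else
      log << target
    end
  end
  log
end

order = evaluate('nagios')
puts order.join(', then ')
```

Under these assumptions the walk reaches the expire step first, the collect step second and the purge step last, which is the order the work flow calls for.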
Now, let's look at two custom functions, expire_exported and purge_exported. These functions (written for PostgreSQL) perform the database operations required to expire hosts and their resources. Both operate on a node-scoped variable named $my_nagios_purge_hosts, which should contain an array of hostnames. If used, this variable should be placed somewhere in your Nagios server's node definition. For example:
node corona {
  $my_nagios_purge_hosts = [ 'foo', 'bar', 'baz' ]
  include nagios
}
With this node-scoped variable defined, your (affectionately named) Nagios server will reconfigure itself after dropping all resources for the three hosts mentioned above (Listings 9 and 10).
Listing 9. nagios/lib/puppet/parser/functions/expire_exported.rb
Puppet::Parser::Functions::newfunction(
  :expire_exported,
  :doc => "Sets a host's resources to ensure => absent as part of a purge work-flow.") do |args|
  require 'rubygems'
  require 'pg'
  require 'puppet'

  raise Puppet::ParseError, "Missing hostname." if args.empty?
  hosts = args.flatten
  conn = nil

  begin
    conn = PGconn.open(:dbname => 'puppet', :user => 'postgres')
    hosts.each do |host|
      Puppet.notice("Expiring resources for host: #{host}")
      # Escape the hostname before interpolating it into SQL.
      conn.exec("SELECT id FROM hosts WHERE name = '#{conn.escape_string(host.to_s)}'") do |host_id|
        raise "Too many hosts" if host_id.ntuples > 1
        conn.exec("SELECT id FROM param_names WHERE name = 'ensure'") do |param_id|
          # values is an array of rows, so unwrap the single id before using it.
          ensure_id = param_id.values.flatten[0].to_i
          conn.exec("SELECT id FROM resources WHERE host_id = #{host_id.values.flatten[0].to_i}") do |results|
            results.each do |row|
              conn.exec("UPDATE param_values SET value = 'absent' " \
                        "WHERE resource_id = #{row['id'].to_i} AND param_name_id = #{ensure_id}")
            end
          end
        end
      end
    end
  rescue => e
    Puppet.notice(e.message)
  ensure
    # conn is still nil if the connection attempt itself failed.
    conn.close if conn
  end
end
Listing 10. nagios/lib/puppet/parser/functions/purge_exported.rb
# This function will be used by the exported
# resources collector (the nagios box)
Puppet::Parser::Functions::newfunction(:purge_exported,
  :doc => "delete expired resources.") do |args|
  require 'rubygems'
  require 'pg'
  require 'puppet'

  raise Puppet::ParseError, "Missing hostname." if args.empty?
  hosts = args.flatten
  conn = nil

  begin
    conn = PGconn.open(:dbname => 'puppet', :user => 'postgres')
    hosts.each do |host|
      Puppet.notice("Purging expired resources for host: #{host}")
      # Escape the hostname before interpolating it into SQL.
      conn.exec("SELECT id FROM hosts WHERE name = '#{conn.escape_string(host.to_s)}'") do |host_id|
        raise "Too many hosts" if host_id.ntuples > 1
        # values is an array of rows, so unwrap the single id before using it.
        host_db_id = host_id.values.flatten[0].to_i
        conn.exec("SELECT id FROM resources WHERE host_id = #{host_db_id}") do |results|
          results.each do |row|
            # Delete the parameter values first, then the resource itself.
            conn.exec("DELETE FROM param_values WHERE resource_id = #{row['id'].to_i}")
            conn.exec("DELETE FROM resources WHERE id = #{row['id'].to_i}")
          end
        end
        conn.exec("DELETE FROM hosts WHERE id = #{host_db_id}")
      end
    end
  rescue => e
    Puppet.notice(e.message)
  ensure
    # conn is still nil if the connection attempt itself failed.
    conn.close if conn
  end
end
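Both functions feed IDs read from one query into the next. With the pg gem, a result's values method returns an array of row arrays holding strings, so a single ID has to be unwrapped before it is interpolated into SQL. The following Ruby sketch shows one way to do that defensively. The helper single_id is hypothetical and not part of the listings or of the pg gem, and it works on plain arrays so no database is needed to try it.

```ruby
# Hypothetical helper: pull exactly one integer id out of a values-style
# array of rows such as [["5"]]. Returns nil when there are no rows.
def single_id(values)
  cells = values.flatten
  raise ArgumentError, "expected at most one id, got #{cells.length}" if cells.length > 1
  cells.empty? ? nil : Integer(cells.first, 10)
end

rows = [['5']]                   # shape of values for a one-row, one-column result

naive = "id = #{rows}"           # => 'id = [["5"]]' on Ruby 1.9 and later
safe  = "id = #{single_id(rows)}"

puts naive
puts safe
```

Using Integer rather than to_i means a non-numeric cell raises an error instead of silently becoming 0.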
And, now for the refactored nagios class and related code (Listings 11–14).
Listing 11. modules/nagios/manifests/init.pp
# This class will be used by the nagios server
class nagios {
  include nagios::params
  require nagios::expire_resources
  include nagios::purge_resources

  service { $nagios::params::service:
    ensure => running,
    enable => true,
  }

  # nagios.cfg needs this specified via the cfg_dir directive
  file { $nagios::params::resource_dir:
    ensure => directory,
    owner  => $nagios::params::user,
  }

  # Local Nagios resources
  nagios::resource { [ 'Nagios Servers', 'Puppet Servers', 'Other' ]:
    type   => hostgroup,
    export => false,
  }
}
Listing 12. modules/nagios/manifests/expire_resources.pp
class nagios::expire_resources {
  if $my_nagios_purge_hosts {
    expire_exported($my_nagios_purge_hosts)
  }
}
Listing 13. modules/nagios/manifests/purge_resources.pp
class nagios::purge_resources {
  require nagios::collect_resources

  if $my_nagios_purge_hosts {
    purge_exported($my_nagios_purge_hosts)
  }
}
Listing 14. modules/nagios/manifests/collect_resources.pp
class nagios::collect_resources {
  include nagios::params

  Nagios_host <<||>> {
    # require takes a resource reference, not a bare path
    require => File[$nagios::params::resource_dir],
    notify  => Service[$nagios::params::service],
  }

  File <<| tag == 'nagios_host' |>> {
    notify => Service[$nagios::params::service],
  }
}
The basic building blocks are now in place. Extend nagios::resources, plug the classes into your nagios module and kick back. If a node goes MIA and needs to be purged, toss it into your $my_nagios_purge_hosts array and be done with it. Until next time, may your Nagios dashboards be green and your alerts be few.
Comments
Re:
purge logic is cumbersome... set recurse, purge, force on resource dir, and expire your nodes in puppet properly (puppet node clean/deactivate foo) and you'll achieve the same.
file ownership is not a problem
"Before we begin, let's make sure we understand the most important problem—the issue of file ownership and permissions for the newly generated .cfg files. Because these files are created via the target parameter of each associated Nagios type, they'll be written to disk by the user Puppet runs as. This means they will be owned by the root user/group, and Nagios will not have permission to read them (because I know you are not running Nagios as root, correct?)."
I don't get this.
On a default Ubuntu 12.04 machine, Nagios3 runs as user nagios. All .cfg files are owned by root. Nagios is just fine with that. Doesn't even complain in the logs about it.
Even if it weren't, couldn't I just chown nagios:nagios /etc/nagios3/conf.d and chmod g+s /etc/nagios3/conf.d? This would ensure all newly created files in /etc/nagios3/conf.d/ were owned by the nagios group, of which user nagios is a member.
I don't understand how the file permissions are the 'most important problem' in this.
Turns out it's not so much
Turns out it's not so much the ownership that is the problem, but permissions. Puppet creates the files with a 0600 mode, making it unreadable for Nagios.