Puppet and Nagios: a Roadmap to Advanced Configuration
Expire, Collect and Purge Exported Resources
Up to this point, our Nagios server's only job has been to collect exported resources. In the real world, however, the nodes it monitors are retired quite routinely, for one reason or another. When a node is retired, I want to be sure that the relevant Nagios objects are removed and that the corresponding database records are deleted. According to Puppet's documentation, these resources can be purged from the collector only when the default target locations are used (http://docs.puppetlabs.com/references/stable/type.html#nagioshost). Even then, I wasn't happy to see orphaned database records left behind, so I decided to address the issue with a pair of custom Puppet functions and some basic class ordering. Before we dive in, you need to understand the following workflow and terminology:
- Expire: a Nagios resource is "expired" by setting the value of its "ensure" parameter to "absent".
- Collect: the resource is removed on the collector (the Nagios server) because its "ensure" parameter is now "absent".
- Purge: all database records associated with the expired host are deleted.
Ordering matters here. To make sure each task runs at the right point, we will break each unit of work out into its own class and use a mix of the "include" and "require" functions. In Puppet terms, the "expire, collect, then purge" workflow can be expressed as follows:
- The nagios class requires the nagios::expire_resources class.
- The nagios class includes the nagios::purge_resources class.
- The nagios::purge_resources class requires the nagios::collect_resources class.
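As a rough illustration of what these three relationships do and do not guarantee, the Ruby sketch below treats each class as a node and each "require" as a dependency edge, then sorts the graph with Ruby's standard TSort module. It is a toy model, not a description of how Puppet orders a catalog internally, and the ClassGraph name is hypothetical. Because "include" declares a class without adding an edge, only two orderings are enforced: expire_resources before nagios, and collect_resources before purge_resources.

```ruby
require 'tsort'

# Hypothetical toy model: each class is a node, and each "require" adds an
# edge from the dependent class to the class it depends on. "include" only
# declares a class, so it adds a node but no edge.
class ClassGraph
  include TSort

  def initialize
    @requires = {} # class name => names of the classes it requires
  end

  def add_class(name)
    @requires[name] ||= []
    self
  end

  def add_require(dependent, dependency)
    add_class(dependent)
    add_class(dependency)
    @requires[dependent] << dependency
    self
  end

  # TSort calls these two methods to walk the graph.
  def tsort_each_node(&block)
    @requires.each_key(&block)
  end

  def tsort_each_child(node, &block)
    @requires.fetch(node, []).each(&block)
  end
end

graph = ClassGraph.new
graph.add_require('nagios', 'nagios::expire_resources')
graph.add_class('nagios::purge_resources') # included by nagios: no edge
graph.add_require('nagios::purge_resources', 'nagios::collect_resources')

# TSort lists every dependency before the classes that require it.
order = graph.tsort
puts order.join(' -> ')
```

In this model, nagios and nagios::purge_resources are unordered relative to each other; only the two "require" edges constrain the result.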
Now, let's look at two custom functions, expire_exported and purge_exported. These functions (written for PostgreSQL) perform the database operations required to expire hosts and their resources and then purge the related records. Both operate on a node-scoped variable named $my_nagios_purge_hosts, which should contain an array of hostnames. If you use this variable, place it in your Nagios server's node definition. For example:
node corona {
  $my_nagios_purge_hosts = [ 'foo', 'bar', 'baz' ]
  include nagios
}
With this node-scoped variable defined, your (affectionately named) Nagios server will reconfigure itself after dropping all resources for the three hosts mentioned above (Listings 9 and 10).
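Both functions begin with args.flatten. Puppet hands a parser function a single Ruby Array holding its arguments, so passing the array in $my_nagios_purge_hosts produces a nested array. The minimal sketch below isolates that step. normalize_hosts is a hypothetical helper; it raises ArgumentError instead of Puppet::ParseError so that it runs without Puppet installed, and the empty-array check and de-duplication are extras that the listings do not perform.

```ruby
# Hypothetical helper mirroring the argument handling in Listings 9 and 10.
# ArgumentError stands in for Puppet::ParseError so no Puppet install is needed.
def normalize_hosts(args)
  raise ArgumentError, 'Missing hostname.' if args.empty?
  # expire_exported($my_nagios_purge_hosts) arrives as [['foo', 'bar', 'baz']];
  # flatten also accepts hosts passed as separate arguments.
  hosts = args.flatten.map(&:to_s).uniq # uniq is an extra, not in the listings
  raise ArgumentError, 'Missing hostname.' if hosts.empty? # also an extra
  hosts
end

p normalize_hosts([['foo', 'bar', 'baz']])
p normalize_hosts(['foo', 'bar', 'foo'])
```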
Listing 9. nagios/lib/puppet/parser/functions/expire_exported.rb
Puppet::Parser::Functions::newfunction(
  :expire_exported,
  :doc => "Sets a host's resources to ensure => absent as part of a purge work-flow."
) do |args|
  require 'rubygems'
  require 'pg'
  require 'puppet'

  raise Puppet::ParseError, "Missing hostname." if args.empty?
  hosts = args.flatten
  conn = nil
  begin
    conn = PG.connect(:dbname => 'puppet', :user => 'postgres')
    hosts.each do |host|
      Puppet.notice("Expiring resources for host: #{host}")
      # Bind the hostname as a parameter rather than interpolating it into SQL
      conn.exec_params("SELECT id FROM hosts WHERE name = $1", [host]) do |host_id|
        raise "Too many hosts" if host_id.ntuples > 1
        next if host_id.ntuples.zero? # unknown host: nothing to expire
        conn.exec("SELECT id FROM param_names WHERE name = 'ensure'") do |param_id|
          next if param_id.ntuples.zero?
          conn.exec_params("SELECT id FROM resources WHERE host_id = $1",
                           [host_id.getvalue(0, 0)]) do |results|
            results.each do |row|
              # Set this resource's stored "ensure" value to 'absent'
              conn.exec_params("UPDATE param_values SET value = 'absent' " \
                               "WHERE resource_id = $1 AND param_name_id = $2",
                               [row['id'], param_id.getvalue(0, 0)])
            end
          end
        end
      end
    end
  rescue => e
    Puppet.notice(e.message)
  ensure
    # conn is still nil if the connection attempt itself failed
    conn.close if conn
  end
end
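To see exactly which rows Listing 9 changes, it can help to run the same logic against plain Ruby data. The sketch below is an in-memory model, not a database client: the table and column names follow the SQL in Listing 9, but the rows are invented sample data and expire is a hypothetical method name. Only parameter values whose param_name_id points at "ensure" are touched, and only for the named host's resources.

```ruby
# In-memory stand-in for the storeconfigs tables used by Listing 9.
# Table/column names follow the SQL; the rows are made-up sample data.
db = {
  hosts:        [{ id: 1, name: 'foo' }, { id: 2, name: 'corona' }],
  param_names:  [{ id: 10, name: 'ensure' }, { id: 11, name: 'address' }],
  resources:    [{ id: 100, host_id: 1 }, { id: 101, host_id: 2 }],
  param_values: [
    { resource_id: 100, param_name_id: 10, value: 'present' },
    { resource_id: 100, param_name_id: 11, value: '10.0.0.5' },
    { resource_id: 101, param_name_id: 10, value: 'present' }
  ]
}

# Mirrors expire_exported for one host; returns how many values changed.
def expire(db, hostname)
  host = db[:hosts].find { |h| h[:name] == hostname }
  ensure_param = db[:param_names].find { |pn| pn[:name] == 'ensure' }
  return 0 unless host && ensure_param # unknown host: nothing to expire

  resource_ids = db[:resources].select { |r| r[:host_id] == host[:id] }
                               .map { |r| r[:id] }
  changed = 0
  db[:param_values].each do |pv|
    next unless resource_ids.include?(pv[:resource_id]) &&
                pv[:param_name_id] == ensure_param[:id]
    pv[:value] = 'absent' # the UPDATE ... SET value = 'absent' step
    changed += 1
  end
  changed
end

puts expire(db, 'foo')
puts expire(db, 'nosuchhost')
```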
Listing 10. nagios/lib/puppet/parser/functions/purge_exported.rb
# This function will be used by the exported
# resources collector (the nagios box)
Puppet::Parser::Functions::newfunction(
  :purge_exported,
  :doc => "Delete expired resources."
) do |args|
  require 'rubygems'
  require 'pg'
  require 'puppet'

  raise Puppet::ParseError, "Missing hostname." if args.empty?
  hosts = args.flatten
  conn = nil
  begin
    conn = PG.connect(:dbname => 'puppet', :user => 'postgres')
    hosts.each do |host|
      Puppet.notice("Purging expired resources for host: #{host}")
      conn.exec_params("SELECT id FROM hosts WHERE name = $1", [host]) do |host_id|
        raise "Too many hosts" if host_id.ntuples > 1
        next if host_id.ntuples.zero? # unknown or already purged host
        id = host_id.getvalue(0, 0)
        conn.exec_params("SELECT id FROM resources WHERE host_id = $1", [id]) do |results|
          results.each do |row|
            # Remove the resource's parameter values, then the resource itself
            conn.exec_params("DELETE FROM param_values WHERE resource_id = $1", [row['id']])
            conn.exec_params("DELETE FROM resources WHERE id = $1", [row['id']])
          end
        end
        # Finally, remove the host record itself
        conn.exec_params("DELETE FROM hosts WHERE id = $1", [id])
      end
    end
  rescue => e
    Puppet.notice(e.message)
  ensure
    # conn is still nil if the connection attempt itself failed
    conn.close if conn
  end
end
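The purge step can be modeled the same way. This sketch, again an in-memory illustration with invented rows and a hypothetical purge method, deletes in the same order as Listing 10 (parameter values, then resources, then the host row) and leaves other hosts alone.

```ruby
# In-memory stand-in for the tables used by Listing 10 (made-up rows).
# Host 'foo' has already been expired, so its ensure value is 'absent'.
db = {
  hosts:        [{ id: 1, name: 'foo' }, { id: 2, name: 'corona' }],
  resources:    [{ id: 100, host_id: 1 }, { id: 101, host_id: 2 }],
  param_values: [
    { resource_id: 100, param_name_id: 10, value: 'absent' },
    { resource_id: 101, param_name_id: 10, value: 'present' }
  ]
}

# Mirrors purge_exported for one host; returns false if the host is unknown.
def purge(db, hostname)
  host = db[:hosts].find { |h| h[:name] == hostname }
  return false unless host

  resource_ids = db[:resources].select { |r| r[:host_id] == host[:id] }
                               .map { |r| r[:id] }
  # Same order as the DELETE statements in Listing 10.
  db[:param_values].reject! { |pv| resource_ids.include?(pv[:resource_id]) }
  db[:resources].reject! { |r| resource_ids.include?(r[:id]) }
  db[:hosts].delete(host)
  true
end

puts purge(db, 'foo')
puts purge(db, 'foo') # second call: the host row is already gone
```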
Now for the refactored nagios class and related code (Listings 11–14).
Listing 11. modules/nagios/manifests/init.pp
# This class will be used by the nagios server
class nagios {
  include nagios::params
  require nagios::expire_resources
  include nagios::purge_resources

  service { $nagios::params::service:
    ensure => running,
    enable => true,
  }

  # nagios.cfg needs this specified via the cfg_dir directive
  file { $nagios::params::resource_dir:
    ensure => directory,
    owner  => $nagios::params::user,
  }

  # Local Nagios resources
  nagios::resource { [ 'Nagios Servers', 'Puppet Servers', 'Other' ]:
    type   => hostgroup,
    export => false;
  }
}
Listing 12. modules/nagios/manifests/expire_resources.pp
class nagios::expire_resources {
  if $my_nagios_purge_hosts {
    expire_exported($my_nagios_purge_hosts)
  }
}
Listing 13. modules/nagios/manifests/purge_resources.pp
class nagios::purge_resources {
  require nagios::collect_resources

  if $my_nagios_purge_hosts {
    purge_exported($my_nagios_purge_hosts)
  }
}
Listing 14. modules/nagios/manifests/collect_resources.pp
class nagios::collect_resources {
  include nagios::params

  # require needs a resource reference, not the bare directory path
  Nagios_host <<||>> {
    require => File[$nagios::params::resource_dir],
    notify  => Service[$nagios::params::service],
  }

  File <<| tag == nagios_host |>> {
    notify => Service[$nagios::params::service],
  }
}
The basic building blocks are now in place. Extend nagios::resource, plug the classes into your nagios module and kick back. If a node goes MIA and needs to be purged, toss it into your $my_nagios_purge_hosts array and be done with it. Until next time, may your Nagios dashboards be green and your alerts be few.
Comments
file ownership is not a problem
"Before we begin, let's make sure we understand the most important problem—the issue of file ownership and permissions for the newly generated .cfg files. Because these files are created via the target parameter of each associated Nagios type, they'll be written to disk by the user Puppet runs as. This means they will be owned by the root user/group, and Nagios will not have permission to read them (because I know you are not running Nagios as root, correct?)."
I don't get this.
On a default Ubuntu 12.04 machine, Nagios3 runs as user nagios. All .cfg files are owned by root. Nagios is just fine with that. It doesn't even complain in the logs about it.
Even if it weren't, couldn't I just chown nagios:nagios /etc/nagios3/conf.d and chmod g+s /etc/nagios3/conf.d? This would ensure all newly created files in /etc/nagios3/conf.d/ were owned by the nagios group, of which user nagios is a member.
I don't understand how the file permissions are the 'most important problem' in this.
Turns out it's not so much
Turns out it's not so much the ownership that is the problem, but the permissions. Puppet creates the files with a 0600 mode, making them unreadable by Nagios.
purge logic is cumbersome...
purge logic is cumbersome... set recurse, purge, force on resource dir, and expire your nodes in puppet properly (puppet node clean/deactivate foo) and you'll achieve the same.