Chapter 4: Nagios Basics
The fact that a host can be reached means little in itself if no service that somebody or something relies on is running on it. Accordingly, everything in Nagios revolves around service checks. Conversely, no service can run without a host: if the host computer fails, it cannot provide the desired service either.
Things get slightly more complicated when, for example, a router lies between the users and the system providing services. If the router fails, the desired service may still be running on the target host, but the user can no longer reach it.
Nagios is in a position to reproduce such dependencies and to precisely inform the administrator of the failure of an important network component, instead of flooding the administrator with irrelevant error messages concerning services that cannot be reached. An understanding of such dependencies is essential for the smooth operation of Nagios, which is why Section 4.1 will look in more detail at these dependencies and the way Nagios works.
Another important item is the state of a host or service. On the one hand, Nagios allows a much finer distinction than just "ok" or "not ok"; on the other hand, the distinction between soft state and hard state means that the administrator does not have to deal with short-term disruptions that have long since disappeared by the time the information arrives. These states also influence the intensity of the service checks. How this works in detail is described in Section 4.3.
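As a rough illustration of the soft/hard distinction, the following Python sketch models it under stated assumptions. It is not Nagios's real state machine: `track_states` is a hypothetical helper, and its `max_check_attempts` argument is merely named after the Nagios parameter of the same name. The sketch assumes that a problem stays soft until it has been seen on that many consecutive checks, and only then becomes hard.

```python
# Rough model of soft vs. hard states; an assumption-laden sketch,
# not Nagios's real state machine.
def track_states(results, max_check_attempts=3):
    """Yield (result, state_type) for a sequence of check results.

    results: iterable of strings such as "OK" or "CRITICAL".
    """
    failures = 0          # consecutive non-OK results seen so far
    hard_problem = False  # True once a problem has been confirmed
    for result in results:
        if result == "OK":
            # Recovery from a confirmed problem is a hard state change;
            # recovery before confirmation is only a soft recovery.
            state_type = "HARD" if hard_problem or failures == 0 else "SOFT"
            failures = 0
            hard_problem = False
        else:
            failures += 1
            if failures >= max_check_attempts:
                hard_problem = True
            state_type = "HARD" if hard_problem else "SOFT"
        yield result, state_type


if __name__ == "__main__":
    for result, state_type in track_states(["OK", "CRITICAL", "OK"]):
        print(result, state_type)
```

In this model a single failed check followed by a recovery never reaches the hard state, which is the kind of short-term disruption the administrator does not need to hear about.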
How Nagios handles dependencies of hosts and services can be best illustrated with an example. Figure 4.1 represents a small network in which the Domain Name Service on proxy is to be monitored.
The service check, which the system performs regularly, always serves as the starting point for monitoring. As long as the service can be reached, Nagios takes no further steps; that is, it does not perform any host checks. For switch1, switch2, and proxy, such a check would be pointless anyway, because if the DNS service responds on proxy, then the hosts mentioned are automatically accessible.
If the name service fails, however, Nagios tests the computer involved with a host check, to see whether the service or the host is causing the problem. If proxy cannot be reached, Nagios might test the parent hosts entered in the configuration (Figure 4.2). With the parents parameter of the host definition, the administrator has a means of providing Nagios with information on the network topology.

Figure 4.2: The order of tests performed after a service failure.
When doing this, the administrator enters only the direct neighbor computer for each host on the path to the Nagios server as the parent. Hosts that are located in the same network segment as the Nagios server itself are defined without a parent. For the network topology from Figure 4.1, the corresponding configuration (reduced to the host name and parent) appears as follows:
define host {
    host_name   proxy
    ...
    parents     switch2
}

define host {
    host_name   switch2
    ...
    parents     switch1
}

define host {
    host_name   switch1
    ...
}
switch1 is located in the same network segment as the Nagios server, so it is not assigned a parent. What belongs to a network segment is a matter of opinion: if you interpret the switches as the segment boundary, as is the case here, this has the advantage that a disruption can be isolated more closely. But you can also take a different view and interpret an IP subnetwork as a segment. Then a router would form the segment boundary; in our example, proxy would then count as being in the same network as the Nagios server. However, it would no longer be possible to distinguish between a failure of proxy and a failure of switch1 or switch2.
If switch1 in the example fails, Figure 4.3 shows the sequence in which Nagios proceeds: first the system, when checking the DNS service on proxy, determines that this service is no longer reachable (1). To differentiate, it now performs a host check to see what the state of the proxy computer is (2). Since proxy cannot be reached, but it has switch2 as a parent, Nagios also subjects switch2 to a host check (3). If this switch also cannot be reached, the system checks its parent, switch1 (4).
If Nagios can establish contact with switch1, the cause for the failure of the DNS service on proxy can be isolated to switch2. The system accordingly specifies the states of the host: switch1 is UP, switch2 DOWN; proxy, on the other hand, is UNREACHABLE. Through a suitable configuration of the Nagios messaging system (see Section 12.3 on page 217) you can use this distinction to determine, for example, that the administrator is informed only about the host that is in the DOWN state and represents the actual problem, but not about the hosts that are dependent on the down host.
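The classification just described can be sketched in a few lines of Python. This is a simplified model, not Nagios's actual implementation: the `PARENTS` map mirrors the host definitions above, while `classify` and the `responding` set are hypothetical names introduced for illustration. The sketch assumes that a host which answers is UP, that a host which does not answer is DOWN if it has no parent or at least one parent is UP, and that it is UNREACHABLE otherwise.

```python
# Simplified model of host state classification; not Nagios source code.
# PARENTS mirrors the "parents" entries from the host definitions above.
PARENTS = {
    "switch1": [],           # same segment as the Nagios server
    "switch2": ["switch1"],
    "proxy": ["switch2"],
}


def classify(host, responding, parents=PARENTS):
    """Return 'UP', 'DOWN' or 'UNREACHABLE' for a host (hypothetical helper).

    responding: set of host names that currently answer a host check.
    """
    if host in responding:
        return "UP"
    host_parents = parents[host]
    # No parent: nothing lies between the host and the Nagios server.
    if not host_parents:
        return "DOWN"
    # If any parent is UP, the path is intact, so the host itself is at fault.
    if any(classify(p, responding, parents) == "UP" for p in host_parents):
        return "DOWN"
    # Every path to the host is blocked further upstream.
    return "UNREACHABLE"


if __name__ == "__main__":
    # switch2 has failed: only switch1 still answers.
    responding = {"switch1"}
    for name in ("switch1", "switch2", "proxy"):
        print(name, classify(name, responding))
```

With only switch1 answering, the sketch reports switch1 as UP, switch2 as DOWN, and proxy as UNREACHABLE, matching the states given in the text.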
In a further step, Nagios can determine other topology-specific failures in the network (so-called network outages). proxy is the parent of gate, so gate is also represented as UNREACHABLE (5). gate in turn also functions as a parent; the Internet server that depends on it is likewise classified as UNREACHABLE.
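To see which hosts an outage takes with it, the parent relationships can be followed in the opposite direction. The Python sketch below is an illustration only and rests on assumptions: `blocked_by` is a hypothetical helper, and the host name `internet` is a placeholder for the Internet server, whose actual name the text does not give.

```python
# Illustration only: which hosts become UNREACHABLE when one host fails?
# "internet" is a placeholder name for the Internet server in the text.
PARENTS = {
    "switch1": [],
    "switch2": ["switch1"],
    "proxy": ["switch2"],
    "gate": ["proxy"],
    "internet": ["gate"],
}


def blocked_by(failed, parents=PARENTS):
    """Return the hosts cut off from the Nagios server by a failed host.

    A host counts as cut off when it has parents and every one of them
    is either the failed host or already cut off itself.
    """
    cut_off = set()
    changed = True
    while changed:  # repeat until no further host is added
        changed = False
        for host, host_parents in parents.items():
            if host == failed or host in cut_off or not host_parents:
                continue
            if all(p == failed or p in cut_off for p in host_parents):
                cut_off.add(host)
                changed = True
    return cut_off


if __name__ == "__main__":
    print(sorted(blocked_by("switch2")))  # ['gate', 'internet', 'proxy']
```

Because a host is only counted as cut off when all of its parents are affected, the same helper would also cope with a host that lists more than one parent.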
This "intelligence", which distinguishes Nagios, becomes more helpful to the administrator the more hosts and services depend on a failed component. For a router in the backbone on which hundreds of hosts and services depend, the system informs administrators of the specific disruption instead of sending them hundreds of error messages that are not wrong in principle but do little to help eliminate the disruption.
Comments
Suggestion for the book
You might consider a quickstart guide in the book. Most people who purchase a book like this are interested in getting up and running, even in a minimal configuration, first... not memorizing a plethora of detail beforehand.
While manually going through the book and following it step by step to configure Nagios, the daemon complained because there were missing pieces, such as defining 24x7 "somewhere"; that's not clearly explained. Details like that can throw a new reader off very easily.
Quote: Although the
Quote: Although the check_interval parameter provides a way of forcing regular host checks, there is no real reason to do this.
This is not true. Example: Mail Server serving up IMAP on port 143 goes DOWN due to having the power go out. When the machine gets turned back on the IMAP service is not turned on by default (or insert whatever scenario that would make the IMAP service non-functional now, iptables, hosts.deny, etc.). Nagios continues to check for port 143 listening on this server and NOT whether the machine responds or not. This machine will continue to show as DOWN as long as the service is non-responsive.
There are only two fixes that I have found for this: 1. Turn on use_aggressive_host_checking, which will kill any machine with more than 1000 active service checks. 2. Use a host checking mechanism as a service, preferably a quick one-packet ICMP check.
nice nagios tutorials
This is a very easy installation and configuration guide for Nagios. I hope it will help more people install Nagios plugins, with examples of how to use them.