Big Brother Network Monitoring System

Figure 1. Big Brother (Sean MacGuire) is Watching
I wasn't bored: I don't have time to be bored. Texas Agricultural Extension Service operates a fairly large enterprise-wide network that stretches across hell's half acre, otherwise known as Texas. We have around 3,000 users in 249 counties and 12 district offices who expect to get their e-mail and files across our Wide Area Network. Some users actually expect the network to work most of the time. We use Ethernet networking with Novell servers at some 35 locations; about 15 of these have routers connected via a mixture of 56Kb circuits, fractional T1, frame-relay and radio links. We are not currently using barbed wire fences for our network, no matter what you may have heard.
I am privileged to be part of the team that set up and maintains the network. We do not live in a perfect network world—things happen. Scarcely a day goes by that we do not have one or more WAN link outages, usually of short duration. We sometimes have our hands full just keeping all the pieces connected. Did I mention that the users expect the mail and other software to actually work?
Cruising the USENET newsgroups, I read a posting about “Big Brother, a solution to the problem of Unix Systems Monitoring” written by Sean MacGuire of Montréal, Canada. I was intrigued to notice that Big Brother was a collection of shell scripts and simple C programs designed to monitor a bunch of Unix machines on a network. So what if most of our mission critical servers were Novell-based? Who cares if some of our web servers run on Macintosh, OS/2, Windows 95 or NT? We use both Linux and various flavors of Unix in a surprisingly large number of places.
System administrators often reported difficult installations and software incompatibilities with the monitoring software; thus, frustrated users often gave us our first hint that all was not well. We had cooked up a number of homemade monitoring systems; pinging and tracerouting to all the servers can be very informative. We even looked at a bunch of proprietary (and expensive) network monitoring systems. It is amazing how much money these systems can cost.
According to the blurb by Sean MacGuire on Big Brother:
Big Brother is a loosely-coupled distributed set of tools for monitoring and displaying the current status of an entire Unix network and notifying the system administrator should need be. It came about as the result of automating the day to day tasks encountered while actively administering Unix systems.
The USENET news article provided a URL to the home site of Big Brother, http://www.iti.qc.ca/iti/users/sean/bb-dnld/. I pointed my browser to it and was rewarded with a blue image of a sinister face peering out under the caption “big brother is watching” against a purple background. After my initial shock, I learned that Big Brother featured:
Web-based status display
Configurable warning and panic levels
Notification via pager or e-mail
Free and includes source code
I was fascinated, especially by the last item: “Free and includes source code.” (I often tell people that Linux isn't free, but priceless.) So what could a priceless package do for me? What does Big Brother check?
Connectivity via ping
HTTP servers up and running
Disk space usage
Uptime and CPU usage
Essential processes still running
System-generated messages and warnings
Overall, very sensible. Looking for some “gotchas”, I found I would need a Unix-based machine, a functioning web server and browser (for the display), a compiler, Kermit and a modem line (for the pager). A web server was no problem, as we run many. A C compiler came with Linux, and we use Kermit on many machines with modems. So far, so good.
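To make the idea of configurable warning and panic levels concrete, here is a minimal sketch of a disk-space check that maps usage to a traffic-light color. It is written in Python purely for illustration (Big Brother itself is shell scripts and small C programs), and the function names and the 90%/95% thresholds are my own assumptions, not Big Brother's actual defaults.

```python
import shutil


def classify_usage(percent_used, warn_at=90.0, panic_at=95.0):
    """Map a disk-usage percentage to a status color.

    warn_at and panic_at are illustrative thresholds (assumptions),
    not values taken from Big Brother.
    """
    if percent_used >= panic_at:
        return "red"      # panic level reached
    if percent_used >= warn_at:
        return "yellow"   # warning level reached
    return "green"        # below both levels


def disk_status(path="/", warn_at=90.0, panic_at=95.0):
    """Return (color, percent_used) for the filesystem holding path."""
    usage = shutil.disk_usage(path)
    percent_used = 100.0 * usage.used / usage.total
    return classify_usage(percent_used, warn_at, panic_at), percent_used
```

Calling `disk_status("/")` on a monitored machine would yield a color plus the measured percentage, which is the kind of result a status display needs for its disk column.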
The Big Brother web site provided links to a few demonstration sites, and a link to download the program as well. I connected to a demonstration site and was greeted with an amazing display:

Figure 2
Legend: [grn] System OK, [yel] Attention, [red] Trouble, [blu] No report
Buttons: [help] [info] [page] [view]
Updated @ 22:52

             conn   cpu    disk   http   msgs   procs
iti-s01      [grn]  [grn]  [grn]  [grn]  [yel]  [grn]
route-r-000  [grn]  -      -      -      -      -
inet-gw-0    [grn]  -      -      -      -      -
As you can see, Big Brother is watching. While enduring the scrutiny of the Orwellian face peering out at me, I examined the rest of the display. It is colored like a traffic signal (green/yellow/red), and the update time is clearly displayed beneath it. To the right of “Big Brother” are four buttons, marked clearly Help, Info, Page and View. Beneath the header area is a table with six column headings and three rows, each neatly labelled with a computer host name. The boxes formed by the intersection of the rows and columns contain attractive green and yellow balls. The overall effect is like a decorated tree. The left side of the screen has a yellow tint, gradually becoming black at the center.
Selecting the Help button gives a brief explanation of Big Brother. Choosing the Info Button provides a much longer and more detailed explanation of the system, including a graphic that really is worth a thousand words. The Page button sends a signal to a radio-linked pager—not at all what I had expected. Finally, the View selection provides a brief but perhaps more useful view of the information, isolating only the systems with problems.
In my case, only the “iti-s01” system was displayed. My browser cursor indicated a link as it passed over each colored dot, so I clicked on the blinking yellow dot and received this message:
yellow Tue Feb 18 22:50:53 EST 1997 Feb 16 12:22:33 iti-s01 kernel: WARNING: / was not properly dismounted
This puzzled me at first. How on earth could it know that? It turns out that Big Brother (BB) checks the system /var/log/messages file periodically and alerts on any line that begins with either WARNING or NOTICE. As I am certain Sean MacGuire is very conscientious, I suspect he adds that line to his message file, so the viewer can see how Big Brother reports its findings.
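A log check of this kind can be sketched in a few lines. The version below is an illustration in Python, not Big Brother's own script; the function names are hypothetical. It also assumes the keyword may appear anywhere in the line, since the sample message above carries a timestamp and host name before the word WARNING.

```python
KEYWORDS = ("WARNING", "NOTICE")


def find_alert_lines(lines, keywords=KEYWORDS):
    """Return the log lines that contain any of the alert keywords."""
    return [line.rstrip("\n") for line in lines
            if any(keyword in line for keyword in keywords)]


def msgs_status(lines, keywords=KEYWORDS):
    """Return (color, matching_lines): yellow if any line matched, else green."""
    hits = find_alert_lines(lines, keywords)
    return ("yellow" if hits else "green"), hits
```

In practice the lines would come from something like `open("/var/log/messages")`, read on each polling pass.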
Suddenly, my screen spontaneously updated. The update time had changed by five minutes, and a blinking yellow dot appeared under the column labelled procs. I clicked on the blinking yellow dot and was informed that the sendmail process was not running. This got me really interested—Big Brother can monitor whether selected processes are running.
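The process check can be pictured as comparing a list of required process names against what is actually running. The following Python sketch shows one way to do that; the names are hypothetical, and reading the process table with `ps -e -o comm=` is my assumption about a reasonable approach on a Unix machine, not a description of how Big Brother does it.

```python
import subprocess


def running_process_names():
    """List command names of running processes (Unix only, via ps)."""
    result = subprocess.run(["ps", "-e", "-o", "comm="],
                            capture_output=True, text=True, check=True)
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


def missing_processes(required, running):
    """Return the required process names that are not in the running list."""
    running_set = set(running)
    return [name for name in required if name not in running_set]


def procs_status(required, running):
    """Return (color, missing): yellow if anything required is absent."""
    missing = missing_processes(required, running)
    return ("yellow" if missing else "green"), missing
```

For example, `procs_status(["sendmail"], running_process_names())` would report yellow with `["sendmail"]` as the missing list on a machine where sendmail had stopped.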
Being a little puzzled about the screen's ability to update itself, I viewed the document source and discovered some HTML commands that were new to me:
<META HTTP-EQUIV="REFRESH" CONTENT="120">
<META HTTP-EQUIV="EXPIRES" CONTENT="Tue Feb 18 23:22:07 CST 1997">
The first META line instructs browsers to get an update every 120 seconds. The second tells the browser to get a new copy after the expiration time and date—very clever.
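For readers who want to try the same trick on their own status pages, here is a small Python sketch that produces a pair of META lines like the ones above. The function name, the 30-minute lifetime and the HTTP-style date format are assumptions made for this example; the page I viewed used a different date layout.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime


def status_page_head(now, refresh_seconds=120, lifetime_minutes=30):
    """Build REFRESH and EXPIRES META tags for a self-updating page.

    now must be a timezone-aware UTC datetime. The default lifetime
    is an illustrative assumption.
    """
    expires = now + timedelta(minutes=lifetime_minutes)
    expires_text = format_datetime(expires, usegmt=True)
    return (
        f'<META HTTP-EQUIV="REFRESH" CONTENT="{refresh_seconds}">\n'
        f'<META HTTP-EQUIV="EXPIRES" CONTENT="{expires_text}">'
    )
```

A script that regenerates the status page would call `status_page_head(datetime.now(timezone.utc))` and write the result into the page's HEAD section on every pass.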
I returned to the graphics window and discovered that the yellow area on the left had changed to red. A new host name row appeared with a blinking red dot under the column labelled conn. I clicked on the blinking red dot and read this message:
red Tue Feb 18 22:59:11 CST 1997 bb-network.sh: Can't connect to router-000... (paging)
The connection to the machine called router-000 had been interrupted, and the administrator had been paged. Amazingly, while in Texas, I had become aware of a network outage in Montréal, Canada. This really had possibilities—perhaps someday I may get to take a vacation.