Product of the Day: NetVigil -- Monitoring Large Linux Environments

October 26th, 2004 by Vendor Written

The following information has been provided by the product vendor and does not necessarily reflect the opinion of Linux Journal.

Product: NetVigil - Network Management/Monitoring

Manufacturer: Fidelia Technology, Inc.

Address: 300 Alexander Road #205, Princeton, New Jersey 08540

Telephone: 888-FIDELIA

A Linux Network Management Tool for Any Size Network

As Linux spreads across a wide variety of network configurations, somebody still has to monitor how those systems perform and make sure they keep running flawlessly. Fidelia's NetVigil provides integrated fault and performance monitoring of applications, networks, systems and user-defined data sources. NetVigil is a scalable, easy-to-use product that is already used in very large network and server environments, where it replaces several legacy management platforms and "swivel chair" management with a single product.

Fidelia NetVigil has several features that are especially useful in large Linux and network environments. The BVE reporting engine has a parallel-processing correlation layer that can talk to distributed Data Gathering Engines (DGEs) simultaneously, process all of this distributed data in real time and present it to the user. Each DGE can monitor up to a thousand servers, and a single location can have many DGEs. In one large sample deployment, a DGE was deployed in each of five geographically separate datacenters, and each DGE kept its historical performance data in a local database, which reduced the size of each database and eliminated traffic to a centralized data warehouse.
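
The idea of talking to several DGEs at once and merging their answers can be illustrated with a minimal sketch. This is not NetVigil's API: the function `query_all`, the stand-in `fake_fetch` and the DGE names are all hypothetical, and a real deployment would replace `fake_fetch` with a network call to each engine.

```python
from concurrent.futures import ThreadPoolExecutor


def query_all(dge_names, fetch):
    """Call fetch(name) for every DGE concurrently; return {name: result}."""
    if not dge_names:
        return {}
    # One worker thread per DGE, so a slow datacenter does not delay the others.
    with ThreadPoolExecutor(max_workers=len(dge_names)) as pool:
        # pool.map yields results in the same order as dge_names.
        results = pool.map(fetch, dge_names)
        return dict(zip(dge_names, results))


def fake_fetch(name):
    """Hypothetical stand-in for a remote query; returns a made-up summary."""
    return {"dge": name, "open_alarms": len(name)}


if __name__ == "__main__":
    # Hypothetical DGE names, one per datacenter.
    summary = query_all(["dge-east", "dge-west", "dge-central"], fake_fetch)
    for name, result in summary.items():
        print(name, result["open_alarms"])
```

Because each DGE answers from its own local database, the central layer in this sketch only merges small summaries rather than pulling raw history across the network.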

The BVE also has a built-in security model, which controls access to the data in the distributed databases. This works well in enterprises where different departments need access to the monitored data but still want ownership of the alerts and reporting for the devices they are responsible for. As an example, the network group can have its own "virtual NMS" and the server group can have a separate "virtual NMS". At the same time, the NOC can have its own view across both of the isolated departments and help in pinpointing the exact cause of an outage or performance degradation.

Highlighting the Impact on Service

Additionally, the newer generation of server and network management software tends to highlight the impact that a failing server or router has on the delivery of service. As an example, an eCommerce service might consist of a front-end application server connected to a remote backend database via a routed network. The failure of a router might not impact the eCommerce service if there is an alternate backup path, but the failure of the backend database will definitely impact it. It is important for a server or network management system to identify whether a failed router impacts any service and, conversely, what caused a failed eCommerce service.

New network and server management systems such as Fidelia NetVigil offer this advanced capability and help in identifying the root cause of a service failure. With increasing redundancy built into server environments, it is important that the operations staff get distinct alerts for critical servers versus redundant servers. Most management systems do not make this distinction, with the end result that a non-critical server and a critical server are given the same operational priority. An operator should be able to look at a screen and determine which server or router is more important and which failure has the greater impact on business services.
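
The eCommerce example above can be sketched as a reachability check over a small topology. This is only an illustration of the reasoning, not how NetVigil models services: the device names, the `LINKS` list and both functions are hypothetical.

```python
from collections import deque


def reachable(links, failed, src, dst):
    """Return True if dst can be reached from src without using a failed node."""
    if src in failed or dst in failed:
        return False
    # Build an undirected adjacency map that leaves out failed nodes.
    adjacency = {}
    for a, b in links:
        if a in failed or b in failed:
            continue
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    # Breadth-first search from src.
    seen = {src}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            return True
        for neighbor in adjacency.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return False


# Hypothetical topology: the application server reaches the database
# through either of two routers, so one router is a backup path.
LINKS = [
    ("app-server", "router-a"),
    ("app-server", "router-b"),
    ("router-a", "database"),
    ("router-b", "database"),
]


def ecommerce_impacted(failed):
    """The service is impacted when the app server cannot reach the database."""
    return not reachable(LINKS, set(failed), "app-server", "database")
```

With this topology, `ecommerce_impacted(["router-a"])` is false because the path through `router-b` remains, while `ecommerce_impacted(["database"])` is true. An alert for the first failure could therefore be given a lower operational priority than an alert for the second.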

Pro-Active Real-time Reporting

When monitoring a very large number of servers, network devices and applications, it is important to rely on the management system both to identify problems and to analyze the vast volume of data collected. Manually going through thousands of graphs to find potential bottlenecks is simply not possible in a large environment.

Fidelia NetVigil offers integrated fault and performance monitoring of servers and network equipment in a single product. Its distributed architecture allows it to be deployed in very large, distributed datacenter environments consisting of tens of thousands of servers. The intelligent DGEs handle all of the processing and storage of data, as well as analyzing it and triggering alarms. Data analysis and reporting are fast because of the distributed data processing model. Historical data is stored for several years (like MRTG), and the reporting engine can process this data and pro-actively highlight disks that are close to filling up or network links that are congested.
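
One simple way to flag a disk before it fills up is to fit a straight line to its historical usage and extrapolate. The sketch below assumes that approach; the article does not say which method NetVigil uses, and the function name and sample numbers are hypothetical.

```python
def days_until_full(samples, capacity):
    """Estimate days until a disk is full from (day, used) samples.

    Fits a least-squares line to the samples to get a growth rate, then
    projects forward from the most recent sample. Returns None when there
    are too few samples or usage is not growing.
    """
    n = len(samples)
    if n < 2:
        return None
    mean_day = sum(day for day, _ in samples) / n
    mean_used = sum(used for _, used in samples) / n
    sxx = sum((day - mean_day) ** 2 for day, _ in samples)
    if sxx == 0:
        return None  # all samples taken on the same day
    sxy = sum((day - mean_day) * (used - mean_used) for day, used in samples)
    slope = sxy / sxx  # growth in usage units per day
    if slope <= 0:
        return None
    _, last_used = samples[-1]
    return max(0.0, (capacity - last_used) / slope)


if __name__ == "__main__":
    # Hypothetical history: usage in GB on days 0..3 for a 100 GB disk.
    history = [(0, 50), (1, 52), (2, 54), (3, 56)]
    print(days_until_full(history, capacity=100))
```

A report could then sort disks by this estimate and show only those expected to fill within, say, the next month, instead of leaving an operator to scan every graph.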

Preventing Alarm Floods

NetVigil's alarms are triggered by thresholds set by the administrator or, more interestingly, by intelligent thresholds that it calculates automatically from historical data. One common problem in large networks is the alarm flood: a server fails and the administrator gets an alarm for every service running on that server, or a router fails and triggers hundreds of alarms for the devices that are now unreachable.
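
A threshold derived from historical data can be as simple as the historical mean plus a multiple of the standard deviation. The article does not describe NetVigil's actual calculation, so the sketch below is an assumption that shows one common baseline technique; the function names and the multiplier `k` are hypothetical.

```python
import statistics


def baseline_threshold(history, k=3.0):
    """Return mean + k standard deviations of the historical samples."""
    mean = statistics.mean(history)
    spread = statistics.pstdev(history)  # population standard deviation
    return mean + k * spread


def is_anomalous(value, history, k=3.0):
    """Flag a new reading that exceeds the baseline threshold."""
    return value > baseline_threshold(history, k)


if __name__ == "__main__":
    # Hypothetical CPU utilization history, in percent.
    cpu_history = [40, 42, 38, 41, 39]
    print(round(baseline_threshold(cpu_history), 2))
    print(is_anomalous(45, cpu_history))
```

The appeal of this kind of threshold is that it adapts to each server: a host that normally runs at 40% and one that normally runs at 80% get different alarm levels without the administrator setting either by hand.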

NetVigil automatically detects and stores L2/L3 topology, and also understands the relationships between the different protocol stacks. The intelligent notification engine automatically suppresses alarms for cascading events based on the topology relationships between devices, and it uses complex heuristics to avoid false suppressions. It can reduce hundreds of related alarms to a single notification, effectively reducing Mean Time To Recover, which is critical to your business.
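
Topology-based suppression can be sketched as follows: an alarm for a device is suppressed when some device upstream of it is also down, so only the topmost failures are reported. This is a simplified illustration that assumes each device has a single upstream parent; it omits the heuristics against false suppression mentioned above, and the function and device names are hypothetical.

```python
def root_cause_alarms(parent, down):
    """Return the down devices that have no down device upstream of them.

    parent maps each device to its upstream device (None at the top).
    down is the collection of devices currently reported as unreachable.
    """
    down = set(down)
    roots = []
    for device in sorted(down):
        upstream = parent.get(device)
        visited = set()  # guards against accidental loops in the map
        suppressed = False
        while upstream is not None and upstream not in visited:
            if upstream in down:
                suppressed = True  # an upstream failure explains this alarm
                break
            visited.add(upstream)
            upstream = parent.get(upstream)
        if not suppressed:
            roots.append(device)
    return roots


# Hypothetical topology as seen from the monitoring station.
PARENT = {
    "core-router": None,
    "switch-1": "core-router",
    "web-1": "switch-1",
    "web-2": "switch-1",
    "db-1": "core-router",
}
```

If `switch-1`, `web-1` and `web-2` are all reported down, `root_cause_alarms(PARENT, {"switch-1", "web-1", "web-2"})` returns only `["switch-1"]`, so the operator receives one notification instead of three.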

Fidelia's NetVigil has been installed at Verizon Wireless, SONY Entertainment, Yale University and a host of other universities in the northeastern United States. To learn more about this management tool, read one of the company's white papers at www.fiedlia.com/products/whitepapers.php
