Product of the Day: NetVigil -- Monitoring Large Linux Environments
As Linux grows in use in a wide variety of network configurations, you still have to monitor the performance of your creation. Somebody has to keep an eye on your work of art to make sure it performs flawlessly. Fidelia's NetVigil provides integrated fault and performance monitoring of applications, networks, systems and user-defined data sources. NetVigil is a scalable, easy-to-use product that is being used in very large network and server environments, replacing several legacy management platforms and "swivel chair" management with a single product.
Fidelia NetVigil has several features that are especially useful in large Linux and network environments. The BVE reporting engine has a parallel-processing correlation layer that can talk to distributed Data Gathering Engines (DGEs) simultaneously, process all of the distributed data in real time and present it to the user. Each DGE can monitor up to a thousand servers, and a single location can have many DGEs. In one large sample deployment, a DGE was deployed in each of 5 geographically separate datacenters, and each DGE kept its historical performance data in a local database, which reduced the size of each database and eliminated traffic to a centralized data warehouse.
The BVE also has a built-in security model, which controls access to the data in the distributed databases. This suits enterprises where different departments need access to the monitored data but still want ownership of the alerts and reporting for the devices they are responsible for. As an example, the network group can have its own "virtual NMS" and the server group can have a separate "virtual NMS". At the same time, the NOC can have its own view across both isolated departments and help in pinpointing the exact cause of an outage or performance degradation.
Additionally, the newer generation of server and network management software tends to highlight the impact of a failing server or router on the delivery of service. As an example, an eCommerce service might consist of a front-end application server connected to a remote backend database via a routed network. The failure of a router might not impact the eCommerce service if there is an alternate backup path, but the failure of the backend database will definitely impact it. It is important for a server or network management system to identify whether a failed router impacts any service and, conversely, what caused a failed eCommerce service.
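The eCommerce example above can be sketched in a few lines of Python. This is only an illustration of the idea, not NetVigil's implementation: the `Service` class, the function names and the device names (`router-a`, `router-b`, `backend-db`, `app-server`) are all hypothetical. Each service is modeled as a list of redundancy groups, and the service stays up as long as at least one member of every group is still working.

```python
from dataclasses import dataclass


@dataclass
class Service:
    """A business service and the components it depends on (hypothetical model)."""
    name: str
    # Each frozenset is a redundancy group: at least one member must be up.
    requirements: list


def service_is_up(service, failed):
    """Return True if every redundancy group still has a working member."""
    return all(
        any(member not in failed for member in group)
        for group in service.requirements
    )


def root_causes(service, failed):
    """Return the redundancy groups in which every member has failed."""
    return [sorted(group) for group in service.requirements if group <= failed]


# Assumed topology: one app server, two alternate routers, one backend database.
ecommerce = Service(
    name="ecommerce",
    requirements=[
        frozenset({"app-server"}),
        frozenset({"router-a", "router-b"}),
        frozenset({"backend-db"}),
    ],
)

router_only = {"router-a"}
db_and_router = {"router-a", "backend-db"}

print(service_is_up(ecommerce, router_only))    # backup path keeps the service up
print(service_is_up(ecommerce, db_and_router))  # database failure takes it down
print(root_causes(ecommerce, db_and_router))    # points at the database, not the router
```

Answering both questions from the same model is the point: the forward check says whether a failure matters to any service, and the reverse lookup says which failure explains a service outage.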
New network and server management systems such as Fidelia NetVigil offer this advanced capability and help in identifying the root cause of a service failure. With increasing redundancy built into server environments, it is important that the operations staff gets distinct alerts for critical servers versus redundant servers. Most management systems do not make this distinction, with the end result that a non-critical server and a critical server are given the same operational priority. An operator should be able to look at a screen and determine which server or router is more important and which failure contributes more to business service downtime.
When monitoring a very large number of servers, network devices and applications, it is important to rely on the management system both to identify problems and to analyze the vast volume of data collected. Manually going through thousands of graphs to find potential bottlenecks is simply not possible in a large environment.
Fidelia NetVigil offers integrated fault and performance monitoring of servers and network equipment in a single product. Its distributed architecture allows it to be deployed in very large, distributed datacenter environments consisting of tens of thousands of servers. The intelligent DGEs handle all processing and storage of data, as well as analyzing it and triggering alarms. Data analysis and reporting are very fast because of the distributed data processing model. Historical data is stored for several years (like MRTG), and the reporting engine can process this data and proactively highlight disks that are close to filling up or network links that are congested.
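To make the "disks that are close to filling up" idea concrete, here is a minimal Python sketch that fits a straight line to historical usage samples and estimates how many days remain before a disk is full. The article does not say how NetVigil does this, so the least-squares approach, the function name `days_until_full` and the sample figures are all assumptions made for illustration.

```python
def days_until_full(samples, capacity):
    """Estimate days remaining until usage reaches capacity.

    samples: list of (day, used) pairs in chronological order.
    Returns None if usage is flat or shrinking (no fill date can be projected).
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples to fit a trend")

    mean_x = sum(x for x, _ in samples) / n
    mean_y = sum(y for _, y in samples) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in samples)
    if sxx == 0:
        raise ValueError("samples must span more than one point in time")

    # Ordinary least-squares slope and intercept.
    slope = sum((x - mean_x) * (y - mean_y) for x, y in samples) / sxx
    if slope <= 0:
        return None
    intercept = mean_y - slope * mean_x

    full_day = (capacity - intercept) / slope
    last_day = samples[-1][0]
    return max(0.0, full_day - last_day)


# Hypothetical history: a 100 GB disk growing by about 2 GB per day.
history = [(0, 50), (1, 52), (2, 54), (3, 56)]
remaining = days_until_full(history, capacity=100)
print(f"projected days until full: {remaining:.1f}")
```

A report built on this kind of projection can rank disks by days remaining, so the operator sees the handful that need attention instead of thousands of graphs.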
NetVigil's alarms are triggered by thresholds set by the administrator and, more interestingly, by intelligent thresholds calculated automatically from historical data. One common problem in large networks is alarm floods: a server fails and the administrator gets an alarm for every service running on that server, or a router fails and triggers hundreds of alarms for the devices that have become unreachable.
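One simple way to derive a threshold from historical data is to flag values that sit well above the historical mean. The Python sketch below uses mean plus `k` standard deviations; this is a common textbook technique chosen for illustration and is not necessarily how NetVigil calculates its thresholds. The function names and the CPU figures are hypothetical.

```python
import statistics


def adaptive_threshold(history, k=3.0):
    """Return mean + k sample standard deviations of the historical values."""
    if len(history) < 2:
        raise ValueError("need at least two historical values")
    return statistics.fmean(history) + k * statistics.stdev(history)


def is_alarm(value, history, k=3.0):
    """Return True if the new value exceeds the history-derived threshold."""
    return value > adaptive_threshold(history, k)


# Hypothetical CPU utilization samples (percent) for one server.
cpu_history = [40, 42, 38, 41, 39]

print(round(adaptive_threshold(cpu_history), 2))
print(is_alarm(43, cpu_history))  # within the normal range for this server
print(is_alarm(60, cpu_history))  # well outside it
```

The benefit over a fixed threshold is that each server is judged against its own normal behavior, so a busy server and an idle one do not need separately hand-tuned limits.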
NetVigil automatically detects and stores L2/L3 topology, and also understands the relationships between the different protocol stacks. The intelligent notification engine automatically suppresses alarms for cascading events based on the topology relationships between devices, and uses complex heuristics to avoid false suppressions. It can reduce hundreds of related alarms to a single notification, effectively reducing Mean Time To Recover, which is critical to your business.
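The basic idea of topology-based suppression can be sketched as follows. This Python example assumes a simple tree in which each device has at most one upstream device, and it leaves out the heuristics for avoiding false suppressions that the article mentions. The function name `suppress_cascades` and the device names are hypothetical.

```python
def suppress_cascades(upstream, down):
    """Group down devices under the topmost down device directly above them.

    upstream: dict mapping each device to its upstream device (or None).
    down: set of devices currently reported as down or unreachable.
    Returns a dict of root-cause device -> list of suppressed devices.
    """
    groups = {}
    for device in sorted(down):
        root = device
        seen = {root}
        # Walk upward while the next hop is also down (guarding against loops).
        while True:
            parent = upstream.get(root)
            if parent is None or parent not in down or parent in seen:
                break
            root = parent
            seen.add(root)
        suppressed = groups.setdefault(root, [])
        if device != root:
            suppressed.append(device)
    return groups


# Assumed topology: servers hang off a switch, which hangs off a core router.
upstream = {
    "core-router": None,
    "switch-1": "core-router",
    "web-1": "switch-1",
    "web-2": "switch-1",
    "db-1": "core-router",
}

down = {"switch-1", "web-1", "web-2"}
for root, hidden in suppress_cascades(upstream, down).items():
    print(f"notify: {root} down ({len(hidden)} dependent alarms suppressed)")
```

In this example three raw alarms collapse into one notification for `switch-1`, which is the device the operator actually needs to fix.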
Fidelia's NetVigil has been installed at Verizon Wireless, SONY Entertainment, Yale University and a host of other universities in the northeastern United States. To learn more about this useful management tool, read one of their white papers: www.fiedlia.com/products/whitepapers.php