By tracking information over time, you can get an overall picture of the health of the network and its services. These tools will show you network trends and even notify a human when problems present themselves. More often than not, the systems will notice trouble before a person has a chance to call tech support.
- Cacti (www.cacti.net). As mentioned earlier, many tools use RRDtool as a back-end to build graphs for data that they collect. Cacti is such a tool. It is a PHP-based network management tool that simplifies data gathering and graph generation. It stores its configuration in a MySQL database, and is integrated with SNMP. This makes it very straightforward to map out all of the devices on your network, and monitor everything from network flows to CPU load. Cacti has an extensible data collection scheme that lets you collect just about any kind of data you can think of (such as radio signal, noise, or associated users) and plot it on a graph over time. Thumbnail views of your graphs can be combined into a single web page. This lets you observe the overall state of your network at a glance.
- SmokePing (people.ee.ethz.ch/~oetiker/webtools/smokeping/). Yet another tool by Tobias Oetiker, SmokePing is a tool written in Perl that shows packet loss and latency on a single graph. It is very useful to run SmokePing on a host with good connectivity to your entire network. Over time, trends are revealed that can point to all sorts of network problems. Combined with MRTG or Cacti, you can observe the effect that network congestion has on packet loss and latency. SmokePing can optionally send alerts when certain conditions are met, such as when excessive packet loss is seen on a link for an extended period of time.
- Nagios (www.nagios.org). Nagios is a service monitoring tool. In addition to tracking the performance of simple pings (as with SmokePing), Nagios can watch the performance of actual services on any number of machines. For example, it can periodically query your web server and verify that it returns a valid web page. If a check fails, Nagios can notify a person or group via email, SMS, or IM.
While Nagios will certainly help a single admin to monitor a large network, it is best used when you have a troubleshooting team with responsibilities divided between various members. Notifications can be configured to ignore transient problems, and to go only to the people who are responsible for fixing a given problem. If the problem goes on for a predefined period of time without being acknowledged, other people can additionally be notified. This allows temporary problems to be simply logged without bothering anyone, and real problems to be brought to the attention of the team.
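The escalation behaviour described above can be sketched in a few lines of Python. This is only an illustration of the decision logic, not Nagios configuration syntax or any Nagios API: the `EscalationPolicy` class, the `recipients` function, the thresholds, and the example addresses are all hypothetical names chosen for this sketch.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class EscalationPolicy:
    """Hypothetical policy; field names are assumptions for illustration."""
    failures_before_alert: int    # consecutive failed checks before anyone is notified
    escalate_after_minutes: int   # unacknowledged minutes before more people are notified
    primary: List[str]            # people responsible for fixing the problem
    secondary: List[str]          # people added if nobody acknowledges in time


def recipients(policy: EscalationPolicy,
               consecutive_failures: int,
               minutes_unacknowledged: int,
               acknowledged: bool) -> List[str]:
    """Return who should be notified for the current state of a problem."""
    # Transient problem: too few failures so far, so it is only logged.
    if consecutive_failures < policy.failures_before_alert:
        return []
    # Someone has already acknowledged the problem; no further notifications.
    if acknowledged:
        return []
    # Nobody has responded within the allowed time: notify the wider group too.
    if minutes_unacknowledged >= policy.escalate_after_minutes:
        return policy.primary + policy.secondary
    # Real problem, still within the response window: notify the owners only.
    return list(policy.primary)


if __name__ == "__main__":
    policy = EscalationPolicy(
        failures_before_alert=3,
        escalate_after_minutes=30,
        primary=["oncall@example.org"],
        secondary=["teamlead@example.org"],
    )
    print(recipients(policy, consecutive_failures=1, minutes_unacknowledged=0, acknowledged=False))
    print(recipients(policy, consecutive_failures=4, minutes_unacknowledged=45, acknowledged=False))
```

In this sketch a single failed check notifies no one, a sustained failure reaches only the responsible people, and the wider group is added only once the problem has gone unacknowledged for the configured period.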