Berkshire PC Watchdog

The board can monitor a PC's activity in several ways to determine whether it has locked up.

Setting Up the System rc Files

The module must be loaded and the watchdog daemon started before the file systems are checked with fsck. Checking the file systems can take longer than the delay built into the Watchdog board, so if the daemon is not yet running, the board could reset the PC in the middle of the check. I put the following commands to load the module and start the daemon in my /etc/rc.d/rc.S file (the Slackware system initialization script), ahead of the file system checks.

# load the watchdog module and
# start the watchdog daemon
if [ -f /lib/modules/2.0.28/misc/pcwd.o ]; then
  echo "loading watchdog module"
  # -f forces loading even if the module's kernel version doesn't match
  /sbin/insmod -f /lib/modules/2.0.28/misc/pcwd.o
fi
if [ -x /usr/sbin/watchdog ]; then
  echo "starting watchdog daemon"
  /usr/sbin/watchdog -t 10 &
fi
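The daemon's job is to keep writing to the watchdog device at regular intervals; if the writes stop because the system has hung, the board times out and resets the PC. The following is a minimal sketch of that keepalive loop, not the source of /usr/sbin/watchdog. It assumes the driver appears as /dev/watchdog, and the function name watchdog_keepalive and its COUNT argument (added only so the loop can be tested) are hypothetical.

```shell
# Hypothetical keepalive loop, not the real /usr/sbin/watchdog.
# Usage: watchdog_keepalive DEVICE INTERVAL [COUNT]
# COUNT limits the number of writes; 0 or omitted means loop forever.
# Prints the number of writes made when it stops.
watchdog_keepalive() {
    dev=$1
    interval=$2
    count=${3:-0}
    n=0
    [ -w "$dev" ] || return 1
    # Open the device once on descriptor 3 and keep it open, so the
    # driver is not closed (and possibly disarmed) between writes.
    exec 3>"$dev"
    while [ "$count" -eq 0 ] || [ "$n" -lt "$count" ]; do
        # Each write is a heartbeat that restarts the board's timer.
        printf '.' >&3 || break
        n=$((n + 1))
        sleep "$interval"
    done
    exec 3>&-
    echo "$n"
}
```

In an rc file this would be started in the background, e.g. `watchdog_keepalive /dev/watchdog 10 &`, in place of the real daemon.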

At this point, the root file system is still mounted read-only, so depmod cannot be run to build the modules.dep file. Therefore, after a new kernel is installed, kerneld won't be able to load the watchdog module this early in the boot.

A generic link to the module directory can't be made at this time either; therefore, the full path name to the module must be used here. The path to the module must be updated when a new kernel is installed to ensure that an old module is not loaded.
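One possible way to avoid editing the path by hand, assuming `uname` is available on the read-only root file system at this stage (it normally lives in /bin), is to compute the kernel release at boot. This is a sketch that keeps the /lib/modules/VERSION/misc layout used above; the helper name module_path is hypothetical.

```shell
# Hypothetical helper: print the pcwd module path for a kernel release.
# With no argument, uses the running kernel as reported by `uname -r`.
module_path() {
    release=${1:-$(uname -r)}
    echo "/lib/modules/$release/misc/pcwd.o"
}

# Load the module only if one exists for the running kernel.
mod=$(module_path)
if [ -f "$mod" ]; then
    echo "loading watchdog module"
    /sbin/insmod "$mod"
fi
```

Because the path always follows the running kernel, a stale module from an older kernel is simply not found rather than loaded.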

Testing

I tested the board by killing the watchdog daemon and running a program that forked until the process table was full. The system did not experience any failures on its own during testing.

The PC Watchdog can also monitor the temperature of the machine, although the kernel driver does not support reading the temperature. I wrote a short program to read and print the temperature reported by the board (see Listing 3). As I heated the board with a hair dryer, my program reported the rising temperature, and the board started beeping an alarm when the temperature reached 56 degrees Celsius. The board also has an option to close a relay that holds the PC in a reset state when the temperature exceeds 60 degrees Celsius. A daemon could be written to send e-mail or call a pager when the temperature gets too high, or to shut down the PC.
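Such a daemon might look like the following minimal sketch. It assumes a command, here the hypothetical /usr/local/bin/pcwd-temp standing in for a program like Listing 3, that prints the board's temperature as a whole number of degrees Celsius. The 50 and 58 degree thresholds are example values chosen to fall below the 60 degree point at which the board can hold the PC in reset; they are not settings from the board. Paging is left out.

```shell
# Hypothetical temperature alarm. READ_TEMP_CMD stands in for a program
# like Listing 3; the thresholds are example values only.
READ_TEMP_CMD=${READ_TEMP_CMD:-/usr/local/bin/pcwd-temp}
WARN_C=${WARN_C:-50}
SHUTDOWN_C=${SHUTDOWN_C:-58}

# Decide what to do at a given temperature: prints ok, warn or shutdown.
temp_action() {
    t=$1
    if [ "$t" -ge "$SHUTDOWN_C" ]; then
        echo shutdown
    elif [ "$t" -ge "$WARN_C" ]; then
        echo warn
    else
        echo ok
    fi
}

# Poll once a minute; mail root on a warning, shut down when too hot.
temp_monitor() {
    while :; do
        t=$("$READ_TEMP_CMD") || { sleep 60; continue; }
        case $(temp_action "$t") in
            warn)
                echo "PC Watchdog temperature is ${t}C" |
                    mail -s "temperature warning" root ;;
            shutdown)
                echo "PC Watchdog temperature is ${t}C, shutting down" |
                    mail -s "temperature shutdown" root
                /sbin/shutdown -h now
                return ;;
        esac
        sleep 60
    done
}
```

It would be started with `temp_monitor &` after the file systems are mounted read-write, so mail can be queued. As written it sends a warning on every poll while the temperature stays in the warning band; a real daemon would probably rate-limit those messages.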

Comparison with Other Products

Industrial Computer Source makes the WDT Watchdog Timer Hardware board, for which there is also a Linux kernel driver. It's available from Industrial Computer Source (619-677-0877 in the USA, 01-243-533900 in the UK and (1) 69.18.74.30 in France). It appears similar to the PC Watchdog board, though I have not used it.

A software watchdog driver is also available for the Linux kernel. Because it runs inside the kernel it is guarding, the software watchdog cannot reboot the system after some lockups, such as a hang of the kernel itself, and it has no temperature sensor. The hardware boards should reboot the system after any lockup.

Conclusion

The PC Watchdog is a well-designed, well-made board. During my three weeks of testing, it operated dependably. The board never reset the PC unnecessarily, and it never failed to reset the machine when needed.

Berkshire Products

David Walker is a Linux/Unix system administrator and programmer living near Seattle, Washington. When he isn't working, he likes to play with Linux, hike or ride horses in the mountains. He can be reached at dwalker@eskimo.com.
