Berkshire PC Watchdog

The board can monitor a PC's activity in several ways to determine whether it has locked up.

Setting Up the System rc Files

The module must be loaded and the watchdog daemon started before the file systems are checked. Running fsck on the file systems can take longer than the delay built into the Watchdog board, so the board could reset the PC before the check finished. I put the following commands to load the module and start the daemon in my /etc/rc.d/rc.S file (a Slackware initialization file), ahead of the point where the file systems are checked.

# load the watchdog module and
# start the watchdog daemon
# (-f rather than -x: module files are normally not executable)
if [ -f /lib/modules/2.0.28/misc/pcwd.o ]; then
  echo "loading watchdog module"
  /sbin/insmod -f /lib/modules/2.0.28/misc/pcwd.o
fi
if [ -x /usr/sbin/watchdog ]; then
  echo "starting watchdog daemon"
  /usr/sbin/watchdog -t 10 &
fi
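The daemon's job is simple: while the system is healthy it keeps writing to the watchdog device, and each write restarts the board's countdown. If the machine locks up, the writes stop, the countdown runs out and the board resets the PC. The shell sketch below only illustrates that idea; it is not the /usr/sbin/watchdog program. It assumes the driver appears as /dev/watchdog and that any write restarts the timer. WATCHDOG_DEV, INTERVAL and keepalive are hypothetical names, and the 10-second default is chosen to echo the -t 10 option above, which I assume sets the daemon's interval.

```shell
#!/bin/sh
# Minimal illustration of what a watchdog daemon does; NOT /usr/sbin/watchdog.
# ASSUMPTIONS: the board appears as /dev/watchdog, and any write to it
# restarts the board's countdown.
WATCHDOG_DEV=${WATCHDOG_DEV:-/dev/watchdog}
INTERVAL=${INTERVAL:-10}    # seconds between writes (hypothetical default)

# keepalive [COUNT]: write to the watchdog device every INTERVAL seconds,
# COUNT times, or forever if COUNT is omitted (-1 never counts down to 0).
keepalive() {
    count=${1:--1}
    # Redirecting the whole loop keeps the device open while it runs.
    while [ "$count" -ne 0 ]; do
        printf '\n'              # each write tells the board "still alive"
        count=$((count - 1))
        if [ "$count" -ne 0 ]; then
            sleep "$INTERVAL"
        fi
    done >>"$WATCHDOG_DEV"
    # What closing the device does (disarm the board or not) depends on the driver.
}

# Only start the endless loop when explicitly asked to.
if [ "${1:-}" = "--run" ]; then
    keepalive
fi
```

Running the script with --run starts the endless loop; calling keepalive 3 writes three times and returns, which makes it easy to try against an ordinary file instead of the real device.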

When rc.S loads the module, the root file system is still mounted read-only, so depmod cannot be run to build the modules.dep file. Therefore, after a new kernel is installed, kerneld won't be able to load the watchdog module at this stage, because no modules.dep exists for that kernel yet.

For the same reason, a generic link to the current module directory can't be created at this point, so the full path name to the module must be used here. That path must be updated whenever a new kernel is installed to ensure that an old module is not loaded.
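A possible alternative to editing the path by hand after every kernel upgrade is to build it from the running kernel's release string, which needs neither depmod nor a link. This sketch assumes uname is available on the root file system this early in the boot and that the module directory is named exactly as uname -r prints it (as /lib/modules/2.0.28 is for a 2.0.28 kernel). KVER and PCWD_MOD are hypothetical names.

```shell
#!/bin/sh
# Sketch: derive the module path from the running kernel instead of
# hard-coding 2.0.28, so a new kernel loads its own copy of pcwd.o.
# ASSUMPTION: the directory under /lib/modules matches "uname -r".
KVER=$(uname -r)
PCWD_MOD="/lib/modules/$KVER/misc/pcwd.o"

if [ -f "$PCWD_MOD" ]; then
    echo "loading watchdog module for kernel $KVER"
    /sbin/insmod -f "$PCWD_MOD"
else
    # No module built for this kernel: warn and keep booting.
    echo "no watchdog module found at $PCWD_MOD" >&2
fi
```

If no module exists for the running kernel, the script only prints a warning, so the boot continues without the watchdog rather than loading a module built for a different kernel.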

Testing

I tested the board by killing the watchdog daemon and running a program that forked until the process table was full. The system did not experience any failures on its own during testing.

The PC Watchdog can also monitor the machine's temperature, although the kernel driver does not support reading it. I wrote a short program to read and print the temperature reported by the board (see Listing 3). As I heated the board with a hair dryer, my program reported the rising temperature, and the board started beeping an alarm when the temperature reached 56 degrees Celsius. The board also has an option to close a relay that holds the PC in a reset state when the temperature exceeds 60 degrees Celsius. A daemon could be written to send e-mail or call a pager when the temperature gets too high, or to shut down the PC.
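A minimal sketch of such a daemon follows. It assumes a hypothetical command, pcwdtemp (something like the Listing 3 program), that prints the board's temperature as a whole number of degrees Celsius. The thresholds WARN_C and SHUTDOWN_C are hypothetical values chosen to act before the board's own 56-degree alarm and 60-degree reset. It uses the usual Unix mail and shutdown commands; calling a pager is left out.

```shell
#!/bin/sh
# Sketch of a temperature-watching daemon for the PC Watchdog board.
# ASSUMPTION: "pcwdtemp" is a hypothetical command that prints the board's
# temperature in whole degrees Celsius.
TEMP_CMD=${TEMP_CMD:-pcwdtemp}
ADMIN=${ADMIN:-root}   # hypothetical mail recipient
WARN_C=50              # hypothetical: warn before the board's 56 C alarm
SHUTDOWN_C=58          # hypothetical: shut down before the 60 C reset relay
POLL=60                # seconds between readings

# classify TEMP: print "ok", "warn" or "shutdown" for a Celsius reading
classify() {
    if [ "$1" -ge "$SHUTDOWN_C" ]; then
        echo shutdown
    elif [ "$1" -ge "$WARN_C" ]; then
        echo warn
    else
        echo ok
    fi
}

# The polling loop runs only with --daemon, so classify can be tested alone.
if [ "${1:-}" = "--daemon" ]; then
    while :; do
        t=$($TEMP_CMD)
        case $t in
            ''|*[!0-9]*) sleep "$POLL"; continue ;;   # skip bad readings
        esac
        case $(classify "$t") in
            warn)
                echo "PC Watchdog reports $t C" |
                    mail -s "PC temperature warning" "$ADMIN" ;;
            shutdown)
                echo "PC Watchdog reports $t C; shutting down" |
                    mail -s "PC temperature critical" "$ADMIN"
                /sbin/shutdown -h now
                exit 0 ;;
        esac
        sleep "$POLL"
    done
fi
```

Started as sh tempwatch.sh --daemon & (the file name is hypothetical), it checks once a minute. As written it mails a warning on every reading at or above WARN_C; a real daemon would probably remember that it had already warned.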

Comparison with Other Products

Industrial Computer Source makes the WDT Watchdog Timer Hardware board, which also has a Linux kernel driver. The company can be reached at 619-677-0877 in the USA, 01-243-533900 in the UK and (1) 69.18.74.30 in France. The board appears similar to the PC Watchdog, though I have not used it.

A software watchdog driver is also available for the Linux kernel. Because it runs inside the kernel itself, the software watchdog cannot reboot the system after some lockups, and it has no temperature sensor. The hardware boards should reboot the system after any lockup.

Conclusion

The PC Watchdog is a well-designed, well-made board. During my three weeks of testing, it operated dependably. The board never reset the PC unnecessarily, and it never failed to reset the machine when needed.

Berkshire Products

David Walker is a Linux/Unix system administrator and programmer living near Seattle, Washington. When he isn't working, he likes to play with Linux, hike or ride horses in the mountains. He can be reached at dwalker@eskimo.com.
