WANDER: a Portable Linux Data-Collection System
One of the most entertaining aspects of spending an otherwise exhausting decade conjuring a geeked-out, canoe-scale, Linux-based, amphibian pedal/solar/sail trimaran is that every new twist in the project involves steep learning curves and, in many cases, spin-offs. Usually these manifest themselves as publications and other obvious ways of piping ideas back into the Open Source community that has done so much to make the Microship adventure possible, but occasionally something utterly unexpected falls out of the boat lab.
The WANDER Project certainly fits this category. A couple of years ago, I was contacted by Dave Hughes of the NSF Wireless Field Test Project and enjoined to "clone" the Microship core Linux system for use as a ruggedized field data-collection tool. This seemed like an easy and productive technology-transfer project, so I quickly agreed.
Naturally, it was not to be so simple; there was an almost immediate divergence between the boat system design and that of the WANDER box. The former was becoming more and more wrapped around a rich user interface that could migrate transparently among wireless handhelds running VNC clients, with applications ranging far beyond data collection to include active control, security and communications. The latter, meanwhile, was becoming ever more focused on the problem of deploying a flexible database-centric tool into harsh environments, scriptable by moderately technical end users, able to inhale readings from multiple sensor channels, associate them with time and GPS coordinates, and then eventually transmit accumulated data via Globalstar satellite phone. It also would have to be power-efficient enough to allow unattended solar operation, so WANDER took on a life of its own.
We wanted to allow the user (typically a scientist doing environmental field research) to install a variety of sensors and configure the system accordingly--a somewhat nontrivial problem, as we can't very well anticipate every arcane serial protocol or sensor characteristic that might be encountered. A data-collection process launches a collection task for each channel, which in turn stores a time- and location-stamped reading at specified intervals into a database (using Berkeley DB). This process can be started and stopped manually, via a cron job or under control of a separate microcontroller-based, power-control processor that can wake the system at arbitrary intervals. An LCD display on the front panel summarizes activity. All this can take place without the connection of standard peripheral devices, although connectors are included for keyboard, mouse and VGA display to simplify development and maintenance. It is also possible to connect to the unit with an Ethernet cable and gain full access via the LAN.
At any time, the database can be queried by means of a variety of methods, including transmission of accumulated results via FTP over the satellite link, sending same via e-mail or browsing through the unit's internal web server (with tabular or graphic display). The tools are standard, allowing researchers to create new utilities for examining and manipulating the results; the whole front end is implemented with a handful of CGI scripts, and all internals are written in Perl.
Having said all that, we also should note that this is first and foremost a development system; it is relatively large and heavy, and operates mainly through a browser interface. We envisioned the primary uses as being field application development, feasibility tests for data-collection systems, data concentration from other devices and a test platform for software that is subsequently ported into miniature sealed systems with wireless links to a host. Because it's all built on a standard embedded Linux platform, code developed on WANDER should be portable into tiny, cheap, field-deployable sensor nodes.
We wrapped the system around an industrial-grade 133MHz Octagon PC-500 single-board computer with loads of I/O capability, then packaged it inside a sealed Pelican case along with a battery management system, hard disk, support for external Globalstar satellite phone, internal Garmin-25 GPS with an antenna in the case lid, a simple menu-driven local user interface and an Ethernet port that supports laptops or LAN connection for detailed configuration or software development.
Survival in an outdoor environment defined the overall shape and feel of this box; this called for a gasketed Pelican case and sealed connectors. When the lid is closed, it can handle rain, dirt and high ambient moisture--although we wouldn't recommend total immersion or extended operation in a saltwater environment.
Figure 2. With the front panel hinged open, the innards are revealed. The Octagon PC-500 running Debian GNU/Linux is on the upper left; GPS and power control are on lower left.
Opening the box reveals a hinged silk-screened panel, carrying a small Matrix Orbital LCD and a 20-button Grayhill keypad, along with mini-DIN connectors for a PC keyboard and mouse, auxiliary serial port, video display, external power input and Ethernet. This panel in turn opens to reveal the internal hardware: the PC-500 card, 4.5GB IBM hard disk drive, a seven amp-hour sealed lead-acid battery, a Calex DC/DC converter that generates five volts and the custom power-management board. The latter is always alive and, in addition to handling battery charging from the external Solarex photovoltaic panel, it can send a brownout signal to the Linux board to allow graceful shutdown and reawaken the board when power returns (with suitable hysteresis to prevent flailing on and off, of course). This "power control handshaking" also allows the data system to shut itself down and schedule a return to life at any point in the future--useful for low-bandwidth data collection when power is scarce.
The Octagon PC-500 was chosen for this application because of its substantial suite of I/O hooks with human-scale connectors (compared, say, to a laptop board, which may be tempting for power-efficiency reasons but is a major pain to hack). It is based on a 133MHz 5x86 CPU, with 48MB of EDO RAM, a Flash filesystem, support for M-Systems Disk-On-Chip, APM-flavored power-saving options, floppy and hard disk ports, SCSI-2, Ethernet interface, flat panel and SVGA support, and efficient single-supply operation. The I/O includes five serial ports, a normal PC parallel port plus 24 lines of configurable digital I/O, and the endless variety of third-party options available via the PC/104 interface (this is not currently in use, but will become valuable if WANDER users wish to add analog inputs, signal conditioning, speech synthesis, relay outputs or whatever).
Now, let's take a look under the hood and see what it takes to make WANDER dance.
WANDER was built on a Debian "unstable" system with a 2.4.16 kernel. LILO manages the boot process; there is also the choice of booting to a DOS partition to manage some of the Octagon board settings.
Because there is 48MB of RAM available, we didn't have to be as concerned about memory footprint as we would have been for a smaller system. We were more concerned with making a system that is easy to customize and extend. Although the Octagon board has a socket for a Disk-On-Chip solid-state disk device, we decided not to use it because we needed the hard disk anyway for data storage. Also, the Linux MTD drivers didn't want to work with the DOC device on this board.
Before we discuss our database design, let's consider the basic data-collection requirements.
We need to be able to collect data simultaneously from a number of different channels. Some of these may be periodic sources with a fixed sampling rate (such as analog values). Other channels may provide nonperiodic data, like text notes, images, audio samples and switch-closure events. Both flavors of data are identified by a timestamp and channel ID. The actual data can range from one byte to several megabytes, and the timestamps require a one-second accuracy and resolution.
Our design depended on a single process storing the data and several other processes querying the data. This required a storage scheme that would allow a single writer and multiple readers to access the database. We also wanted a way to discard old data if necessary, perhaps after verifying its reception at a "home base" server via e-mail. Thus, one of the first design decisions was how to store the sampled data on disk so that we could get to it from multiple processes safely.
We considered a number of possibilities, from simple flat text files through relational databases. The latter were rejected early on because there are effectively no relations involved and because queries are relatively simple (usually requests for values of certain channels over a particular time range or for the latest value of a particular channel). The relational approach would be overkill.
Flat text files, on the other hand, while easy to implement, would have been a pain to update. If a single such file were used for all the channels, it would be hard to get the last values for each one, and if multiple files (one per channel) were used, it would be time consuming to query for a range of timestamps.
We finally settled on the Berkeley DB package. Berkeley DB databases are dictionaries--sorted collections of key/value pairs. The keys and the values can each be up to 2GB in length, which lets us store everything from single numbers to images or text files in the database.
Because our view of the data is based on sample times, the keys in the database are four-byte timestamps (with one-second resolution). The values themselves begin with a two-byte channel number, followed by the actual data, with numeric data stored as text. Using the Berkeley DB Btree table type, we can then do efficient searches for ranges of timestamps, as well as find the first or last ones quickly. Because the package supports duplicate keys, we can store different channels' data under the same timestamp.
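As a rough illustration of this layout, the following Python sketch (not the Perl used on WANDER) packs a reading into a four-byte key and a channel-prefixed value, then runs the two common queries against an in-memory sorted list standing in for a Btree table with duplicate keys. The big-endian byte order and the names `encode_record`, `decode_record` and `SortedStore` are assumptions made for the example; the article does not say how the bytes were ordered.

```python
import bisect
import struct

# Assumption: big-endian packing, so byte-wise key order matches time order.
KEY = struct.Struct(">I")      # four-byte timestamp, one-second resolution
CHANNEL = struct.Struct(">H")  # two-byte channel number prefix


def encode_record(timestamp, channel, data):
    """Return (key, value) bytes for one reading; numbers are stored as text."""
    if not isinstance(data, bytes):
        data = str(data).encode("ascii")
    return KEY.pack(timestamp), CHANNEL.pack(channel) + data


def decode_record(key, value):
    """Inverse of encode_record; the payload is returned as raw bytes."""
    (timestamp,) = KEY.unpack(key)
    (channel,) = CHANNEL.unpack_from(value)
    return timestamp, channel, value[CHANNEL.size:]


class SortedStore:
    """In-memory stand-in for a Btree table that allows duplicate keys."""

    def __init__(self):
        self._records = []  # list of (key, value), kept sorted by key

    def put(self, key, value):
        # Insert after any existing entries that share the same key.
        keys = [k for k, _ in self._records]
        self._records.insert(bisect.bisect_right(keys, key), (key, value))

    def range(self, start, end):
        """Return decoded records with start <= timestamp <= end."""
        keys = [k for k, _ in self._records]
        lo = bisect.bisect_left(keys, KEY.pack(start))
        hi = bisect.bisect_right(keys, KEY.pack(end))
        return [decode_record(k, v) for k, v in self._records[lo:hi]]

    def latest(self, channel):
        """Return the newest record for a channel, or None."""
        for key, value in reversed(self._records):
            record = decode_record(key, value)
            if record[1] == channel:
                return record
        return None


store = SortedStore()
store.put(*encode_record(1000, 1, 12.5))
store.put(*encode_record(1000, 2, "47.6"))   # same second, different channel
store.put(*encode_record(1005, 1, 12.7))
in_range = store.range(1000, 1004)
```

Here `in_range` holds both channels recorded at timestamp 1000, and `store.latest(1)` returns the reading taken at 1005. A real Btree would locate the range boundaries without rebuilding a key list on every call; the list is rebuilt here only to keep the sketch short.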
For an embedded system, another advantage of Berkeley DB is that it doesn't require a separate server process, keeping the memory requirements low. It also handles the locking required by our single-writer, multiple-reader scenario, using shared memory segments.
Because we didn't know where the future development of WANDER would go, we wanted to make sure that the system was written so that it could be extended easily and have new sensor types installed--and because the system would likely be used in university research, we also wanted a language that was widely familiar to college students.
We thus chose Perl for our data collection and configuration programs. Part of this choice was pragmatic: a number of the harder parts of the job were already done for us by CPAN modules or extensible Perl programs, including a Berkeley DB interface (BerkeleyDB), an event kernel with timers and I/O triggering (Event), web server and system configuration (Webmin), serial port control (Device::SerialPort), SMTP mail transmission (Net::SMTP) and graph generation (Chart::Plot and GD).
Another reason for using Perl was its ability to evaluate program snippets at runtime. We use this to provide each channel with a small custom driver, which lets us add new channel types very easily from within the Webmin environment. These drivers can be as small as one line of Perl code.
At startup, the data collector reads a small Berkeley DB database (separate from the collected data) that contains configuration information for each channel. This configuration includes the name of a Perl script that is then evaluated to provide the channel object used for collection. The configuration data is available to these scripts as a dictionary of name/value pairs and is user-extensible using the configuration web interface.
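The idea of evaluating a small per-channel driver at startup can be sketched as follows. This is a Python analogy rather than the WANDER Perl code: the configuration keys (`scale`, `offset`, `driver`), the channel numbers and the `load_drivers` helper are all hypothetical, and a plain dictionary stands in for the configuration database.

```python
# Hypothetical configuration: name/value pairs per channel, including a
# one-line driver snippet that is evaluated when the collector starts.
CHANNEL_CONFIG = {
    1: {
        "name": "case_temp",
        "scale": "0.5",
        "offset": "-20",
        "driver": "lambda raw, cfg: float(raw) * float(cfg['scale'])"
                  " + float(cfg['offset'])",
    },
    2: {
        "name": "note",
        "driver": "lambda raw, cfg: raw.strip()",
    },
}


def load_drivers(config):
    """Evaluate each channel's snippet and return channel -> callable."""
    drivers = {}
    for channel, cfg in config.items():
        # eval() runs arbitrary code, so snippets must come from a trusted
        # source (here, the operator's own configuration).
        convert = eval(cfg["driver"])
        # Bind this channel's configuration so callers pass only raw data.
        drivers[channel] = lambda raw, convert=convert, cfg=cfg: convert(raw, cfg)
    return drivers


drivers = load_drivers(CHANNEL_CONFIG)
temperature = drivers[1]("90")               # 90 * 0.5 - 20
note = drivers[2]("  battery swapped \n")
```

Because the snippet sees the whole configuration dictionary, a new name/value pair added by the user becomes available to the driver without any change to the collector itself.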
The scripts that are evaluated for each channel give us a way to customize the system for new sensors. All of the sensors in the WANDER prototype were connected via serial ports, but future ones may require the use of PC/104 hardware.
The periodic sampling itself is provided by the Perl Event module. A given sensor may be notified upon a timer event, an I/O event or both.
We provide several concrete base classes for common sensor configurations. These include WaitingSerialChannel, which waits for data to become available on a serial port, and PollingSerialChannel, which wakes up periodically and reads any bytes available from the port; both use a regular expression to extract values.
Adding a new serial port-based sensor can be as easy as specifying which port to use, the data rate and providing a regular expression for parsing its data. Parentheses in the regular expression delimit the data that gets stored in the database, but in some cases a single serial port provides data for more than one channel. One example of this is the GPS, which can provide latitude, longitude and altitude information within the same once-per-second NMEA "sentence". In such cases, additional sets of parentheses in the regular expression delimit the data for the other channels.
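The difference between the two base classes is mainly which event triggers a read. The Python sketch below borrows the two class names from the text, but everything else is assumed for illustration: the `FakePort` stand-in, the `on_io` and `on_timer` method names and the `T=...` sensor format. It is not the Perl Event-based implementation.

```python
import re


class FakePort:
    """Hypothetical stand-in for a serial port, fed by hand for the demo."""

    def __init__(self):
        self._buffer = b""

    def feed(self, data):
        self._buffer += data

    def read_available(self):
        data, self._buffer = self._buffer, b""
        return data


class SerialChannel:
    """Shared logic: accumulate bytes, pull out each complete reading."""

    def __init__(self, port, pattern):
        self.port = port
        self.pattern = re.compile(pattern)
        self.pending = b""
        self.readings = []

    def collect(self):
        self.pending += self.port.read_available()
        while True:
            match = self.pattern.search(self.pending)
            if match is None:
                break
            # The parenthesized group is the value that gets stored.
            self.readings.append(match.group(1).decode("ascii"))
            self.pending = self.pending[match.end():]


class WaitingSerialChannel(SerialChannel):
    def on_io(self):
        """Called by the event loop when the port has data to read."""
        self.collect()


class PollingSerialChannel(SerialChannel):
    def on_timer(self):
        """Called by the event loop at each sampling interval."""
        self.collect()


PATTERN = rb"T=([0-9.]+)\r\n"  # hypothetical sensor output format

waiting = WaitingSerialChannel(FakePort(), PATTERN)
waiting.port.feed(b"T=21")               # partial line: nothing stored yet
waiting.on_io()
waiting.port.feed(b".5\r\nT=22.0\r\n")   # completes one reading, adds another
waiting.on_io()

polling = PollingSerialChannel(FakePort(), PATTERN)
polling.port.feed(b"T=19.0\r\n")
polling.on_timer()
```

Keeping unmatched bytes in `pending` lets a reading that arrives in two pieces be recognized once the rest of the line shows up.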
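A multi-channel pattern of this kind might look like the following Python sketch. The sample line is the widely quoted GGA example sentence rather than data captured from WANDER, and the channel numbers and the exact pattern are assumptions; the actual expressions used in the prototype are not given in the article.

```python
import re

# Three sets of parentheses: latitude, longitude and altitude from one
# GGA sentence. The fields skipped between longitude and altitude are fix
# quality, satellite count and horizontal dilution of precision.
GGA_PATTERN = re.compile(
    r"^\$GPGGA,[^,]*,"            # sentence type and UTC time
    r"(\d+\.\d+,[NS]),"           # group 1: latitude with hemisphere
    r"(\d+\.\d+,[EW]),"           # group 2: longitude with hemisphere
    r"[^,]*,[^,]*,[^,]*,"         # fix quality, satellites, HDOP
    r"(-?\d+(?:\.\d+)?),M"        # group 3: altitude in meters
)

# Hypothetical channel numbers, one per capture group.
GPS_CHANNELS = (10, 11, 12)


def parse_gga(sentence):
    """Return {channel: text} for a GGA sentence, or {} if it does not match."""
    match = GGA_PATTERN.search(sentence)
    if match is None:
        return {}
    return dict(zip(GPS_CHANNELS, match.groups()))


sample = "$GPGGA,123519,4807.038,N,01131.000,E,1,08,0.9,545.4,M,46.9,M,,*47"
values = parse_gga(sample)
```

Each captured string would then be stored under its own channel number with the shared timestamp, which is where the database's support for duplicate keys comes in.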
Because the user can add multiple name/value pairs to the channel configuration information from the web interface, custom setup data can be added very easily and made available to the channel driver scripts.
Of course, for all this to be useful, ultimately the collected data must be transmitted to a central location. This is handled in the WANDER prototype by sending the most recently collected data via e-mail when a PPP connection is initiated via the Globalstar satellite phone. An ifup script (invoked after the PPP connection is initiated) invokes a Perl script that queries the database for samples collected after the last e-mail, formats them into a text file and sends them to an SMTP server.
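The "everything since the last e-mail" step can be sketched in a few lines. This Python version is only an approximation of what the Perl script does: the tab-separated line format, the subject line, the addresses and the `build_report` helper are assumptions, and the message is built but not sent.

```python
from email.message import EmailMessage


def build_report(samples, last_sent, sender, recipient):
    """Return (message, new_last_sent); message is None if nothing is new.

    samples is an iterable of (timestamp, channel, value_text) tuples.
    """
    new = sorted(s for s in samples if s[0] > last_sent)
    if not new:
        return None, last_sent
    message = EmailMessage()
    message["From"] = sender
    message["To"] = recipient
    message["Subject"] = f"WANDER samples {new[0][0]} to {new[-1][0]}"
    # One reading per line: timestamp, channel, value (assumed layout).
    message.set_content(
        "\n".join(f"{ts}\t{ch}\t{val}" for ts, ch, val in new)
    )
    return message, new[-1][0]


samples = [(1000, 1, "12.5"), (1000, 2, "47.6"), (1005, 1, "12.7")]
message, last_sent = build_report(
    samples, 1000, "wander@example.org", "homebase@example.org"
)
# Once the PPP link is up, the message could be handed to an SMTP server:
#     import smtplib
#     with smtplib.SMTP("mail.example.org") as server:
#         server.send_message(message)
```

Returning the newest timestamp alongside the message gives the caller something to persist, so the next connection picks up where this one left off.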
A future improvement would be to delete already-sent data after an e-mail acknowledgement. However, since most of the 4.5GB hard drive is unused, all the data for a typical experiment can be stored on disk if necessary.
For data-collection setup in the field, WANDER allows local viewing of collected data via its Webmin web server. The user selects a time range and channels of interest, and then views or downloads the collected data as graphs of values vs. time, several channels overlaid on a single graph or as separate graphs. Naturally, the data also can be viewed or downloaded in spreadsheet-compatible CSV form.
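A spreadsheet-friendly export amounts to pivoting the stored readings into one row per timestamp and one column per selected channel. The Python sketch below shows one way to do that; the column naming and the `samples_to_csv` helper are assumptions, since the article does not describe the layout of WANDER's CSV output.

```python
import csv
import io


def samples_to_csv(samples, channels):
    """Pivot (timestamp, channel, value) tuples into CSV text.

    Each row is one timestamp; each selected channel gets a column.
    A channel with no reading at that timestamp is left blank.
    """
    rows = {}
    for timestamp, channel, value in samples:
        if channel in channels:
            rows.setdefault(timestamp, {})[channel] = value
    output = io.StringIO()
    writer = csv.writer(output)
    writer.writerow(["timestamp"] + [f"channel_{c}" for c in channels])
    for timestamp in sorted(rows):
        writer.writerow(
            [timestamp] + [rows[timestamp].get(c, "") for c in channels]
        )
    return output.getvalue()


samples = [(1000, 1, "12.5"), (1000, 2, "47.6"), (1005, 1, "12.7")]
csv_text = samples_to_csv(samples, [1, 2])
```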
The common user system administration tasks and data-collection setup are managed by a web interface over the LAN connection. This web interface is supplied by a web server and suite of CGI programs that come as part of the Webmin package. All the system configuration that WANDER might require, from network setup to software package management, is handled by one of the Webmin modules. Webmin's web server also serves reference and configuration help documents.
We added our own Webmin module for the WANDER-specific tasks of data-collection configuration and control, and for viewing or exporting the collected data. Perl was again the natural choice for writing this Webmin module because Webmin itself is written in Perl and includes a support library for module use.
Because the WANDER system depends on a rechargeable battery, we had to find a way to shut down the system cleanly before the battery got discharged too far--Linux doesn't take kindly to brownouts.
After discarding a couple of inadequate off-the-shelf solutions, we designed and built a solar battery charger and power monitor board using a Microchip PIC microcontroller to monitor battery and solar panel voltages. It also monitors case temperature because the charging voltages of a lead-acid battery are temperature-dependent.
The charger does the best it can to keep the system powered and the battery properly managed (which is primarily about avoiding the twin evils of overcharging and deep-discharging the sealed lead-acid battery).
This board is connected to the Octagon board using both a serial port and a single digital status bit, an output from the charger board that warns of impending shutdown. It has a second digital output that connects to the DC/DC converter's remote ON/OFF input, so it can shut down the power supply to the Octagon board, LCD and hard drive.
Normally, the serial port is owned and used by the data-collection task to read the temperature inside the case while monitoring the voltages of the battery, solar panel and the external analog input. When the battery voltage gets too low, the power manager toggles the status bit (connected to one of the auxiliary digital I/O lines of the Octagon board), and a dæmon detects the change and tells the system to start a graceful shutdown.
This simple "power handshaking" scheme offered a capability that was just too tempting to resist: it's possible, during shutdown, for the Linux board to instruct the charger to wake it back up in a certain amount of time. This can be used when sampling intervals are far enough apart to make it worthwhile to turn the computer off between samples, particularly useful in a scarce-power environment.
If a timed startup is not chosen, the system automatically will be restarted when the battery voltage gets high enough to stay alive for a while. The voltage thresholds defining this hysteresis loop can be changed using the serial port and are stored in EEPROM on the board.
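The hysteresis loop itself is a two-threshold state machine. The Python sketch below models it; the real logic runs as firmware on the PIC, and the threshold voltages and the `PowerSupervisor` name are invented for the example, since the article gives no figures.

```python
class PowerSupervisor:
    """Two-threshold (hysteresis) on/off decision for the Linux board."""

    def __init__(self, shutdown_below, restart_above):
        if restart_above <= shutdown_below:
            raise ValueError("restart threshold must exceed shutdown threshold")
        self.shutdown_below = shutdown_below
        self.restart_above = restart_above
        self.powered = True

    def update(self, battery_volts):
        """Feed one voltage reading; return whether the board should be on."""
        if self.powered and battery_volts < self.shutdown_below:
            self.powered = False      # brownout: request graceful shutdown
        elif not self.powered and battery_volts > self.restart_above:
            self.powered = True       # battery has recovered enough to restart
        return self.powered


# Hypothetical thresholds in volts, chosen only for the demonstration.
supervisor = PowerSupervisor(shutdown_below=11.5, restart_above=12.6)
readings = [12.8, 11.9, 11.4, 11.8, 12.2, 12.7, 12.0]
states = [supervisor.update(v) for v in readings]
```

Between the two thresholds the state simply holds, so a battery hovering near the shutdown voltage does not switch the board on and off repeatedly.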
One of our major concerns in the WANDER design was power consumption. Using the APM kernel module, we were able to slow the CPU during times when the system was not actively processing. We didn't see any reason to use the apmd dæmon. In addition, the noflushd dæmon shuts down the hard drive motor after a period of inactivity and waits for a disk read before it starts the drive motors again.
The APM shutdown function doesn't work because the system power supply is a custom job, and the BIOS has no idea how to shut it off. To turn off the power supply, we must send a message to the power monitor board via its serial port.
In normal operation, of course, there isn't a computer attached to the LAN. The field user is likely to be more concerned with attaching the sensors and solar panel to the external connectors and starting data collection. For such everyday tasks, we added a small serial-interfaced LCD panel, keypad and ON/OFF switch to the front panel--doing serious configuration or data analysis requires an external laptop (WANDER has a static IP address but easily could run a DHCP server--we left this out to facilitate connection into existing LANs).
The Matrix Orbital 4 × 20 character LCD and Grayhill 20-button keypad are handled by a separate Perl dæmon process. This can turn sampling on and off, monitor the latest values from the channels being sampled, display network activity or power subsystem status, or shut the system down. The ON/OFF switch is only a sense input and is monitored by the power-control/battery-charger board. When the user turns off the power switch, the battery-charger board warns the Octagon board of impending shutdown as if a brownout were imminent, and then waits a minute for Linux to shut down gracefully. Then it shuts off the 5V power supply to the system and awaits the command to turn back on.
We were pleased to observe a typical battery life of 16-18 hours in normal operation and an overall system power budget that could be supported indefinitely around the clock in moderately sunny conditions with a 50-watt solar panel. Still, this is hardly the kind of thing one would deploy in an unattended remote-sensing application; we see it more as a tool for human-mediated environmental research as well as a development system for ultra low-power standalone monitoring tools.
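A back-of-the-envelope check ties these figures together. The Python arithmetic below uses the seven amp-hour battery, the 16-18 hour runtime and the 50-watt panel from the text, and it assumes a 12V nominal battery (the article does not state the voltage), full use of the rated capacity and no charger or converter losses, so the results are rough lower bounds rather than measurements.

```python
BATTERY_AH = 7.0               # sealed lead-acid capacity, from the text
RUNTIME_HOURS = (16.0, 18.0)   # observed battery life, from the text
PANEL_WATTS = 50.0             # solar panel rating, from the text
NOMINAL_VOLTS = 12.0           # assumption: not stated in the article

# Average current drawn from the battery over a full discharge.
average_amps = [BATTERY_AH / hours for hours in RUNTIME_HOURS]

# Corresponding average power at the assumed nominal voltage.
average_watts = [NOMINAL_VOLTS * amps for amps in average_amps]

# Energy needed for 24 hours of continuous operation.
daily_watt_hours = [watts * 24.0 for watts in average_watts]

# Hours of full rated panel output needed per day to replace that energy.
full_sun_hours = [wh / PANEL_WATTS for wh in daily_watt_hours]
```

Under these assumptions the system averages roughly 4.7 to 5.25 watts and needs about 2.2 to 2.5 hours of full panel output per day, which fits the observation that a 50-watt panel can sustain it in moderately sunny conditions.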
The WANDER code base should port handily into a StrongARM (or similar) embedded Linux board running in CompactFlash, allowing the deployment of cheap, smart, low-power data-collection systems that play nicely with standard network protocols. This is one of the major shortcomings of most commercial products that purport to serve the same purpose: they have the analog front-end and data-collection components well refined but tend to require dedicated PC client software to disgorge their contents reluctantly. WANDER, on the other hand, appears as just another web server or scriptable data source that talks standard FTP or e-mail protocols--even from the boonies.
Steven K. Roberts (email@example.com) is perhaps best known as the guy who wandered 17,000 miles around the US on a computer-laden recumbent bicycle during the 1980s. Since then, he has been taking entirely too long to build the bike's successor, a networked amphibian pedal/solar/sail micro-trimaran known as the Microship. Ned Konz (firstname.lastname@example.org) was writing robotics code in Smalltalk for semiconductor factory tools but then escaped on his recumbent bicycle. He entertains himself by designing microcontroller systems and programming in Squeak Smalltalk, Perl and Ruby, and was the lead WANDER software designer. He is also available for consulting work.