Linux in an Embedded Communications Gateway
In 1996, Pacific Gas & Electric Company (PG&E) began a project to develop a commercial-grade advanced sensor to monitor electric lines. The system was simple in concept: use lots of small cheap units that hang on the electric lines, each consisting of a few sensor elements, an embedded CPU, a spread-spectrum radio and a battery. The sensor units would monitor for certain line conditions; if those conditions occurred, the sensor would conduct measurements, and then radio the information to a “master” radio unit. That master unit would, in turn, forward the information through a gateway to a server on the corporate WAN.
Server software would analyze the arriving data and, if warning conditions existed, would alert the electrical system operators. Other software on the WAN would be capable of monitoring and performing remote configuration of the wireless sensor network. This software would use a proprietary wireless networking protocol to accomplish the control and monitoring—in effect, tunneling this protocol within TCP/IP out through the gateway system and back.
The project was on an aggressive timetable: to put a working demonstration in the field before winter storms began in the fall of 1996. Luckily, the underlying sensor technology had already been developed by the PG&E Research and Development Lab. PG&E partnered with an outside vendor to develop, test and bring to market a wireless network of sensors based on this sensor technology. The total system was to consist of three major parts: the sensor network, the client software to use the data collected by the sensors and a gateway providing a reliable link. Linux was chosen as the operating system for the gateway.
The goal of the project was to develop and test sensor technology and its associated wireless network. R&D had pioneered the actual sensor design and had conducted field tests of prototype sensors. To move the technology from prototype to product, we had to design for manufacturing—not something with which a utility company has experience. In addition, developing and testing a wireless spread-spectrum radio network was beyond our abilities. Therefore, we partnered with a commercial vendor to help develop and produce the system.
The vendor was to integrate our sensor designs with their radio and to implement the radio network. We would provide full technical assistance and handle all data collection and analysis for the initial trials. A fairly substantial software development effort would also be necessary. The client-side display software was based on other, related software development efforts, so that part of the project had a head start. I was a member of the project development team in R&D and was responsible for the actual development of the gateway software.
The wireless sensor network used spread-spectrum radios. These unlicensed radios are relatively immune to interference, but require a clear line of sight between transmitter and receiver. The sensors were designed to hang from power lines. To provide a line of sight to those sensors and good geographical coverage from a single master radio, the radio site must be above the level of the power lines. Thus, the master radio must be on a hilltop or mounted on a tower to be effective. Either solution poses problems in getting power and a WAN connection to interface with the master radio. To add to the challenge, a given site might require more than one master radio, depending on geography. To “look” over a knoll or around a building, we might have to add a second master radio. Even in those circumstances, we wanted to use only one gateway system; therefore, each gateway must be able to serve several master radios.
The field trial site for the project was chosen for its interesting electrical distribution circuits, topography typical of our electrical service area and a relatively accessible site for a master radio. The site was several hours away from our office location at the top of a large, steep hill. The road up the hill was treacherous even when conditions were dry; in wet winter months, reaching our equipment location at the top of the hill would be very difficult. Given the difficulties of physically reaching the gateway system, it was imperative that it be reliable and fault-tolerant. The site already had a backup electrical generator protecting other radio equipment located on the hilltop, which significantly simplified our power requirements for the gateway system and master radio. A small UPS added ride-through capability to protect the systems during voltage sags, and while waiting for the backup generator to come on-line in the event of an outage.
The master radio was custom-developed by the wireless vendor, but its link to the gateway was limited to a single serial communications port. No embedded Ethernet or other networking interface would be included in the radio. The gateway system had all networking responsibility. To further complicate the matter, radio antenna requirements forced the master radio to be located outside, high up on a mast. The computer system was to be located inside a nearby building. The serial communications cable between the master radio and the computer system would have to extend about 50 feet. This eliminated standard RS-232 communications, since at those cable lengths we would never attain the design goal of 56 baud. Also, normal RS-232 communications would not allow us to have multiple master radios per gateway without adding more serial ports. To solve the problem, we chose to use RS-485 two-wire differential communications, which gave us distance, high baud rates, excellent noise immunity and the ability to expand the link to multiple drops (supporting multiple radios) if needed.
The gateway would require additional functionality for the field trials. To properly evaluate the data gathered from the sensors, data had to be recorded somewhere. The system specifications called for data to be delivered in real time to a central server that would record it, perform data analysis and, if necessary, send an alarm to the electrical system operators. Any failure of the links from the gateway to the WAN or a WAN outage would break the entire process and risk data loss. In fact, the times when the most interesting data is collected—during storms—is the very time links are most likely to fail. Obviously, the gateway must have redundant links and the ability to log sensor data locally, should all links fail. The hilltop had a direct line of sight to a local PG&E office and a few spare telephone wire pairs. For the primary link, we used a commercial spread-spectrum, point-to-point radio bridge. (Similar radios were already in wide use in the company as WAN links.)
Our backup link was ISDN. We chose to use a commercial bridge to provide automatic switching between the radio and the ISDN links. While we could have added that capability in our software, we chose to buy it “off-the-shelf”, believing the extra cost was worth the reduction in development time. To provide additional redundancy, we added a standard dial-up line. This would provide an alternate path for retrieving data should a WAN outage occur, severing the link on the network side. As a complete backup, the gateway would also log all network traffic from the master radio. Those logs would be kept on the system's hard drive, since we expected to repair any link failure before recording many megabytes of data. Of course, we still might have to physically go to the gateway and perform a manual download, but we wouldn't lose data.
As discussed above, the master radio communicated to the gateway across an RS-485 link. This link used a simple protocol (similar to HDLC, high-level data link control) to packetize the proprietary radio network protocol and allow for multiple master radios when required. The gateway would take the data stream from the RS-485 link, encapsulate it into a TCP/IP socket connection and flow it onto the WAN. Data flowing in the other direction would go back out the RS-485 link to the master radio. On the WAN, a simple database server would be at the other end point of the protocol tunnel. The software used by the electrical system operators would interact only with the server. This modularization was key to dividing the work among the disparate groups participating in the development process.
While designing the system, we always had an eye on scalability. Each WAN-based server needed to be capable of supporting multiple wireless network gateways. Successful completion of the project would mean rolling out several hundred wireless networks, leading to large database requirements. In addition, the server software would need to run on hardware not specifically purchased for this project. All PG&E divisions had large Sun servers in place for support of other engineering functions. From both a financial and a space perspective, the idea of buying and installing new servers just for the wireless sensor network was rejected. Our new server software had to be portable across major UNIX platforms—at a minimum, SunOS and Solaris. Recall that the server was to be an end point for the protocol tunnel carrying the proprietary radio network data. The code we wrote for the server had to be portable to the gateway as well; writing two versions was not only silly, but also out of the question given our short development timetable.
All in all, the bridging system and its associated components had quite a list of requirements, including the following:
Speak RS-485 on one side and Ethernet on the other.
Support tunneling the radio network protocol across a serial communications channel and within TCP/IP.
Support a primary, a backup and a dial-up link.
Support automatic data logging locally if those links failed.
Support the ability to extract logs over the network without disrupting the gateway operations when links were restored and without disrupting normal operations.
Be as reliable as possible to operate in a remote, occasionally inaccessible location.
The server software must be portable across various flavors of UNIX and reuse as much of the gateway code as possible. To make matters more exciting, the entire system had to be built, tested and installed in the field in about six months.
For much of the system, we chose to use off-the-shelf technology in order to minimize development time and maximize reliability. However, to our knowledge, this type of project had never been done before, and most of the special functionality would have to be written by us. Without a stable operating system with a reasonable set of features and tools, we'd never be able to meet the deadline.
Linux was immediately considered as an option for the project, due to positive experiences on an earlier project using Linux. However, the earlier project did data translations only in a batch environment in a normal office setup. While that earlier software was a major project, it was very different from a mission-essential “embedded” system. The gateway would be remote and unattended with no recourse to human intervention if something went wrong. We were certain Linux could handle the load, but prudence demanded the evaluation of alternatives. The gateway was essential to the entire project: without it, we'd have no performance metrics on how well the radio network of sensors performed. We needed solid reasons to justify our choice of Linux for the project.
As always, there were strong proponents for using a Microsoft OS. We rejected that approach immediately. Our experience with Windows NT indicated it still had some stability problems (despite being far more stable than other Windows platforms). While it is perhaps a useful desktop and/or server OS, we felt that NT was unsuitable for remote systems, particularly those without a display or a keyboard. We were also unhappy with the remote administration capabilities of NT. Perhaps most importantly, however, the system was to operate in an environment where it had to boot and run its software automatically if the watchdog timer ever reset the system. We had little faith in NT's ability to boot, correct disk problems and then run properly in this scenario. NT (and all other Microsoft Windows operating systems) were rejected for these reasons.
Various DOS-based solutions were also considered. We had serial port-handling libraries for DOS which we had used successfully in the past for solving the serial communications problems. Micro-Controller Operating System (muC/OS), a freeware real-time OS available for many different microprocessors, was evaluated as well. muC/OS had similarly good serial port-handling functions. Both were judged to be stable and reliable for an embedded application. However, one of the core system requirements was to allow log and data file extraction without disturbing the gateway operation. DOS would require some tricky programming to allow that option due to the uncertainty of some system calls, and we felt muC/OS would have trouble with it under high traffic loads. Portability was another issue—the protocol tunnel code had to be easily portable to UNIX. Even if we could have located an affordable, reliable, TCP/IP stack for either DOS or muC/OS, we had serious doubts about our ability to keep the source code portable. We removed both of those OS choices from the list.
We had a lot of choices among UNIX-like operating systems: QNX, Linux, FreeBSD, Solaris x86 or SCO. Our exposure to the latter two left us with serious concerns about their performance. None of the systems we had seen running those operating systems were particularly fast or responsive, even under fairly light loads. We also expected to port the gateway computer hardware platform to a single-board computer (SBC), and we wondered if either of those operating systems could run successfully on such hardware. We were certainly doubtful about the availability of RS-485 and watchdog timer drivers for those systems. We saw little value in choosing FreeBSD over Linux, since all our experience was with Linux.
So, our choices were reduced to QNX and Linux. QNX is a stable, multi-tasking, scalable, reliable, real-time OS that can easily run on SBCs. It has serial port/RS-485 and watchdog timer support and full networking support, including TCP/IP. QNX also enjoys a reputation for good technical support. In short, a good initial fit was found between QNX and the project's needs. However, there were also pitfalls. QNX did not support gcc (we heard a port was in progress). The only compiler available was the one provided by QNX, based on the Watcom C integrated development environment. While it's probably a good compiler, it meant learning a new tool, and that translated to time lost in an already short development cycle. We were already using and familiar with gcc and the other fine GNU tools. Given the short project “time to market”, any time lost from learning the new environment was considered detrimental.
That same philosophy extended to QNX in general. As a sophisticated real-time, micro-kernel OS, a learning curve is involved to use it well. We asked ourselves if we actually needed a real-time OS with the extra complexity that would come with it. We also wondered about portability issues. Having no experience with QNX, we were quite concerned that we would end up having to write the code twice: once for QNX and once for our server platform. At the very least, we were afraid we'd have to spend extra time adding conditional compiler directives to make the code work in both environments. In contrast, using Linux practically assured identical code. The licensing costs of QNX were substantial as well, in particular, when the full TCP/IP capabilities on every network node were added. In contrast, a single $50 Red Hat Linux CD-ROM represented a significant cost savings.
In the end, low cost, familiarity and availability of GNU tools tipped the scales in favor of Linux. It met all the project's key requirements at a tangible cost benefit which was hard to ignore. In summary, Linux provided:
A stable, robust, multi-user OS
Solid support for TCP/IP networking and serial communications
Ability to run on various single-board and host computers
Familiar, commonly used development tools with familiar system libraries—no time lost learning other tools and systems
Code that was completely reusable on other UNIX platforms to implement the servers for the system
As an added advantage, if we ever went into production and began building hundreds or thousands of gateways, Linux would save us a significant amount of money because there are no license fees to use it.
The gateway system was in place for more than a year with no significant downtime and no loss of sensor data. At one point, the system had 179 days of uptime—and the only reason it needed to be rebooted was as a result of troubleshooting a problem with the network bridge equipment. Linux has proved remarkably stable and effective, and human intervention has not been needed.
The system we installed in the field is a bit different than we originally envisioned. We had some trouble with the single-board computer we chose, but those problems were related to the on-board RS-485 hardware, not Linux. It looked grim for a few days as we fought communications problems and pored over the serial-driver source code to try to find a fix. Another major plus Linux has over other OS choices is the full availability of the source code. We had not expected to need it, but it certainly proved valuable. The developer of much of the Linux serial driver code, Theodore Ts'o, personally answered questions by e-mail—within hours in one case. While we certainly did not expect that response every time, we've never gotten that level of support from any commercial vendor.
While resolving the hardware problems on the single-board computer, we shifted development work to an old 386-40MHz motherboard we had lying around. We swapped in the hard drive, added a network card, a watchdog timer card, and an RS-485 port card and immediately had a working gateway computer. The single-board computer was a 486-100, but we discovered Linux to be so efficient with minimal hardware, a faster CPU was not needed. We never did resolve the strange hardware problem on the single-board computer. The 386 system worked so well that we stayed with it and actually used it for the field trials. It worked perfectly. The incredible range of hardware that Linux supports became another solid plus that helped us implement a working system.
The server code also ran reliably. We tested and demonstrated the software on Sun servers running both SunOS and Solaris; however, the workhorse for the field system was a Pentium PC running Linux. The code base for the two systems was identical—the only difference was a few lines in the project Makefile. Our objective of portability was made trivial by Linux's adherence to open standards.
The wireless sensor network was a moderate success, although some technical problems did arise. However, during the middle of the project, changes in California law introduced more competitiveness into the state's electric utility industry. The sensor project had been started by PG&E's Department of Research and Development. One immediate change in the state's laws was that R&D funds would be administered differently. As a result, the company abolished its R&D department and transferred the project to another department to finish. The initial phase of the project is now wrapping up, but what steps will be taken next is still unknown.
Linux has matured since this project began, and it's worth taking the time to examine how recent Linux developments might benefit the next stage of gateway design. I would like to emphasize that the following comments are hypothetical—if I were to attempt upgrading and modernizing the gateway system, this is the approach I would take.
The next stage would require deploying multiple radio networks and gateways to serve large geographical areas, and changes could be made to simplify and improve operations. Gateways in non-testing situations would not require as much logging of radio network traffic, so the hard drive, the only moving part in the system, could be removed.
Paul Moody has written an excellent new Mini-HOWTO on embedded Linux (http://users.bigpond.com/paulmoody/). His document makes adding a flash disk as simple as following a recipe.
For our field trial, we specifically chose to use external, discrete network bridging equipment to link to the corporate WAN. We felt it would be fast to implement and more reliable than something we wrote ourselves. It wasn't quite plug-and-play (configuring the bridge proved troublesome), but it was relatively quick to set up. However, the bridging equipment was the only source of problems the gateway experienced in nearly eighteen months of service, so I'm no longer convinced of the superiority of proprietary network equipment. In contrast, the custom gateway software performed flawlessly.
The June 1998 issue of Linux Journal had an excellent article on the use of the Sangoma WANPIPE S508 router card (http://www.sangoma.com/). In addition, Usenet reports from users include raves about the Spellcaster ISDN cards (http://www.spellcast.com/). Using those cards in a future system would integrate all the gateway functionality into one box (with the exception of a radio for a point-to-point link, if such a link was needed). The resulting system would be about half the size, approximately two-thirds cheaper, use half the power, and in all likelihood, be even more reliable than the present system.
This project is one of many examples of Linux being successfully used in the commercial marketplace and in mission-critical commercial communication applications. In fact, Linux may have a place in such applications where perhaps no other OS can compete.
Linux is exceptionally well-suited for communications functions. It's fast, stable, reliable, has built-in TCP/IP networking, and it uses common, well-known development tools. Commercial versions of those tools are now available from Cygnus Solutions (http://www.cygnus.com/). Using loadable modules makes trimming the system down to a minimum kernel fairly easy to do, and the whole operating system is Open Source, so there's no limit to the customization possible.
Linux runs on many different processors, opening a wide range of target platforms—yet applications developed for it can easily be ported to other UNIX platforms, as required. The number of Linux ports to small, cheap processors is growing—Linux even runs on the Palm Pilot. Commercial technical support for Linux is available from several companies. Linux developers are getting easier to find as the popularity of the OS increases, and as more universities use Linux in their computer science labs. In addition, and perhaps most important to a commercial venture, is the lack of any license fee to use Linux. That translates into a significant cost savings for commercial products. With its technical and financial advantages, Linux may be the best option for many embedded communications systems.
The coming years will see a huge array of new commercial communications services being offered. PCS systems, new applications for wireless networks, low-earth orbit satellite networks, new utility SCADA systems, home and small office networks—all of these systems will need to be interconnected in various ways, and embedded communications systems will be built to serve these functions. These systems will range from cable-modem/video-on-demand controllers mounted high on telephone poles to small black boxes locked away in phone company central offices to small black boxes that sit behind your PC. I won't be surprised if many of these next generation systems run Linux.