Automating the Physical World with Linux, Part 3: Designing around System Failure

Bryce examines some of the causes of system failure and gives some tips on how to avoid it.
Using Simulation to Avoid Design Flaws

The simple sprinkler control system introduced in the first article would not benefit from a simulator. Because my embedded controller never checks what the sprinklers are actually doing (there is no feedback; it is an open-loop system), it can never detect a failure. I can change this to make a simulator worthwhile. If I add a sensor that detects water flowing in the pipe, the controller can tell when water is flowing when it shouldn't be (for example, when all the valves are closed).

In the case of my sprinkler system, I could simulate a flow sensor with a simple on/off switch. When the computer opens the water valve, I manually turn on the switch, which simulates the flow sensor signaling the controller that water is flowing through the pipe. When the computer closes the valve, I leave the switch on to create an anomalous condition: the valves are closed, but the flow sensor still reports water flowing through the pipe. The controller should then take some action in response to this condition.
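The check the controller needs for this test is small. The following is a minimal Python sketch. It assumes a hypothetical `SprinklerState` record that holds the commanded state of each valve and the on/off reading from the switch. The names `flow_anomaly`, `run_switch_test` and the zone names are illustrative only and are not part of the original system. The test function replays the two steps of the manual switch test described above.

```python
from dataclasses import dataclass, field


@dataclass
class SprinklerState:
    # Commanded state of each valve: True means the controller opened it.
    # Zone names are hypothetical.
    valves_open: dict[str, bool] = field(default_factory=dict)
    # Reading from the flow sensor (during the test, the manual on/off switch).
    flow_detected: bool = False


def flow_anomaly(state: SprinklerState) -> bool:
    """Return True when water is flowing although every valve is closed."""
    all_closed = not any(state.valves_open.values())
    return all_closed and state.flow_detected


def run_switch_test() -> list[tuple[str, bool]]:
    """Replay the manual switch test and record whether an anomaly is flagged."""
    state = SprinklerState(valves_open={"zone1": False, "zone2": False})
    log = []
    # Step 1: the controller opens a valve and the operator turns the switch on.
    state.valves_open["zone1"] = True
    state.flow_detected = True
    log.append(("valve open, switch on", flow_anomaly(state)))
    # Step 2: the controller closes the valve but the operator leaves the switch on.
    state.valves_open["zone1"] = False
    log.append(("valve closed, switch left on", flow_anomaly(state)))
    return log


for step, anomaly in run_switch_test():
    print(f"{step}: {'ANOMALY' if anomaly else 'ok'}")
```

Only the second step should be flagged. That step is the condition a real flow sensor would report if a valve stuck open.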

Alarms

The best way to describe what the controller should do when a system anomaly occurs is "yell, scream or blink". A system anomaly alarm is a way for a control system to tell the user that something is wrong. If I connect a buzzer or horn, I program the system to sound it. If there's a display connected to the system, I have it turn red and flash "FAILURE". If there's a pager or e-mail system, alert messages can be sent to people anywhere in the world. In short, there are numerous ways to notify users that a control system has failed. It's important to make sure the alarm action fits the situation. Don't use a low-key blinking light for a sprinkler valve that's stuck open and flooding the golf course. On the other hand, don't send an electric shock to someone's chair; the system's users may not appreciate it.
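One way to keep the alarm proportional to the problem is a lookup from severity to notification actions. The following sketch makes several assumptions: the severity levels, the action names such as `blink_light` and `page_staff`, and the time thresholds are all hypothetical. A real system would tune the thresholds and wire each action name to actual hardware or messaging.

```python
from enum import IntEnum


class Severity(IntEnum):
    INFO = 1
    WARNING = 2
    CRITICAL = 3


# Hypothetical mapping: the more serious the anomaly, the louder the response.
ALARM_ACTIONS = {
    Severity.INFO: ["log"],
    Severity.WARNING: ["log", "blink_light"],
    Severity.CRITICAL: ["log", "buzzer", "flash_display", "page_staff"],
}


def classify_flow_anomaly(seconds_flowing: float,
                          warn_after_s: float = 5.0,
                          critical_after_s: float = 60.0) -> Severity:
    """Grade a 'flow while valves closed' anomaly by how long it has lasted.

    The default thresholds are illustrative assumptions, not recommended values.
    """
    if seconds_flowing >= critical_after_s:
        return Severity.CRITICAL
    if seconds_flowing >= warn_after_s:
        return Severity.WARNING
    return Severity.INFO


def alarm_actions(seconds_flowing: float) -> list[str]:
    """Return the notification actions for an anomaly of the given duration."""
    return ALARM_ACTIONS[classify_flow_anomaly(seconds_flowing)]
```

With these assumed thresholds, a momentary reading is only logged. A valve that stays stuck open for a minute or more sounds the buzzer and pages someone.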

Controllers as Simulators

Simulation is a discipline in its own right. Simulators are control systems that use mathematical or logical models to reproduce a physical system's behavior. They can also test whether a control system reacts and functions properly through scenario testing, which reproduces the signals the control system receives from sensors and other devices.

There's nothing really unusual about implementing a simulator. I think of a simulator of a physical system as a control system running backward. For example, a simulator could use an embedded controller with one output wired to each of the control system's corresponding inputs. The simulator then sends signals that mimic how the physical system would react and monitors how the controller tries to correct the condition.
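The "control system backward" idea can be shown in miniature. The simulator takes the controller's output (the valve command) and produces the input the controller would see (the flow sensor reading). `SprinklerPlantSim`, its `stuck_open` fault flag, `controller_step` and `run_scenario` are hypothetical names used only for illustration. On real hardware, the simulator's outputs would be wired to the controller's inputs rather than passed as function arguments.

```python
class SprinklerPlantSim:
    """Hypothetical plant model: takes a controller output, returns a sensor input."""

    def __init__(self, stuck_open: bool = False):
        # Fault injection: when True, the valve passes water no matter the command.
        self.stuck_open = stuck_open

    def flow_sensor(self, valve_commanded_open: bool) -> bool:
        # A healthy valve flows only when commanded open; a stuck one always flows.
        return valve_commanded_open or self.stuck_open


def controller_step(valve_cmd: bool, flow: bool) -> str:
    """Stand-in for the controller logic under test."""
    if not valve_cmd and flow:
        return "ALARM"
    return "OK"


def run_scenario(sim: SprinklerPlantSim, commands: list[bool]) -> list[str]:
    """Feed each valve command to the simulator and record the controller's verdict."""
    results = []
    for cmd in commands:
        flow = sim.flow_sensor(cmd)
        results.append(controller_step(cmd, flow))
    return results


print(run_scenario(SprinklerPlantSim(), [True, False]))                  # ['OK', 'OK']
print(run_scenario(SprinklerPlantSim(stuck_open=True), [True, False]))  # ['OK', 'ALARM']
```

Injecting the fault in the simulator replaces the manual switch. The same scenario can therefore be replayed automatically every time the controller software changes.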

Once a simulation system is coupled to the control system, useful validation tests can be performed. In a reciprocating-engine test, for example, the simulation system can check what the control system does if the oil pressure fails or the engine temperature climbs too high. This test validates whether the engine-protection logic operates properly. Complex simulation scenarios may exercise exception-handling algorithms more rigorously than the real application ever would, and ultimately that is very beneficial.
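A scenario test of this kind pairs scripted sensor readings with the outcome the protection logic is expected to produce. In this sketch, the readings, the protection limits (`MIN_OIL_KPA`, `MAX_TEMP_C`) and the scenario names are hypothetical values chosen for illustration. They are not real engine specifications. `protection_trip` stands in for the controller logic being validated.

```python
from dataclasses import dataclass


@dataclass
class EngineReadings:
    oil_pressure_kpa: float
    coolant_temp_c: float


# Hypothetical protection limits, for illustration only.
MIN_OIL_KPA = 150.0
MAX_TEMP_C = 105.0


def protection_trip(r: EngineReadings) -> bool:
    """Controller logic under test: True means shut the engine down."""
    return r.oil_pressure_kpa < MIN_OIL_KPA or r.coolant_temp_c > MAX_TEMP_C


# Each scenario is a sequence of simulated readings fed to the controller.
SCENARIOS = {
    "normal": [EngineReadings(300, 85)] * 3,
    "oil_pressure_loss": [EngineReadings(300, 85), EngineReadings(200, 86),
                          EngineReadings(40, 87)],
    "overheat": [EngineReadings(300, 95), EngineReadings(300, 104),
                 EngineReadings(300, 112)],
}
EXPECTED_TRIP = {"normal": False, "oil_pressure_loss": True, "overheat": True}


def run_scenarios() -> dict[str, bool]:
    """Return, for each scenario, whether the controller behaved as expected."""
    report = {}
    for name, readings in SCENARIOS.items():
        tripped = any(protection_trip(r) for r in readings)
        report[name] = tripped == EXPECTED_TRIP[name]
    return report


print(run_scenarios())  # {'normal': True, 'oil_pressure_loss': True, 'overheat': True}
```

Including a "normal" scenario matters as much as the failure cases. Protection logic that trips on healthy readings is also a defect.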

Simulators for Training and System Improvement

The word simulator probably makes most people think of flight trainers. This may demonstrate the simulator's most important role: training. Like training a pilot to fly jet aircraft, training personnel to operate a complex new control system is expensive, tedious, yet extremely important. I wouldn't sleep well at night if I knew that new employees at the nuclear power plant down the street got hands-on training on the actual reactor. This is an exaggerated example, of course, but control-system training is a serious issue.

Both the people who operate a control system and those who maintain it can take part in simulator scenario testing. This type of training lets the staff become comfortable with the system and learn how to react appropriately when a failure occurs. These tests also offer another way to improve the system's design, refine operational practices such as maintenance schedules and implement other functional improvements that make the system more useful. Refinements like these separate an average system from an excellent one.

I really can't emphasize enough the importance of this type of simulation in control-system design. This simulation is the best opportunity for the developers, designers and users/customers to work together to develop a better system. It's also the best time to make mistakes (whether accidental or deliberate) and learn from them. While mistakes on the real system can't be reversed, mistakes on a simulated control system are just like a video game: just press the reset button.

Cost is the single largest obstacle to simulation. A simulation system adds a significant amount of labor and material to a project; in fact, building one is equivalent to adding another control system. The simulation system, however, allows the control system to be tested and improved without affecting the real system. Dedicating a duplicate control system to work with the simulation system makes it possible to evaluate many new scenarios, develop software improvements for the real control system concurrently and validate the system continually.

There is also a long-term financial gain from using a simulation system. In a production facility, such as an automated assembly line, any downtime is very expensive. Installing a software upgrade often requires shutting the system down completely, and with a relatively untested upgrade there's usually a high risk that the new software is unstable. The simulation system offers not only the ability to test the new software but also a way to determine how long it will take to upgrade the current control software to the new version. I'm certain that the reduced downtime, coupled with higher confidence in the software, more than pays back the investment in the simulation system.

To me, simulation offers peace of mind by making it possible to simulate and test any control function you have doubts about. Testing complex systems is very difficult, and testing a complex system on the "real" machine is prohibitively expensive and time-consuming. It also always carries the risk of damaging the physical system. In simulation, you have the reset button, plus the time to look back and study the phenomena that caused the failure.
