Improving Network-Based Industrial Automation Data Protocols
The industrial automation sector is rapidly adopting TCP/IP over Ethernet as a replacement for traditional data connectivity. Many of these devices implement application protocols that mirror their older cousins. With the move to IP, the operation, flexibility and reliability of these devices may be jeopardized by oversights in the use of sockets as a new connection medium. In this article I discuss many of the problems I've stumbled over while working with these devices. I also present solutions that future data protocols may improve on.
The industrial automation sector started with direct-connected I/O devices. These local devices are connected directly to a peripheral bus of the CPU, such as an 8155 parallel I/O chip or even a memory-mapped latch. Many large-scale or expansive industrial systems, including petroleum facilities, water treatment plants and building environmental controls, require a large number of monitoring and control points. Because of the electrical limitations of the CPU bus, remote I/O interfaces were developed: devices that allowed a data protocol to address multiple I/O units. Serial interfaces (like RS-485) were used as the bus.
Serial interfaces are private, limited-access interfaces. The simple serial bus does not let transmitters detect or resolve who owns the bus during a transmission. Hence, a single master transmitter originates all command messages, and the slaves simply reply to its requests. Other limitations of the serial bus include a limited length (several thousand feet or so) and a very limited bandwidth shared among many slave devices (effectively lowering the per-slave bandwidth). These serial devices no longer had a memory-mapped architecture and relied on a data protocol that read and wrote information to them. The data protocol served as an abstraction of the memory map of the direct-connected devices.
Due to the restrictions of bandwidth, some manufacturers created proprietary bus interfaces and others used nonmainstream, simple network interfaces (such as ARCnet). These interfaces allowed increased bandwidth, multiple-master capabilities and practical expansiveness. But there was one simple problem: these interfaces and networks never entered the mainstream marketplace. As an illustration, if a single manufacturer supports only its own I/O interface, the risk of the company terminating the product line or of a device reaching its end of life dooms the future of the system. Unique cabling and networking systems had to be created just to support this hardware, which was expensive and required additional maintenance. Because the support marketplace was small, the resources available to improve this technology were too limited to carry it forward.
Then the automation sector started to create devices that used Ethernet connections. Ethernet provides transmitter resolution and has sufficient bandwidth and excellent expansion potential. Ethernet is also a mainstream interface, which is its biggest strength. Some early implementations used raw Ethernet packets. Unfortunately, raw Ethernet isn't routable, at least not in the way IP is. Therefore, all the devices in a raw Ethernet network must be on a common segment. Nonetheless, this is an effective method. Using Ethernet means an automation integrator can use an existing network infrastructure, cutting costs by reusing an existing resource instead of installing another network.
As you would expect, the industrial automation sector started implementing TCP/IP over Ethernet. This had all the advantages of raw Ethernet but with improved routing and nearly infinite expandability. The Ethernet market has huge amounts of funding and large numbers of competitors, yielding very high-throughput yet affordable systems. As a result, network determinism is a manageable issue and not just the fear of a network designer. Today, nearly every computer has a TCP/IP stack. Clearly, neither TCP/IP nor Ethernet is one company's standard; they are perhaps among the few real international standards.
However, there is a fundamental issue: most of the data protocols are borrowed or inherited from their older counterparts. Certainly, if an automation integrator uses a company's network, it's not private anymore. In addition, TCP/IP, UDP/IP and Ethernet are not quite like a simple serial data packet. This transition has created new problems and opened the door to new strengths.
Here's a short refresher on the essence of the traditional communication protocol. This method is typically found in serial and several proprietary data interfaces.
The traditional serial data interface is connectionless; the polling master computer initiates all conversation. It sends a command that either writes data to or requests data from a slave, and the slave responds. The master assumes that the device exists by virtue of a response that arrives after a matching command.
No synchronization exists between the master and slave either. If the master gives up too early and begins transmitting a command, it may collide with a slave replying to the last command. The application manages the retransmission if a slave doesn't respond or responds with a corrupt reply. The point is that the slave is passive: it never responds unless it's requested to. I've referred to this kind of protocol as the Marco-Polo (a childhood water game) technique.
The entire data interface supports a single data protocol. This is required because the slaves are listening for protocol-specific information. Other protocols transmitting on the bus may confuse the slaves.
Typically, these systems are repetitively polled. The polling rate may be hundreds of times a second or hundreds of seconds between polls. The polling rate depends on the response requirements of the control system.
With the traditional bus and its limited number of devices, an application could scan every possible address to see which ones respond. This isn't practical with IP. If an IP host were configured on a Class A network, there may be more than 16 million (2^24) possible slave addresses to poll.
A simple method exists to poll for these automation devices. Say a UDP port is reserved for a service locator. When this service locator receives a specific message, it replies with its device type. An IP host could then perform a UDP broadcast, and every slave could reply to the broadcaster (one transmission, and all available automation devices report back to the transmitter).
To sidetrack for a moment, if you hear someone say they were going to Telnet to port 80, you'd immediately think HTTP. In fact, very few reserved ports exist for industrial automation, not that it's important. What is important is that if I connect to TCP port 12345, it had better be what I think it is. But if I Telnet to port 80 and I don't know what port 80 is, how would I find out (the blinking cursor doesn't help much either)?
If we were to add the ability for the service locator to describe which ports are available for which data protocol, my host computer could first broadcast to see which devices are available and then determine what services are configured where. This means that my industrial automation data protocol could be located on any port, under either TCP or UDP. This way, TCP port 12345 is significant before I try to connect to it.
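As a rough sketch of the locator idea, the following Python shows one way a host might broadcast a probe and gather the replies. The port number (50000), the probe text and the JSON reply layout are all assumptions made for illustration; none of them come from an existing standard.

```python
import json
import socket

LOCATOR_PORT = 50000         # hypothetical UDP port reserved for the locator
PROBE = b"WHO-IS-THERE?"     # hypothetical probe message

def build_reply(device_type, services):
    """Device side: encode the device type and a {protocol name: [transport, port]} map."""
    return json.dumps({"type": device_type, "services": services}).encode("ascii")

def parse_reply(payload):
    """Host side: decode a locator reply into (device_type, services)."""
    doc = json.loads(payload.decode("ascii"))
    return doc["type"], doc["services"]

def discover(timeout=1.0, address="255.255.255.255"):
    """Broadcast one probe, then collect replies until none arrives for `timeout` seconds."""
    found = {}
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        sock.settimeout(timeout)
        sock.sendto(PROBE, (address, LOCATOR_PORT))
        while True:
            try:
                payload, (host, _port) = sock.recvfrom(1500)
            except socket.timeout:
                break
            try:
                found[host] = parse_reply(payload)
            except (ValueError, KeyError, TypeError):
                continue     # not a locator reply; ignore it
    return found
```

A host would call discover() once and then look up, for each responding address, which port carries the data protocol it wants. JSON is used here only to keep the sketch short; a binary reply would work equally well.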
In industrial automation systems, maintaining a unit may require powering it down. The host sees this device's unavailability by the lack of a reply to a command. But would the host know when the device was turned back on? It has always been difficult to test if a device was available at a specific address. An application would have to attempt communicating with the device just to see if it was awake. This is a laborious and inefficient method to rediscover the device.
Much like RARP, what if the unit broadcast its awake state when it was turned on? The host system could receive this message and take action to commence communication with it. This would prevent a host from wasting its time polling a device that doesn't exist. I also believe most software engineers would say this method might be easier to implement, as the retransmission logic is fairly difficult and rather application-specific.
Of course, the skeptics are going to ask, what if this wake-up packet got dropped? It does happen, but the wake-up could be set to always broadcast at a conservative time interval. Sure, one packet could get dropped, but would the next one? If it does, something might be wrong with the network. Certainly, a heartbeat couldn't hurt either; if the heartbeat rate is low enough, its extra traffic would be comparable to the ordinary broadcasts of the network.
This feature has another strength for systems that poll the hardware slowly, perhaps waiting minutes between polls. The traditional method is to transmit the command and wait for the response (if the device isn't available, no response will arrive). Wouldn't it be better if we knew the target automation device recently sent a heartbeat and that our next transmission will probably succeed? If we haven't heard a heartbeat from the specific device in a long time, why try to transmit to it?
Also, if the device's heartbeat had a beat count, a monitoring application could tell when the device was reset, when a packet was dropped, when power was cycled, when network infrastructure went down, when a gateway or telecom link failed or when some other strange anomaly occurred. These statistics may assist in maintaining the system and diagnosing hardware or network problems.
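To make the beat-count idea concrete, here is a minimal sketch of a monitor on the host side. It assumes, purely for illustration, that the count starts low after a power-up and increases by one per heartbeat without wrapping; the class name and return strings are hypothetical.

```python
import time

class HeartbeatMonitor:
    """Track the last beat count and arrival time for each device."""

    def __init__(self, stale_after=10.0):
        self.stale_after = stale_after   # seconds of silence before a device counts as gone
        self.last = {}                   # device -> (count, arrival time)

    def on_beat(self, device, count, now=None):
        """Record a heartbeat and classify it against the previous one."""
        now = time.monotonic() if now is None else now
        previous = self.last.get(device)
        self.last[device] = (count, now)
        if previous is None:
            return "new"
        prev_count = previous[0]
        if count == prev_count + 1:
            return "ok"
        if count > prev_count + 1:
            return "missed %d" % (count - prev_count - 1)
        # The count repeated or went backwards; this sketch treats that as a restart.
        return "reset"

    def is_alive(self, device, now=None):
        """True if the device has been heard from within the stale_after window."""
        now = time.monotonic() if now is None else now
        entry = self.last.get(device)
        return entry is not None and now - entry[1] <= self.stale_after
```

A scanner could check is_alive() before each slow poll and skip devices that have gone quiet, while logging the "missed" and "reset" results as the diagnostic statistics described above.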
Let's return to the Telnet to port 80 example. TCP allows a backlog of connections to accumulate. When you've successfully connected to port 80 with Telnet, how do you know whether the socket is still in the backlog queue or is actively being polled by the scanner?
This problem also has its roots in the traditional serial interface. The assumption is the slave is hard-wire connected. There is no connection backlog; what the host sends is received by the slave. The traditional method is to assume that once the communication resource is opened, it can simply send when it wants to.
What if the automation device's command processor could send an acceptance message to the host? This message would tell the host that the socket isn't just accepted but is also actively being examined. The host then knows that it can send commands and that the automation device will process them immediately. Without this, the host could send a message down the socket before the command processor has accepted the socket.
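The acceptance message could look something like the following sketch. The READY text and both function names are hypothetical; the only point is that the host refuses to send commands until the device has spoken first.

```python
import socket

READY = b"READY\n"    # hypothetical acceptance message

def announce_ready(conn):
    """Device side: call once the command processor has taken the socket from accept()."""
    conn.sendall(READY)

def wait_until_ready(conn, timeout=2.0):
    """Host side: return True only if the acceptance message arrives in time."""
    conn.settimeout(timeout)
    received = b""
    try:
        while len(received) < len(READY):
            chunk = conn.recv(len(READY) - len(received))
            if not chunk:            # peer closed before announcing
                return False
            received += chunk
    except socket.timeout:           # still sitting in the backlog, or device is busy
        return False
    return received == READY
```

If wait_until_ready() returns False, the host knows its connection may still be sitting in the backlog and can back off instead of sending commands that nobody is reading yet.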
TCP and UDP are very dissimilar data protocols. To review, TCP is connection-oriented and stream-oriented, with guaranteed delivery and guaranteed order. UDP carries unreliable packets of a limited length; it's connectionless and packet-oriented, and arrival order is not guaranteed. Routers may be configured to drop UDP packets to unknown ports (fairly unlikely these days).
The traditional serial data bus is more like UDP in its lack of a delivery guarantee and its connectionless nature. The arrival order, however, is a cause for concern. Also, the packet orientation of the received data may cause excess data to be dropped if the receiving buffer isn't large enough.
TCP is similar to the traditional bus in that data arrives in guaranteed order and appears in a stream-oriented fashion. TCP differs in that each socket is a private virtual connection to each automation device, and guaranteed delivery means the IP stack will retransmit the message if it isn't acknowledged by the receiver.
Other than a shared ability to transmit and receive data, UDP and TCP have little in common. This may cause problems when trying to create a data protocol common to both of these IP protocols.
Returning to UDP, let's assume that each UDP command and response stays below the network MTU. This guarantees that the data is encapsulated in one packet. If two packets arrive at the receiver, the receiver API will not concatenate them. Rather, the first receive call returns the first packet, and the second packet is returned by the next receive call. But should the application request less data than the packet contains, the remaining data is dropped. In the UDP model, then, the receiver always knows the beginning and end of a data segment.
Furthermore, because UDP does not preserve sequence, the first packet received may have been the second packet sent. If the data protocol allows multiple data packets to be in flight at once, an application-layer sequence ID is required so the receiver can reorder them.
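One way to use an application-layer sequence ID is sketched below. It assumes each datagram carries an integer sequence number that starts at a known value and never wraps; the class is hypothetical and deliberately ignores lost packets, which would stall delivery until a timeout policy (not shown) steps in.

```python
class Reorderer:
    """Release datagram payloads in sequence order, buffering any that arrive early."""

    def __init__(self, first_seq=0):
        self.expected = first_seq    # next sequence ID to hand to the application
        self.pending = {}            # early arrivals, keyed by sequence ID

    def push(self, seq, payload):
        """Accept one datagram; return the payloads that are now deliverable, in order."""
        if seq < self.expected or seq in self.pending:
            return []                # duplicate, or already delivered
        self.pending[seq] = payload
        ready = []
        while self.expected in self.pending:
            ready.append(self.pending.pop(self.expected))
            self.expected += 1
        return ready
```

For example, pushing sequence 1 first returns nothing, and pushing sequence 0 afterward releases both payloads in the order they were sent.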
The connectionless nature of UDP means that if the automation device was power cycled, the polling master may not be able to tell it occurred. This is no different than traditional serial devices.
TCP has a stream model: if two messages arrive and nothing in the data stream indicates where one ends and the next begins, the application can't tell the first from the second. Some kind of data size field is required so the receiver can determine the beginning and end of each message in the stream.
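A common way to supply that data size is a length prefix. The sketch below assumes a 2-byte big-endian length in front of every message, which is my choice for illustration and not part of any particular automation protocol.

```python
import struct

LENGTH = struct.Struct("!H")   # hypothetical 2-byte big-endian length prefix

def frame(message):
    """Sender side: prefix the message with its length (at most 65,535 bytes)."""
    return LENGTH.pack(len(message)) + message

class Deframer:
    """Receiver side: rebuild whole messages from arbitrary chunks of the stream."""

    def __init__(self):
        self.buffer = b""

    def feed(self, data):
        """Add bytes as they come off the socket; return every complete message so far."""
        self.buffer += data
        messages = []
        while len(self.buffer) >= LENGTH.size:
            (size,) = LENGTH.unpack_from(self.buffer)
            end = LENGTH.size + size
            if len(self.buffer) < end:
                break                      # message body not fully received yet
            messages.append(self.buffer[LENGTH.size:end])
            self.buffer = self.buffer[end:]
        return messages
```

The receiver calls feed() with whatever recv() returns, however the stack happened to split or merge the data, and gets back only complete messages.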
TCP is also connection-oriented. Should the automation device be power cycled, the TCP connections are reset. The polling master would then receive a socket error if it attempted to transmit to a device that doesn't have an established connection.
Another consequence of connection orientation is that the connection must be gracefully closed. This graceful closure requires data packets to be transferred between the polling host and the automation device. Should the polling master close the connection while some kind of infrastructure failure has occurred, the automation device may be left with a permanently open TCP session (a resource leak). If this happens too many times, the automation device may run out of resources.
TCP guarantees arrival, which sounds like a great thing. But the dreadnought nature of TCP has one giant Achilles heel: time. A TCP stack makes only a limited number of transmission attempts to get an acknowledgement from the receiver (the exact count depends on the implementation), and each timeout is longer than the one before. The first timeout is short, the next is longer and the last feels like an eternity; most automation scanners won't wait this long. Failed infrastructure, power cycling or a cable removal can all trigger this problem, which might also tie up resources. Because many automation devices use IP stacks with limited resources, carelessly creating (or leaking) TCP connections may exhaust the automation device's IP resources.
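A scanner can impose its own time budget instead of waiting on the stack. The following is a minimal sketch, assuming a connected TCP socket and a reply that fits in one recv() call (a real protocol would also need message framing); the function name and defaults are hypothetical.

```python
import socket

def exchange(conn, command, timeout=0.5, reply_size=256):
    """Send a command and wait at most `timeout` seconds per socket operation.

    On any failure the socket is closed so it cannot linger as a leaked resource.
    Returns the reply bytes, or None if the exchange failed.
    """
    conn.settimeout(timeout)
    try:
        conn.sendall(command)
        reply = conn.recv(reply_size)
    except OSError:            # covers timeouts and connection resets
        conn.close()
        return None
    if not reply:              # the peer shut the connection down
        conn.close()
        return None
    return reply
```

When exchange() returns None, the scanner can mark the device as unavailable and reconnect later, rather than stalling the whole polling loop behind one dead connection.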
Because traditional serial data streams had high transmission latency, the data packets were kept as small as possible (typically, well under 256 bytes). This small data size allows only a single data write or data read request to be sent at a time. With significantly larger MTUs, data packets as large as 1,400 bytes could be handled, carrying a read and a write request simultaneously. This would cut the number of sends and receives in half, reducing network traffic and making the polling scanner more efficient.
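To show how a read and a write might share one packet, here is a sketch of a combined request. The layout (two count bytes, then 16-bit addresses, then address and 32-bit value pairs) is invented for this example and does not describe any existing protocol.

```python
import struct

MAX_PAYLOAD = 1400   # packet size mentioned in the text

def pack_request(reads, writes):
    """reads: list of addresses; writes: list of (address, value) pairs."""
    body = struct.pack("!BB", len(reads), len(writes))    # how many of each follow
    for address in reads:
        body += struct.pack("!H", address)
    for address, value in writes:
        body += struct.pack("!Hi", address, value)
    if len(body) > MAX_PAYLOAD:
        raise ValueError("request does not fit in one packet")
    return body

def unpack_request(body):
    """Device side: recover the read list and the write list from one packet."""
    n_reads, n_writes = struct.unpack_from("!BB", body, 0)
    offset = 2
    reads = []
    for _ in range(n_reads):
        (address,) = struct.unpack_from("!H", body, offset)
        reads.append(address)
        offset += 2
    writes = []
    for _ in range(n_writes):
        address, value = struct.unpack_from("!Hi", body, offset)
        writes.append((address, value))
        offset += 6
    return reads, writes
```

With this layout, a poll that reads two points and writes one costs a single 12-byte request instead of separate exchanges for the read and the write.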
Ethernet also allows a kind of parallel data transfer through its collision resolution, and switched infrastructure lets this parallel model be exploited further. A data read or write request could be broadcast or multicast to automation devices; this would allow simultaneous updates with a single send, as well as parallel read replies from a single read request. This isn't possible on buses that don't support transmission resolution.
By security, I'm referring to casual validation that the data sent and received belongs to the data protocol. A header that includes a unique magic key allows a receiver to detect whether the message is one it knows. This header may also include a data length, to allow messages to be separated within a TCP stream, and possibly even a simple checksum to validate the encapsulated data.
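Such a header might be sketched as follows. The magic value, the field sizes and the 8-bit additive checksum are all assumptions chosen to keep the example small; this is casual validation only and offers no protection against deliberate tampering.

```python
import struct

MAGIC = 0x4F50                    # hypothetical 16-bit magic key
HEADER = struct.Struct("!HHB")    # magic, payload length, checksum

def checksum(payload):
    """Simple 8-bit additive checksum over the payload bytes."""
    return sum(payload) & 0xFF

def wrap(payload):
    """Prefix the payload with the validation header."""
    return HEADER.pack(MAGIC, len(payload), checksum(payload)) + payload

def unwrap(packet):
    """Return the payload, or None if the packet is not a valid message of this protocol."""
    if len(packet) < HEADER.size:
        return None
    magic, length, check = HEADER.unpack_from(packet)
    payload = packet[HEADER.size:HEADER.size + length]
    if magic != MAGIC or len(payload) != length or checksum(payload) != check:
        return None
    return payload
```

A device that receives an HTTP request on its automation port would see the wrong magic key and could drop the data at once instead of trying to parse it as a command.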
If there are concerns that sensitive automation data is being sniffed, a sockets encryption layer may also be provided. This adds overhead to the packet processing. As embedded processors become more powerful, it's likely that this will become a standard feature.
The rapid parsing requirement of many data protocols calls for the use of binary data. Binary data is excellent if the automation device and the host system share common data formats, namely two's complement integers and IEEE-754 floating point. There is one catch: endianness, or byte ordering. Bytes are ordered differently on different processor architectures, and the order is described as either little-endian or big-endian. While a protocol may be built around one byte order, it would be convenient, if not advantageous, to supply a method to change the byte order or to offer the other order on another IP port. Otherwise, clients may be heavily penalized for reordering bytes.
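The effect of byte order is easy to see with a small example. The message layout here (a 16-bit channel number followed by a 32-bit IEEE-754 float) is hypothetical; the point is only that both ends must agree on the order.

```python
import struct

def encode_reading(channel, value, byte_order="big"):
    """Pack a 16-bit channel number and a 32-bit float in the requested byte order."""
    prefix = ">" if byte_order == "big" else "<"
    return struct.pack(prefix + "Hf", channel, value)

def decode_reading(data, byte_order="big"):
    """Unpack (channel, value) using the byte order the sender is assumed to have used."""
    prefix = ">" if byte_order == "big" else "<"
    return struct.unpack(prefix + "Hf", data)

big = encode_reading(1, 1.5, "big")        # b'\x00\x01\x3f\xc0\x00\x00'
little = encode_reading(1, 1.5, "little")  # b'\x01\x00\x00\x00\xc0\x3f'
```

Decoding the big-endian bytes as if they were little-endian yields channel 256 instead of 1, which is exactly the kind of silent error a negotiated or per-port byte order would avoid.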
Also, although rare, an architecture may not implement compatible data formats or its development language may not be tolerant or supportive of binary data streams and their conversions.
Automation protocols over IP and Ethernet are making positive headway. Their rapid evolution has created a few shortcomings and minor headaches. Existing data protocols can be improved in time, however, to overcome these issues and increase their reliability and usefulness.
Several elements, notably the device/service locator, could also be implemented industry-wide. Doing so would give customers common access to these new automation devices in a standard format.
Bryce Nakatani is an engineer at Opto 22, a manufacturer of automation components in Temecula, California. He specializes in real-time controls, software design, analog and digital design, network architecture and instrumentation.