Writing Real-Time Device Drivers for Telecom Switches, Part 1

by Wael Hassan

Real-time systems are expected to never fail, be accurate and deliver the expected functionality. With a downtime of about three seconds a year, they are tough to beat. This article gives an introduction to writing device drivers for real-time systems and explains the terms real-time, device drivers and switching.

There are a number of principles and techniques involved in writing a device driver. Every time designers have to write one, they must re-invent these principles and techniques based on their own experience. In this article, I discuss general system architecture, which is independent of programming language or methodology. Then I present the characteristics of real-time software development, explain where to start and show the steps, as well as describe the organization. I also devote some time to preparation, real-time issues and finishing the task.

Characteristics of Real-Time Development

There is no universally accepted definition of what constitutes a real-time system. Systems commonly referred to as real-time can range from small embedded microcontrollers that drive the operation of a microwave oven to very large systems such as global computer communication networks. A common misconception, even among some computer professionals, is that real-time by necessity implies microsecond response times that impose low-level assembly language programming. While these characteristics fit some real-time systems, they by no means constitute a universal definition.

Figure 1. Data Flow in a Common Real-Time System

Properties Related to Real-Time Systems

1) Timeliness: this is the most important attribute of real-time systems of any kind. By definition, a real-time system is required to perform its function ``on time''. ``On time'' does not necessarily mean a specific clock time; it can mean after, before or between specific events. In other words, the notion of time is system-dependent.

Timeliness includes service time, the net time taken to compute a response to a given input--that is, the time needed to compute a decision. Assume that a gas-tank meter needs to turn on a red light when the level of gas goes down to a specified amount. Service time is the time to compare the current level with the threshold, plus the time needed to decide whether to turn on the light.

Latency is the interval between the time of occurrence of an event triggering input and the time at which it starts being serviced. In the case of the gas-tank sensor, it is the delay incurred during the taking of a sensor reading and bundling of data.

The sum of these two values (service time plus latency) represents the overall reaction time for a given input. The reaction time should be less than or equal to the deadline specified for the operation.
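The relationship between latency, service time and deadline can be written down directly. The following is a minimal sketch, not code from any real driver: the structure, the function names and the choice of microseconds as the unit are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* All times are in microseconds; the unit is an assumption of this sketch. */
typedef struct {
    uint32_t latency_us;   /* event occurrence until servicing starts */
    uint32_t service_us;   /* time needed to compute the response */
    uint32_t deadline_us;  /* maximum allowed reaction time */
} timing_budget;

/* Reaction time is latency plus service time (64-bit to avoid overflow). */
uint64_t reaction_time_us(const timing_budget *b)
{
    assert(b != NULL);
    return (uint64_t)b->latency_us + b->service_us;
}

/* True when the reaction time is less than or equal to the deadline. */
bool meets_deadline(const timing_budget *b)
{
    return reaction_time_us(b) <= b->deadline_us;
}
```

In a hard real-time system, a false result from a check like this would be a design failure; in a soft real-time system, it would be tolerated as long as it happens only occasionally.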

Hard real-time systems are in place when missing a single deadline is unacceptable. Examples include rocket propulsion control systems, flight response systems and nuclear reactors.

Soft real-time systems are used when missing a deadline occasionally is acceptable, such as when there is a limit on how long a user should wait to get a dial tone to place a phone call. That limit, if exceeded by a few fractions of a second every once in a while, should pose no problem to the general availability of the feature.

2) Dynamic Internal Structure: since a real-time system interacts with an environment that changes over time, the system must adapt internally to match. This is one of the main complexities of real-time systems. A real-time system should be reconfigurable while it is running. Moreover, it needs to be hot-pluggable. That is, you can insert cards, remove cards and download software while the system is running at 100%. Data or traffic loss as a result of a cable cut, a power failure or card damage should be contained within a 50ms time frame. Less than 25ms would be ideal. A closely related issue is resource management. As resources may change dynamically during runtime, a real-time program should not make any assumptions about memory space, addresses or the number of interrupts available.

3) Reactiveness: a reactive system is one that is continuously responding to different events whose order and time of occurrence are not always predictable. Manna (see Resources) classifies computer programs into transformational and reactive. Transformational systems start off with some initial data that is transformed by a series of computations into the desired output data. Once the output is produced, the system terminates execution. In contrast, reactive systems generally do not terminate. Instead, they are involved in a continuous interaction with the environment. Good examples of such systems are water monitoring and cooling systems, or train cross-exchange sensor systems. Time-reactive systems can be of three types: nondeterministic, where the system has no control over the relative order or time of occurrence of input events; real-time response, where the system must provide a timely response; and state dependence, where the response of the system to a given input depends on previous inputs and time. For telecom switches, all of the above apply. That is, no assumptions can be made on lateness or cell-error rate. Switches can lower delay periods; however, it is dangerous to make any assumptions.
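State dependence is easiest to see in a small state machine, where the same input produces a different response depending on what came before. The sketch below models a telephone line; the states, events and function name are hypothetical and chosen only to illustrate the idea.

```c
#include <assert.h>

typedef enum { LINE_IDLE, LINE_DIAL_TONE, LINE_CONNECTED } line_state;
typedef enum { EV_OFF_HOOK, EV_DIGITS_DONE, EV_ON_HOOK } line_event;

/* Compute the next state from the current state and an incoming event.
 * The response to an event depends on the state left by earlier events. */
line_state line_step(line_state s, line_event ev)
{
    assert(s >= LINE_IDLE && s <= LINE_CONNECTED);
    switch (ev) {
    case EV_ON_HOOK:
        return LINE_IDLE;                              /* hanging up always resets */
    case EV_OFF_HOOK:
        return (s == LINE_IDLE) ? LINE_DIAL_TONE : s;  /* only an idle line gets a tone */
    case EV_DIGITS_DONE:
        return (s == LINE_DIAL_TONE) ? LINE_CONNECTED : s;
    }
    return s;  /* unknown event: stay in the current state */
}
```

A reactive system would call a function like this from a loop that never terminates, once for each event as it arrives, in whatever order the environment produces them.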

4) Concurrency: this is a characteristic of the real-time world in which multiple computing sites cooperatively achieve some common function. A thread of control is a logical sequence of primitive operations. A portion of the system that incorporates a sequential thread of control is often referred to as a process or a task. A concurrent system contains two or more simultaneous threads of control that dynamically depend on one another in order to fulfill their individual objectives. Threads or processes access the same resources, so they compete with one another and often run into race conditions. The design should reduce the number of possible interactions that may lead to deadlocks. Deadlocks on switches are disastrous. A race condition or infinite recursion loop can cause the control processor on the board to reboot, making the whole box unavailable. The two primitive forms of interaction between threads are synchronization, which involves adjusting the timing of execution of a thread based on knowledge of the states of other threads (this is needed to ensure non-interference), and priority scheduling, which is another way of setting the interaction sequence for different tasks. The processor might have ten or more tasks scheduled. When two processes need to wake up, the system allows the one with higher priority to kick in first.
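The priority rule in the last two sentences can be sketched as a simple selection over the ready tasks. This is an illustration only, not how any particular RTOS scheduler is implemented. The task structure and function name are hypothetical, and the sketch assumes that a larger number means a higher priority; some RTOSes use the opposite convention.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct {
    const char *name;
    int priority;  /* assumption: a larger number means higher priority */
    bool ready;    /* true when the task needs to wake up */
} task;

/* Return the index of the ready task with the highest priority,
 * or -1 when no task is ready. Ties go to the earlier entry. */
int pick_next_task(const task *tasks, size_t count)
{
    int best = -1;

    assert(tasks != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        if (!tasks[i].ready)
            continue;
        if (best < 0 || tasks[i].priority > tasks[best].priority)
            best = (int)i;
    }
    return best;
}
```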

It is often necessary to pass information from one thread to another. This can take many different forms, including global shared memory, message passing, remote procedure calls and rendezvous. In our example, the desired pressure is passed to the air supply thread as part of the message.
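As a rough illustration of message passing, the sketch below implements a fixed-depth mailbox that carries a desired pressure value from one thread to another. All names and the queue depth are hypothetical. The sketch shows only the queue itself and deliberately leaves out locking; a real driver would guard it with a mutex or use the message queue provided by its OS.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAILBOX_DEPTH 8  /* hypothetical queue depth */

typedef struct {
    int desired_pressure;  /* payload carried by the message */
} pressure_msg;

typedef struct {
    pressure_msg slots[MAILBOX_DEPTH];
    size_t head;   /* index of the oldest queued message */
    size_t count;  /* number of queued messages */
} mailbox;

/* Queue a message; returns false when the mailbox is full. */
bool mailbox_send(mailbox *mb, pressure_msg msg)
{
    assert(mb != NULL);
    if (mb->count == MAILBOX_DEPTH)
        return false;
    mb->slots[(mb->head + mb->count) % MAILBOX_DEPTH] = msg;
    mb->count++;
    return true;
}

/* Fetch the oldest message; returns false when the mailbox is empty. */
bool mailbox_receive(mailbox *mb, pressure_msg *out)
{
    assert(mb != NULL && out != NULL);
    if (mb->count == 0)
        return false;
    *out = mb->slots[mb->head];
    mb->head = (mb->head + 1) % MAILBOX_DEPTH;
    mb->count--;
    return true;
}
```

Returning false on a full or empty mailbox, instead of blocking, lets the caller decide how to react, which matters when the caller has a deadline of its own.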

5) Distribution: this is either a given system property or is there to increase throughput, availability or functionality. This is the case when writing software for device drivers that depends on values or information in another switch that can be physically attached or connected via fibers. The same issues that we see in any distributed system can happen here. These are concurrency, unreliable communication media, variable message delay and independent failures.

What Is a Device Driver?

A device driver is an efficient piece of software that abstracts the device, peripheral or chip-dependent information from the higher level software applications. Writing device drivers for dedicated hardware differs from writing drivers for a multipurpose workstation. Obviously, things like the number of interrupts, memory addressing, byte order and ISRs (interrupt service routines) are different.

Device-Driver Organization

The organization of a device driver is dependent on the following factors: functionality, method of hardware access, initialization, device type and interrupt mechanism. For our discussion, we are going to use one of the drivers I have written. This driver takes care of APS.

Figure 2. Device-Driver Organization

Functional Separation

When you think about it, a device driver is really performing two functions: hardware shielding/abstraction and hardware access. Hardware shielding/abstraction provides logical (not physical) interfaces to the hardware. The defines and interfaces are based on the needs of the resources layer. For instance, a resources layer module should make a call like APS_CreateConnect(), not a series of calls like APS_WriteMem(ContextTable), APS_WriteMem(MrxtQueueDepthTable).

Higher layer software shouldn't have to care about this. We'll abbreviate this as HSL (hardware shielding layer) in our sample driver. All of the parameters to these functions should be things understood by the resources layer (e.g., slot, port, IP address) not internal driver concepts (deviceNumber, forwardingIndex). This layer, its function names, parameters and interfaces, should be device-independent and generic enough that the actual hardware device could be changed and the HSL interfaces would not need to be changed.

Hardware access provides the functions that access hardware registers and memories. This requires specific understanding of the hardware (e.g., APS_WriteMem(), APS_ReadReg(), APS_WriteBMRtable()). We'll abbreviate this as HAL (hardware access layer) in our sample driver. Since the HAL is only accessed by the HSL, it can use special internal concepts like device ID. The HAL interfaces should be device-specific and implementation-independent so that the HAL could be taken out and moved to a completely different product (e.g., taken from an ATM product and used in a narrowband TDM product) without any changes. Good device-driver organization will cleanly separate these two.
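To make the split concrete, here is a minimal sketch of the two layers. It is not the actual APS driver. The function names APS_CreateConnect() and APS_ReadReg() come from the text, but their signatures, the APS_WriteReg() helper, the register offset, the slot-to-device mapping and the in-memory array that stands in for the hardware are all assumptions made so the example can run on a workstation.

```c
#include <assert.h>
#include <stdint.h>

/* ---- HAL: device-specific register access, used only by the HSL ---- */
#define APS_MAX_DEVICES  2      /* hypothetical */
#define APS_NUM_REGS     16     /* hypothetical */
#define APS_REG_CONNECT  0x02u  /* hypothetical register offset */

/* Simulated register file standing in for memory-mapped hardware. */
static uint32_t aps_regs[APS_MAX_DEVICES][APS_NUM_REGS];

void APS_WriteReg(unsigned deviceId, unsigned reg, uint32_t value)
{
    assert(deviceId < APS_MAX_DEVICES && reg < APS_NUM_REGS);
    aps_regs[deviceId][reg] = value;
}

uint32_t APS_ReadReg(unsigned deviceId, unsigned reg)
{
    assert(deviceId < APS_MAX_DEVICES && reg < APS_NUM_REGS);
    return aps_regs[deviceId][reg];
}

/* ---- HSL: logical interface seen by the resources layer ---- */

/* Map slot and port to the internal device ID (hypothetical mapping). */
static unsigned aps_device_for(unsigned slot, unsigned port)
{
    (void)port;
    return slot % APS_MAX_DEVICES;
}

/* Create a connection on a port. The caller passes only slot and port;
 * device IDs and register details stay hidden. Returns 0 on success. */
int APS_CreateConnect(unsigned slot, unsigned port)
{
    unsigned dev;

    if (port >= 32)
        return -1;
    dev = aps_device_for(slot, port);
    APS_WriteReg(dev, APS_REG_CONNECT,
                 APS_ReadReg(dev, APS_REG_CONNECT) | (UINT32_C(1) << port));
    return 0;
}
```

If the hardware changed, only the HAL half and the mapping function would need rewriting; a caller of APS_CreateConnect() would not notice.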

If you have multiple devices, you will want two HALs but only one HSL. Imagine that the chipset is composed of the POS and a DS3 ASIC interface.

Figure 3. Multiple Device Driver Organization

In this situation, you might have functions like this:

void APS_CreateConnect(void)
{
  /* One HSL call fans out to both HALs: the POS device... */
  _POS_WriteMCRtable();
  _POS_WriteContextTable();
  /* ...and the DS3 device. */
  _DS3_EnableConnection();
}

File Organization

A traffic-handling driver's file should be organized in an expandable, easy-to-understand way. The files will be organized based on need, language, underlying OS dependencies, development environment, system environment and product. The list seems long, but it is fairly simple to resolve. Table 1 shows our famous APS device.

Table 1. APS Device

Talk to the Vendor

If this is a third-party ASIC, bug the device vendor for a driver. Often, they will have a sample driver you can use. This saves a ton of work. The coding standards might not be the same, but you can easily fix that, and if you want to change how long lists of registers are named, sed, awk and Perl can be your friend. Often, there are subtleties to programming a device that are not at all obvious from the documentation. Just as likely, it might not even be documented. The sample driver may already take this into account. (What, you didn't tell us that register x needs to be set to 0xab before we can write to register y?)

Know Your OS

Often, there are OS functions that can act as glue to your driver, so check what the OS provides before writing your own. A designer writes, for example, pciConfigInWord(), when the OS may already supply such a routine. Another example is the interrupt code. Other devices that an OS might care about include serial ports, Ethernet controllers, communications controllers (USB, FireWire), PCI controllers, etc.

Should I integrate the driver into the OS? The OS often has a model for writing device drivers. For example, if you follow the VxWorks model, you can put your devices into the OS' filesystem and open(), close(), read() and write() your devices. Sometimes this is useful, for example, for a Flash memory device driver. You can open() the device, and then pass the resulting file descriptor to other routines. These routines don't care that it's a Flash device--they just want to read() it. Now, often this isn't useful. For example, I don't think there is much point in doing this for many data path-type devices. Putting it into the OS just complicates things:

APS_EnableDatapath(APSAddr, APS_dataPath1)

works better than

ioctl(APSDeviceFd, APS_ioctlEnableDevice, &dataPath1)

A general rule of thumb is, if it's memory, or looks like memory, integrate it into the OS. If not, don't.

Think about the Future

Right now, you're writing your driver with one particular card in mind. But your device may be used on a different card in the future (e.g., an ATM layer device used in an OC3 card now, but in an OC12 card later), or your card may be used in a different way later (e.g., OC192 I/O card used for ATM now but for IP later). Ask around, find out what the future might be for your device, and then make your driver with that in mind. Even then, when everybody says, ``No, we're only going to make a single-port card with this ASIC'', don't believe them. Make your driver robust and able to handle all the different configurations that they could throw at it. Here are a few random examples of what I mean. Maybe you've only got one device on your card, but you want to make your driver able to handle the case where a future card has multiple devices. So, in all of your routines, you calculate a device ID from I/O slot and port even though the function (in a platform file) currently only returns 0.

Make your parameters wide enough. Sixteen bits might be enough for now, but if you're going to need 32 later, make the parameters 32 bits now. [Remember, Bill Gates told us that 640K was more than enough RAM--Ed.] Put some things like base addresses and _APS_NumDevices in a Platform file even if there is at first only one variant of the file. Make some variables for things that are currently fixed (e.g., _APS_NumAPSsPerIoCard), and use this in various index calculations. If you do this, you will make things much easier for somebody else down the road.
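The advice in this section can be sketched as follows. This is an illustration under stated assumptions, not the real driver: APS_PlatformDeviceId(), APS_EnablePort() and APS_IsPortEnabled() are hypothetical names, and the constants are written without the leading underscore used in the text because C reserves identifiers that begin with an underscore followed by a capital letter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* ---- platform file: the only part that changes per card variant ---- */
#define APS_NUM_APS_PER_IO_CARD 1u  /* fixed today, but still a named value */
#define APS_NUM_DEVICES         (1u * APS_NUM_APS_PER_IO_CARD)

/* Today there is one device, so every slot and port maps to ID 0.
 * A multi-device card would change only this function. */
uint32_t APS_PlatformDeviceId(uint32_t ioSlot, uint32_t port)
{
    (void)ioSlot;
    (void)port;
    return 0;
}

/* ---- driver file: always goes through the platform mapping ---- */
static uint32_t aps_enabled[APS_NUM_DEVICES];  /* per-device port bitmap */

/* Parameters are 32 bits wide even though small values suffice today. */
int APS_EnablePort(uint32_t ioSlot, uint32_t port)
{
    uint32_t dev = APS_PlatformDeviceId(ioSlot, port);

    if (dev >= APS_NUM_DEVICES || port >= 32)
        return -1;
    aps_enabled[dev] |= UINT32_C(1) << port;
    return 0;
}

bool APS_IsPortEnabled(uint32_t ioSlot, uint32_t port)
{
    uint32_t dev = APS_PlatformDeviceId(ioSlot, port);

    assert(dev < APS_NUM_DEVICES);
    return port < 32 && ((aps_enabled[dev] >> port) & 1u) != 0;
}
```

The device ID lookup costs almost nothing now, and it means a later multi-device card changes the platform file instead of every routine in the driver.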

Acknowledgments

I thank the people I worked with, namely Stephen Morton and Nick Droogh, for providing their valuable input and resources.

Part 2

See the November/December 2001 issue of ELJ for the second part of this article, which will discuss how these rules for device drivers can be actualized.


Waël Hassan (wael@acm.org) is a PhD student at the University of Ottawa. He is also a real-time software architect. He is interested in telecommunication services and real-time systems. His PhD research is on formalizing and designing a global connectivity protocol. His hobbies include traveling, photography, dragon boat racing, swimming and skating.

