Writing Real-Time Device Drivers for Telecom Switches, Part 2 of 2

by Wael Hassan

Instead of jumping into the organization of files or programming techniques when programming for real time, first evaluate background information. Ask questions about time constraints, message frequencies, buffer sizes and data path dependencies. Also, ask if there is any intelligence built into registers. You will find this information useful when you are in the debugging and implementation stages. The safest way to learn how to write software drivers is to modify an existing driver. Writing one from scratch might be a little hard for a novice writer.

Here are the basic steps for writing a software driver:

  1. Become familiar with the device.

  2. If it's a third-party ASIC, get a sample driver from the vendor and use the sample device driver as a guide.

  3. Consider the issues raised in Part 1 of this article (September/October 2001 issue of ELJ).

  4. Write a skeleton driver, containing only function prototypes and empty implementations. Review the skeleton with team members to see if additional functionality is needed. More importantly, allow for added functionality.

  5. Write your first I/O routine and test it.

  6. Proceed with coding.

Hardware Access with a Memory-Mapped Structure

There are several ways to structure the hardware access in your HAL layer. You could have functions like _APS_WriteMBRtable() and _APS_ReadCellCounter(), and they will simply talk to the hardware directly via RW_ReadLongWord().

There are also some drivers that define a struct matching the device layout and point a pointer to it at the base address of the device. They can then simply dereference the struct pointer:

_APS_Memory->contextTable.entry[12].esot = 1;

Or you could pass all hardware access through a small number of specialized access routines. If these are inlined, there is no performance hit:

_APS_WriteMem()
_APS_ReadMem()
_APS_WriteReg()
_APS_ReadReg()

A problem with the struct approach is that you have less control over what the compiler is doing. Is it doing byte or word reads? How can you control what it does with FIFOs or clear-on-read registers? The access-routine choice is not without its disadvantages either.

Think of the time it takes to do this:

address = _APS_Base + _APS_tStats;
RW_WriteMem(address, value[1]);
RW_WriteMem(address+4, value[2]);
RW_WriteMem(address+8, value[3]);

vs. the amount of time to do this:

_APS_WriteMem(_APS_tStats, value[1]);
_APS_WriteMem(_APS_tStats+4, value[2]);
_APS_WriteMem(_APS_tStats+8, value[3]);

where _APS_WriteMem() is defined as:

void _APS_WriteMem(longword address, longword value)
{
  RW_WriteMem(_APS_Base + address, value);
}

On balance, I'd recommend using a few specialized access routines. They give you great flexibility with few of the problems listed above, and with careful coding and inlining there is no speed hit. You also know that all accesses to your device come through this one gateway, which makes debugging easier.
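
As a rough illustration of the single-gateway idea, here is a minimal sketch in C. It assumes a 32-bit device whose memory is simulated by an array so the code can run on a host; the names aps_sim_device, APS_SIM_WORDS and aps_access_count are hypothetical and not part of any real driver. The leading underscore is dropped from the routine names because C reserves identifiers that start with an underscore followed by an uppercase letter.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t longword;

/* Simulated device memory (assumption: 64 longwords, host-side stand-in). */
#define APS_SIM_WORDS 64u
static volatile longword aps_sim_device[APS_SIM_WORDS];

/* Device base; on a real target this would be a fixed memory-mapped address. */
static volatile longword *const APS_Base = aps_sim_device;

/* Debug hook: every access passes through the two routines below,
 * so counting or tracing accesses needs only one line in each. */
static unsigned long aps_access_count;

/* Write one longword at a byte offset from the device base. */
static inline void APS_WriteMem(longword offset, longword value)
{
    assert(offset % sizeof(longword) == 0);
    assert(offset / sizeof(longword) < APS_SIM_WORDS);
    aps_access_count++;
    APS_Base[offset / sizeof(longword)] = value;
}

/* Read one longword at a byte offset from the device base. */
static inline longword APS_ReadMem(longword offset)
{
    assert(offset % sizeof(longword) == 0);
    assert(offset / sizeof(longword) < APS_SIM_WORDS);
    aps_access_count++;
    return APS_Base[offset / sizeof(longword)];
}
```

Because the pointer is volatile and the element type is fixed, the access width is decided here rather than left to the compiler, and a breakpoint or trace line in these two routines catches every access to the device.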

Mailbox Access

You may have a device that uses mailbox access rather than a memory-mapped structure. Often such mailbox accesses allow you to write multiple values at once. Obviously, it would be inefficient to have a _APS_WriteMem() that accesses the mailbox but only writes one word at a time. But you could have a _APS_WriteEntries(longword data[], longword startAddr, int numEntries) routine that still would be your one point of access as described above. Sometimes your device may allow mailbox access in various sizes (e.g., bytes, words, longwords). You should provide this flexibility.
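
A burst-write routine of this kind might look like the following sketch. The mailbox model here is purely hypothetical: it assumes an address register plus a data register that auto-increments the address on every write, which is one common arrangement but may not match your device. All mbx_ names are invented for the simulation.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t longword;

/* Hypothetical mailbox model: an address register plus a data register
 * whose writes land in internal memory and auto-increment the address. */
#define MBX_MEM_WORDS 32u
static longword mbx_internal_mem[MBX_MEM_WORDS];
static longword mbx_addr_reg;
static unsigned mbx_addr_setups;   /* counts address-register writes */

static void mbx_set_address(longword addr)
{
    mbx_addr_reg = addr;
    mbx_addr_setups++;
}

static void mbx_push_data(longword value)
{
    assert(mbx_addr_reg < MBX_MEM_WORDS);
    mbx_internal_mem[mbx_addr_reg++] = value;
}

/* Single point of access: one address setup, then a burst of data writes. */
void APS_WriteEntries(const longword data[], longword startAddr, int numEntries)
{
    assert(numEntries >= 0);
    assert(startAddr + (longword)numEntries <= MBX_MEM_WORDS);
    mbx_set_address(startAddr);
    for (int i = 0; i < numEntries; i++)
        mbx_push_data(data[i]);
}
```

Writing three entries this way costs one address setup instead of three, and the routine is still the only place that touches the mailbox.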

Driver Initialization

There are several parts to driver initialization:

  1. Device-driver initialization: memory allocation, global variable initialization, mutex lock creation, etc.

  2. Device initialization: setting registers and memory to startup values.

  3. Device power-up diagnostics: memory, register, interrupt tests, etc.

These should be kept as separate functions so each can be called at the appropriate place during startup. Also, as much as possible, keep initializations free from dependencies on other modules. People are always moving different initializations before and after others, making interdependence a real problem.
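
The three-way split could be sketched as follows. The device is simulated by a small register array, and the structure aps_dev_t, the function names and the walking-ones register test are all assumptions made for illustration; your diagnostics will depend on the device.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

#define APS_NUM_REGS 4u

typedef struct {
    uint32_t          *shadow;              /* driver-side copy of the registers */
    volatile uint32_t  regs[APS_NUM_REGS];  /* simulated device registers */
    bool               driver_ready;
    bool               device_ready;
} aps_dev_t;

/* 1. Driver initialization: allocate memory, set up driver state. */
int APS_DriverInit(aps_dev_t *dev)
{
    assert(dev != NULL);
    dev->shadow = calloc(APS_NUM_REGS, sizeof *dev->shadow);
    dev->driver_ready = (dev->shadow != NULL);
    dev->device_ready = false;
    return dev->driver_ready ? 0 : -1;
}

/* 2. Device initialization: put registers at their startup values. */
int APS_DeviceInit(aps_dev_t *dev)
{
    if (!dev->driver_ready)
        return -1;
    for (unsigned i = 0; i < APS_NUM_REGS; i++) {
        dev->regs[i] = 0;
        dev->shadow[i] = 0;
    }
    dev->device_ready = true;
    return 0;
}

/* 3. Power-up diagnostics: walking-ones test on every register. */
int APS_Diagnostics(aps_dev_t *dev)
{
    if (!dev->device_ready)
        return -1;
    for (unsigned r = 0; r < APS_NUM_REGS; r++) {
        uint32_t saved = dev->regs[r];
        for (unsigned bit = 0; bit < 32; bit++) {
            uint32_t pattern = UINT32_C(1) << bit;
            dev->regs[r] = pattern;
            if (dev->regs[r] != pattern) {
                dev->regs[r] = saved;
                return -1;
            }
        }
        dev->regs[r] = saved;   /* leave the register as we found it */
    }
    return 0;
}
```

Each stage checks only its own module's state, so the startup code can call the three functions at different points without pulling in other modules.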

Accessing a Device with Multiple Data Paths

When accessing a device that has more than one data path, for example four streams of data flowing into it (0-3), you should take extra care in finding out which channels need to be accessed. Let's say we have two slots (0-1) with eight ports per slot (0-7). By design we have the following: ports 0-3 on slot ID 0 belong to data path 0, ports 4-7 on slot ID 0 are attached to data path 1, ports 0-3 on slot ID 1 are associated with data path 2 and ports 4-7 on slot ID 1 are connected to data path 3. Notice the importance here of knowing whether the numbers are 0- or 1-based. Such a complication in the hardware can tempt us to write several conditional if/then/else statements. We can avoid that by plotting a binary table of the required results and then working out the bit shifting that produces it.

The following piece of code simplifies the table into a couple of register shifts. I am sure that bit shifting can be your friend in many cases, but you have to be careful:

DataPathSelected = (ioSlot << 1) + (PortNumber >> 2);

If you plot the table for values of ioSlot ranging from 0-1 and PortNumber from 0-7 per I/O card, DataPathSelected will follow Table 1. This gives you a map that always results in a value from 0 to 3 indicating the data path required (see Listing 1).
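
One way to convince yourself that the shift expression matches the slot and port assignment described above is to write it next to the if/else version and compare the two over every input. The function names and the two range constants below are hypothetical; the expression itself is the one given above.

```c
#include <assert.h>

#define APS_NUM_SLOTS      2u   /* slots 0-1, zero-based */
#define APS_PORTS_PER_SLOT 8u   /* ports 0-7, zero-based */

/* Shift-based selection: the slot supplies bit 1, the upper half of
 * the port range supplies bit 0. */
unsigned APS_SelectDataPath(unsigned ioSlot, unsigned portNumber)
{
    assert(ioSlot < APS_NUM_SLOTS);
    assert(portNumber < APS_PORTS_PER_SLOT);
    return (ioSlot << 1) + (portNumber >> 2);
}

/* Equivalent conditional version, kept only as a cross-check. */
unsigned APS_SelectDataPathBranchy(unsigned ioSlot, unsigned portNumber)
{
    if (ioSlot == 0)
        return (portNumber <= 3) ? 0u : 1u;
    return (portNumber <= 3) ? 2u : 3u;
}
```

The range asserts matter: the expression only produces 0-3 when both inputs are zero-based and in range, which is exactly the kind of assumption that is worth checking in debug builds.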

Table 1. Binary Table of Required Results

ISRs

Interrupt service routines (ISRs) are powerful because they allow a simple and low-latency method to receive events. The disadvantage to them is that they can potentially flood the system and tie up system resources. If you have an event that could arrive in a flooding manner, you must provide a mechanism to detect this and disable the interrupt for a period of time.

How should an ISR be structured? It usually follows a template like this:

  1. ISR determines which device caused the event and masks it (this is probably handled for you by the SYS_Interrupt module that will call your registered ISR).

  2. ISR reads its device's interrupt registers to determine what happened.

  3. ISR sends an event to the application level task that is interested in the event.

  4. ISR finishes.

When the application gets the event, it performs the following functions:

  1. Task handles the event and performs the appropriate actions.

  2. Task acknowledges the interrupt by writing to any device registers plus perhaps any acknowledgement register in intermediate PLDs.

  3. Task re-enables the interrupt.

And that's it. This scheme is designed to minimize the CPU burden in ISR space.
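
The split between ISR and task can be sketched on a host with a few simulated variables. Everything here is hypothetical: the event register is modeled as a plain variable, the message to the task is a flag plus a word instead of a real OS queue, and masking is a boolean.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simulated hardware and OS state (all hypothetical). */
static volatile uint16_t sim_event_reg;   /* device event register */
static volatile bool     sim_irq_masked;  /* interrupt mask state */
static volatile uint16_t pending_events;  /* stands in for an OS message */
static volatile bool     event_posted;
static uint16_t          handled_events;  /* what the task has processed */

/* ISR side: mask, find out what happened, notify the task, return. */
void APS_Isr(void)
{
    sim_irq_masked = true;            /* 1. mask the source */
    pending_events = sim_event_reg;   /* 2. read the interrupt register */
    event_posted = true;              /* 3. send the event to the task */
}                                     /* 4. done */

/* Task side: handle, acknowledge, re-enable. */
void APS_EventTask(void)
{
    if (!event_posted)
        return;
    assert(sim_irq_masked);           /* the ISR must have masked the source */
    event_posted = false;

    uint16_t events = pending_events;
    handled_events |= events;                 /* 1. handle the event */
    sim_event_reg &= (uint16_t)~events;       /* 2. acknowledge what was seen */
    sim_irq_masked = false;                   /* 3. re-enable the interrupt */
}
```

The ISR does no real work beyond copying one register, which is the point of the scheme: the expensive handling runs at task priority.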

Some schemes may involve acknowledging the interrupts within the ISR, rather than in items two and three at the applications level. If your ISR has multiple clients, you might be worried about one misbehaving client ruining it for the other client. This is one possible reason to put event acknowledging in the ISR. Also, some devices have clear-on-read (COR) event bits rather than clear-on-write (COW). To ensure consistency of structure between these two types of devices, you might want to acknowledge your events in the ISR.

With devices that have COW event bits, you also must not make the mistake of clearing all events as follows:

events = _APS_ReadWord(devId, _APS_rApsEvent);
_APS_WriteReg(devId, _APS_rApsEvent, 0xFFFF);

Since there might be a new event between the two instructions, you must not write any more than the bits you read:

events = _APS_ReadWord(devId, _APS_rApsEvent);
_APS_WriteReg(devId, _APS_rApsEvent, events);
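
The difference is easy to demonstrate with a simulated register whose bits are cleared by writing 1 to them. In this sketch the sim_ functions and the late parameter, which models an event arriving between the read and the write, are assumptions made for the demonstration.

```c
#include <assert.h>
#include <stdint.h>

static uint16_t sim_event_reg;   /* simulated clear-on-write event register */

uint16_t sim_read_events(void)        { return sim_event_reg; }
void sim_raise_event(uint16_t bits)   { sim_event_reg |= bits; }
/* Writing 1 to a bit clears it; writing 0 leaves it alone. */
void sim_write_events(uint16_t bits)  { sim_event_reg &= (uint16_t)~bits; }

/* Wrong: clears every bit, including an event that arrived after the read. */
uint16_t ack_all_bits(uint16_t late)
{
    uint16_t events = sim_read_events();
    assert((events & late) == 0);   /* the late event is assumed to be new */
    sim_raise_event(late);          /* hardware raises a new event here */
    sim_write_events(0xFFFF);
    return events;
}

/* Right: clears only the bits that were actually read. */
uint16_t ack_read_bits(uint16_t late)
{
    uint16_t events = sim_read_events();
    assert((events & late) == 0);
    sim_raise_event(late);
    sim_write_events(events);
    return events;
}
```

With ack_all_bits() the late event is wiped out before anyone has seen it; with ack_read_bits() it is still set in the register and will be picked up on the next read.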

You also must be very careful in the device init and the ISR to ensure that all interrupts are either masked or acknowledged properly by the ISR. If not, you could hang the card software. I recommend always acknowledging all received interrupts in your ISR, not just those that you know how to handle. However, if you are implementing a more sophisticated ISR structure, just make sure that you write down all the possibilities and ensure that it all works out all right.

Mutex Locks

You may need to have mutual exclusion locks to ensure data integrity in your device. Here are a series of questions you need to ask:

  • Is there going to be more than one task accessing my driver routines?

  • What scenarios could impact data integrity on my device? This is very device-dependent. For example, if your device has purely memory-mapped access, mostly you're all right. However, if you have read-modify-write operations, they may be affected by writes to the same memory. But writes to other parts of your device memory would be okay. If you have a device that uses mailbox access, you cannot have two interleaved accesses.

  • Where could I put the locks so as not to have them called unnecessarily, yet not have them everywhere in a confusing manner? Remember, mutex locks are costly in two ways. Rule one: it takes time to get a lock, even if it isn't already taken, so if you don't have to get a lock, don't. Rule two: an active lock may be blocking another, higher priority, task. So, in general, lock around the smallest amount of code you can. The obvious exception to this rule is that if locking around the smallest piece of code means that you will get and release the lock 1,000 times in a loop, but have the option to put the lock around a bigger piece of code and only get and release the lock once, then put it around the larger piece of code. That's just rule one.

  • What about the situation where under normal operation, you wouldn't need mutex locks, but if debug routines are being used, then you would need locks? Should you compromise the speed of the system just for the benefit of your debug routines? I'd say no. Debug routines are not for field use anyway.
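
The trade-off between the two rules can be sketched with POSIX mutexes. This is only an illustration: an RTOS will have its own lock primitives, the device table is simulated by an array, and the acquisition counter and function names are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

#define APS_TABLE_ENTRIES 1000u

static pthread_mutex_t aps_lock = PTHREAD_MUTEX_INITIALIZER;
static uint32_t sim_table[APS_TABLE_ENTRIES];   /* simulated device table */
static unsigned long lock_acquisitions;         /* for illustration only */

static void aps_lock_take(void)
{
    int rc = pthread_mutex_lock(&aps_lock);
    assert(rc == 0);
    (void)rc;
    lock_acquisitions++;
}

static void aps_lock_give(void)
{
    int rc = pthread_mutex_unlock(&aps_lock);
    assert(rc == 0);
    (void)rc;
}

/* Read-modify-write of one entry: lock around the smallest piece of code. */
void APS_SetBits(unsigned index, uint32_t mask)
{
    assert(index < APS_TABLE_ENTRIES);
    aps_lock_take();
    sim_table[index] |= mask;
    aps_lock_give();
}

/* Bulk update: one acquisition around the loop instead of 1,000. */
void APS_SetBitsAll(uint32_t mask)
{
    aps_lock_take();
    for (unsigned i = 0; i < APS_TABLE_ENTRIES; i++)
        sim_table[i] |= mask;
    aps_lock_give();
}
```

APS_SetBitsAll() pays for the lock once, but it also holds it for the whole loop, so a higher-priority task waits longer. Which cost matters more depends on how long the loop runs on your hardware.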

How to Name Your Registers and Memory Tables

You've got a whole device with lots of registers and memory tables. How are you going to name them? First, name your registers and tables with the same names that they have in the hardware specifications. While in some cases you might not like the naming employed, it will make it so much easier to figure out the software that it is still recommended.

You have a couple of choices with regard to memory. You can make a struct that is set up just like the device memory, make a pointer to this struct point to the base address of your memory (or your simulator memory) and then just write to this struct. Or you can define some memory offsets and do RW_WriteLongWord() using these and the device base address. I'd recommend against the former because it doesn't allow you to have a single set of hardware access routines.

You could define your registers using enums or #defines. It's a matter of personal taste.

Overall, it would be nice to be able to tell the registers, the register bit-masks and the memory offsets from one another easily. Here are a couple of good schemes (ideally, the mask name would contain the name of the register for which it is a mask):

#define _APS_rCellStats           0x0000
#define _APS_mCellStatsPortClear  0x00000001
#define _APS_mCellStatsPortSwap   0x00000002

You may find that this causes the mask names to get too long and simply may wish to have the following:

#define _APS_rCellStats  0x0000
/* for CellStats */
#define _APS_mPortClear  0x00000001
#define _APS_mPortSwap   0x00000002

You do run a greater risk of writing the wrong value to a register, but that risk may be preferable to enormously long mask names.

If you're ever dealing with any variables that could be 0- or 1-based, indicate clearly in every functional interface which one it is. This will save everybody a whole lot of grief.

Port Mapping

When programming a device with more than one data path, there will be a native data path for which the chip had been designed. The newly added data path supports the newer and higher-numbered ports. One technique habitually followed by software designers is to allocate the even-numbered ports (starting from 0) to the first data path. The odd-numbered ports are assigned to the second data path. Listing 1 shows how you can select a data path based on slot and port numbers.

Listing 1. Selecting Data Paths

The bit shifting of the ioSlot number and the port number will get you the data path number (0-3). In this scenario, one card can have two I/O cards and each I/O card can have two data paths.

Alignment of Bytes

You should be aware of how your device memory is structured. Hardware engineers often allocate more memory for future expansion, so your registers may not be packed on byte or word boundaries. I had an experience with an 8-byte-aligned register space, where the device only used word-long registers. Special handling is needed when reading and writing to account for the extra bytes. The fastest way to align or to pack two words is to use a union. For example:

union
{
  Word      wCause[2];
  LongWord  lwCause;
} CauseReg;

This can be accessed as:

CauseReg.wCause[0] = 0x0FB;

or

CauseReg.lwCause = 0x0FB0F;
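
One thing to watch with the union is that which half of the long word wCause[0] lands in depends on the byte order of the CPU. The sketch below assumes Word is 16 bits and LongWord is 32 bits; the helper names are hypothetical, and cause_pack() is a shift-based alternative that behaves the same on any byte order.

```c
#include <assert.h>
#include <stdint.h>

typedef uint16_t Word;      /* assumption: 16-bit word */
typedef uint32_t LongWord;  /* assumption: 32-bit long word */

typedef union {
    Word     wCause[2];
    LongWord lwCause;
} CauseReg_t;

/* C11 compile-time check that the two views really overlay each other. */
_Static_assert(sizeof(CauseReg_t) == sizeof(LongWord),
               "union must be exactly one long word");

/* Index of the wCause[] element that overlays the low 16 bits of
 * lwCause on this CPU: 0 on little-endian, 1 on big-endian. */
int cause_low_word_index(void)
{
    CauseReg_t probe;
    probe.lwCause = 1u;
    assert(probe.wCause[0] == 1u || probe.wCause[1] == 1u);
    return (probe.wCause[0] == 1u) ? 0 : 1;
}

/* Portable packing with shifts; independent of byte order. */
LongWord cause_pack(Word high, Word low)
{
    return ((LongWord)high << 16) | low;
}
```

If the driver has to run on both big- and little-endian boards, the shift version is the safer choice; the union is convenient when the target is fixed.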

Register Mask Definitions

Register masks are really tricky, and one has to be very cautious when using them. One very common mistake is to give two masks the same numeric value. Another is to use the wrong mask. Not much can be done about the latter; however, one can use the following scheme for defining one-bit register masks:

#define APS_intPort2LossOfSignal   (1 << 11)
#define APS_FmonEnable             (1 << 7)

The definitions mean that bit 11 and bit 7 (zero-based) are the masks for the LossOfSignal and the Fabric Monitor Enable bits.
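
Duplicate masks can be caught mechanically. The sketch below assumes, purely for illustration, that both bits live in the same register; the APS_BIT macro and the helper functions are hypothetical additions. An unsigned constant is used for the shift so that bit 31 can be defined without overflowing a signed int.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define APS_BIT(n) (UINT32_C(1) << (n))

#define APS_intPort2LossOfSignal APS_BIT(11)
#define APS_FmonEnable           APS_BIT(7)

/* C11 compile-time check: the build fails if the two masks collide. */
_Static_assert((APS_intPort2LossOfSignal & APS_FmonEnable) == 0,
               "masks for the same register must not overlap");

/* Run-time check over a whole table of masks for one register.
 * Returns 1 if any two masks share a bit, 0 otherwise. */
int APS_MasksOverlap(const uint32_t masks[], size_t count)
{
    uint32_t seen = 0;
    assert(masks != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        if (seen & masks[i])
            return 1;
        seen |= masks[i];
    }
    return 0;
}

/* Helpers for the modify step of a read-modify-write. */
uint32_t APS_SetMask(uint32_t reg, uint32_t mask)   { return reg | mask; }
uint32_t APS_ClearMask(uint32_t reg, uint32_t mask) { return reg & ~mask; }
```

The wrong-mask mistake is harder to catch automatically, but a table per register at least makes it obvious which masks belong together.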

The Last Mile

This is one of the most important stages in this cycle. The evaluation of the code quality, accuracy and robustness is done here. How do you verify that your code works? Before I explain how, I would like to say that it is not enough that your code works. Why? Simple. If you verify that A, B and C work as separate subsystems, together A and B and C might not work at all. In addition, they may produce inconsistent results.

Simulator testing is really important and useful when available. Here is a list of things that you should do in testing:

  1. Before doing anything else, get a copy of the hardware spec and write a show routine. A show routine is one that shows all the registers in order and prints out their bits. Display a matrix-like structure.

  2. Make sure you can read and write all the registers and change all the bits.

  3. Execute all of your functions; try to call them from a script.

  4. Scan your header file for errors. If you have register masks, make sure that you don't have two masks that are the same.

  5. Document your header file to indicate if your numbers are zero-based or one-based.

  6. When you discover that one of the bits is not being flipped, first check header file masks. Then see if the right function is called. Finally, check the function contents; it may be you are using the wrong offset or the wrong bit mask.
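
A show routine along the lines of item 1 might be sketched like this. The register table is hypothetical: CellStats at offset 0x0000 follows the earlier example, while the ApsEvent offset and the 16-bit display width are assumptions, and the registers are simulated by an array.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

typedef struct {
    const char *name;
    uint32_t    offset;
} aps_reg_desc_t;

/* Hypothetical register map; a real one follows the hardware spec order. */
static const aps_reg_desc_t aps_regs[] = {
    { "CellStats", 0x0000 },
    { "ApsEvent",  0x0004 },   /* assumed offset */
};
#define APS_NUM_REGS (sizeof aps_regs / sizeof aps_regs[0])

static uint32_t sim_regs[APS_NUM_REGS];   /* simulated register contents */

/* Render the low `width` bits of value, most significant bit first.
 * `out` must have room for width + 1 characters. */
void APS_FormatBits(uint32_t value, unsigned width, char *out)
{
    assert(width >= 1 && width <= 32);
    for (unsigned i = 0; i < width; i++)
        out[i] = ((value >> (width - 1 - i)) & 1u) ? '1' : '0';
    out[width] = '\0';
}

/* Print every register in spec order, one row per register. */
void APS_Show(FILE *fp)
{
    char bits[33];
    for (size_t r = 0; r < APS_NUM_REGS; r++) {
        APS_FormatBits(sim_regs[r], 16, bits);
        fprintf(fp, "%-12s @0x%04X  %s\n",
                aps_regs[r].name, (unsigned)aps_regs[r].offset, bits);
    }
}
```

Calling APS_Show(stdout) before and after each test step makes it easy to see which bit did or did not flip.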

To test the whole device, implement a loopback at both ends, for ingress and egress. Use a function that inserts cells, then change your register values and see if there is an effect. For integration testing, if the code supports two different pieces of hardware, make sure that the neighboring chips on the data path call the generic wrapper call.

Conclusion

Writing drivers is fun but can be really cumbersome. There is a lot of literature on regular software development but little on real-time development. Real-time techniques are based on heuristics and personal experience. Advanced object-oriented modeling is not so applicable for such systems because of short development cycles and time-to-market constraints.

Waël Hassan (wael@acm.org) is a PhD student at the University of Ottawa. He is also a real-time software architect. He is interested in telecommunication services and real-time systems. His PhD research is on the formalization and design of a global connectivity protocol. His hobbies include traveling, photography, dragon boat racing, swimming and skating.
