Linux on Track

Linux was used in two projects as a data acquisition system running more or less autonomously in the German ICE trains. This article describes design issues and implementation as well as the problems and solutions used in those projects.
GPS monitor

The GPS device is a cute little gadget that looks like a computer mouse without buttons. It is about the same size, shape and color as a mouse and is mounted on the top of the coach to have a clear view of the sky. It is connected to the computer via a serial line which also delivers power to the device. As soon as the GPS is connected, it starts sending several types of information which can be read with a command as simple as:

cat /dev/ttyS1

as long as /dev/ttyS1 is set to the correct baud rate. By writing to the device, it can be programmed to deliver only certain types of information.

The high speed data acquisition has one minor deficiency—it only delivers a dataset once per second. As described above, the positioning information is entered into every data file one kilometer in size. Now suppose that the data acquisition process starts reading from the GPS after it has acquired the last one kilometer sample. Reading may take up to one second while the wheel turns up to 28 times per second, thereby losing about 80 meters of data.

Since losing data was not considered efficient, the gpsmonitor program was introduced, running parallel to the data acquisition process. It reads the position information at the given rate of one second and stores it in a file where the most recent information is always available for the data acquisition process.

To make sure that the data acquisition process does not read partially written data sets, in general it would be necessary to use a file locking scheme to bar the acquisition process from reading while gpsmonitor is writing its data. However, one data set is only 80 characters in length and is sent to the file in one write-operation. Checking the Linux kernel sources might show that this is not an entirely atomic operation, but experiments with a process rereading the information at the highest possible frequency have shown that the probability that a write of 80 characters would be interrupted by another process is practically zero, i.e., was not observed. Consequently, file locking was considered to be unnecessary overhead.

Device Drivers

The most interesting part of the project for the Linux hacker is certainly the device driver section. As usual, no device driver could be obtained from manufacturers of the boards to be used. It is a pity that no manufacturer of measurement hardware recognizes the potential of Linux as a measurement platform. Certainly not a real-time system, but with today's fast processors and some precautions, Linux is able to stream data to the disk at high rates and without dropping data.

Writing a device driver seemed to be a daunting task, and it proved to be exactly that, but for reasons other than the expected ones. Not being particularly familiar with the internals of Linux, it first seemed that learning the interface between kernel and driver might be a complicated problem. It proved to be almost too trivial to mention. With an early version of the kernel hacker's guide, code of other drivers all around, and the helpful Net community, communication between kernel and driver was easily established.

The bad part was the hardware, mainly due to a lack of decent documentation. German distributors were approached to almost no avail, even for analog devices. Linux was as yet unheard of, and all that could be obtained was source code for MS-DOS and a user's guide for the RTI-860 containing a full schematic diagram. For the ADCO, the situation was just about the same. Nevertheless drivers were written, and the work is almost perfect today. Only the RTI-860 driver still contains a nasty bug, probably due to a timing problem: clearing the on-board memory and enabling the trigger cannot be done in the right order. Independent of which operation is done first, some samples are sometimes dropped, presumably only if the trigger line goes active very shortly after the trigger is enabled.

Another problem is the kernel itself. This problem was observed in 1.x kernels and seems to persist in 2.0.x kernels. Because the ADCO board has only a one kilometer sample FIFO and must be emptied before it overflows, at a 50KHz sampling rate the driver has to read the data out at a rate of 50Hz. Put another way, the driver has to have a look at the board at least every 20ms. With a time slice of 10ms in a typical Linux kernel, this must happen every other jiffy. For those not familiar with kernel code, it should be noted that there is a variable called a jiffy in the kernel, which is incremented by the timer interrupt. In the Linux kernel, a jiffy is defined rather exactly to be 10ms. In particular with the POSIX scheduler available in recent kernels, this should not present a problem. In contrast to the normal Linux scheduler which constantly changes process priorities to distribute processor time in a fair way, the POSIX scheduler allows a fixed priority to be attached to a process. With the right priorities, at least one process can be guaranteed to get the processor at the next scheduling event after it makes a request. This should be at the next tick of the clock, which is at most 10ms away.

In practice it was found that sometimes no scheduling occured for 40, 50 or even 100ms, which was even more irritating as no other process was active at that time. It looked very much like the mechanism responsible for paging and/or swapping was responsible for it, but due to limited resources, the problem could not be further investigated.

As a workaround, a mechanism in the kernel was exploited which allows small pieces of code to run between two jiffies. Although no scheduling was performed for up to 100ms, the timer interrupt was not blocked and ticked along fine every 10ms. One of its tasks is to run code which is registered on a certain queue by other parts of the kernel. By registering a function which reads the ADCO's FIFO into a driver-internal buffer, the problem of missing scheduling events could be circumvented. In fact, it is not even necessary to use the POSIX scheduler.