Tracing Real-Time Application Interface Using the Linux Trace Toolkit

As embedded and real-time Linux explode, development tools are necessary to track behavior--here's one.
Tracing RTAI

Having built the RTAI-modified Linux kernel and RTAI subsystem and having rebooted the system, it is now time to load the required modules. The first module to be loaded is the RTAI tracing framework. If all has been compiled adequately, it should be found under [rtai-base]/modules/rtai_trace. Simply use insmod to load this module. This command should print out this message: RTAI-TRACE: Initialization complete.

The next module to be loaded is the trace driver. If the make modules and make modules_install are issued during the kernel build, then you should only need to type insmod tracer. This will print out the following: Tracer: Initialization complete.

The system is now ready to be traced. However, if tracing is activated now, there will be no RTAI-related information collected, as the RTAI subsystem has not been loaded. In order to load the RTAI subsystem and related modules, move into the RTAI directory and use the ldmod script. This will load the RTAI core module, the FIFO communication capability and the RTAI scheduler. Thereafter, if an RTAI task is loaded and the system is traced, we will be able to trace all the RTAI events that occur.

As with the ``vanilla'' Linux kernel, tracing is started and controlled by the trace dæmon. Typically, when tracing an RTAI task, the steps to follow are:

  1. Load the the trace facility.

  2. Load the trace driver.

  3. Load the RTAI subsystem.

  4. Start the trace dæmon with an adequate time duration for the real-time task to be observed.

  5. Load the real-time task.

Once tracing is over, it is recommended that the real-time task be unloaded as it will remain a priority to any normal Linux process, including the shell. Also, if you plan to visualize the trace right after having accumulated it, which is often the case, then leaving the real-time task active will slow down the visualization tool. This is not desired as the traces tend to be large and the processing required to deal with them can be substantial.

A Couple of Examples

Having seen the necessary background of RTAI tracing, it is now time to proceed with a couple of examples in order to illustrate the full potential of tracing RTAI with LTT. In order to ease the demonstration, the examples distributed with the RTAI source will be used to present RTAI tracing as they are readily available. These examples can be found in [rtai-base]/examples. The examples used are: frank, fp and msg_clock. The purpose of each example is presented, and a set of collected traces regarding the example is discussed.

The frank example consists of a hard real-time module that embodies two real-time tasks and a regular non-real-time application. The real-time tasks communicate with the non-real-time application using FIFOs. These are accessible in normal user space through the /dev/rtfX files, where X is the FIFO's ID. For the real-time tasks, the FIFOs are accessible through the FIFO module's API. By using FIFOs, the real-time tasks can collect data in real time. Meanwhile, the application can read the data through the FIFOs when the normal Linux scheduler gives it the chance to do so. There are many situations where the FIFO mechanism can be useful because of this capability.

Figure 1 illustrates a first example of real-time behavior under Linux using the frank example. As with normal Linux traces displayed by LTT, horizontal lines represent time spent in a certain state while vertical lines signify state transitions. Time is the X axis, while the different tasks and applications (the ``states'') are on the Y axis.

Figure 1. Frank Example Showing Real-Time Task Interactions with Linux Tasks

The portion of trace illustrated in Figure 1 shows the interaction between the two real-time tasks and the non-real-time application. On the left, we can see that an IRQ 0 occurred to signal a timer-expiration. This triggered a transition into the RTAI subsystem (remember that RTAI intercepts all interrupts).

Having received the interrupt, RTAI decides to schedule real-time task one. The task writes something to the FIFO and gives up the CPU. RTAI then determines that real-time task two has to run and schedules it. Real-time task two runs, writes something to the FIFO and gives up the CPU. Having no more real-time tasks to schedule, RTAI schedules the Linux kernel. The latter determines that process 651 has to run. Process 651 is actually the non-real-time process that interacts with real-time tasks one and two, and it was awakened because the files it was waiting on, /dev/rtfX, now contain data. Figure 2 illustrates this interaction using the ``Raw Trace'' thumbnail that lists all the events that occurred in full detail. Correlating this raw list of events with the original source code of the frank example is fairly straightforward.

Figure 2. Same Iteraction in Raw Form

The fp (floating point) example's main purpose is to demonstrate that the real-time layer manages the floating-point hardware adequately. That said, it is also very useful in showing how LTT can be used for profiling real-time tasks.

As with the frank example, the fp example is split in half with a hard-real-time part and a non-real-time part. Communication between both halves is conducted using the FIFO's facility as described above. Having traced the execution of the fp example for a certain amount of time, the LTT visualization was able to provide a certain number of details about the example's execution, as seen in Figure 3. In the ``Process analysis'' thumbnail we are able to find that the fp real-time task has run for 9% of the duration of the trace; its average time running on each run is 10.5 microseconds; it typically sleeps for 90 microseconds between runs, and it was scheduled 86,201 times. Moreover, we have information regarding the quantity of data it wrote and read from FIFOs. As the fp example is made to compute a sine, which is then displayed by the non-real-time application, there is much more information written into FIFOs than there is read, as the numbers show.

Figure 3. Execution Statistics Regarding the Real-Time Floating-Point Example

The msg_clock example shows an extensive use of communication between different real-time tasks. A portion of these interactions can be seen in Figure 4. Contrary to the previous two examples, the complexity of the interaction would probably require an extensive degree of analysis in order to fully understand the system's evolution. Nonetheless, it is remarkable to have such capability, as it would be impossible to retrieve comparable information with a normal debugger. Furthermore, contrary to a normal debugger, LTT does not modify the normal behavior of the observed process of a task. Hence, complex synchronization problems can be isolated and fixed. Had there been a synchronization problem in the interactions displayed in Figure 4, LTT would have been very helpful in finding it. Note that a tool similar to LTT was used by the team that developed the Mars Pathfinder at the JPL in order to understand why the system kept resetting and, thereafter, apply a fix.

Figure 4. Complex Real-Time Task Interaction as Can Be Seen in the msg_clock Example

______________________

Webinar
One Click, Universal Protection: Implementing Centralized Security Policies on Linux Systems

As Linux continues to play an ever increasing role in corporate data centers and institutions, ensuring the integrity and protection of these systems must be a priority. With 60% of the world's websites and an increasing share of organization's mission-critical workloads running on Linux, failing to stop malware and other advanced threats on Linux can increasingly impact an organization's reputation and bottom line.

Learn More

Sponsored by Bit9

Webinar
Linux Backup and Recovery Webinar

Most companies incorporate backup procedures for critical data, which can be restored quickly if a loss occurs. However, fewer companies are prepared for catastrophic system failures, in which they lose all data, the entire operating system, applications, settings, patches and more, reducing their system(s) to “bare metal.” After all, before data can be restored to a system, there must be a system to restore it to.

In this one hour webinar, learn how to enhance your existing backup strategies for better disaster recovery preparedness using Storix System Backup Administrator (SBAdmin), a highly flexible bare-metal recovery solution for UNIX and Linux systems.

Learn More

Sponsored by Storix