Testing Safety-Critical Software with AdaTEST
The increased adoption of embedded Linux within the general consumer electronics market gives rise to new areas of application development for embedded Linux outside the usual realm of PDAs and mobile phones. Industries such as avionics, railway signaling, process control and medicine are all users of embedded systems. Common to them all is a need for safety-critical software. Safety-critical software is a class of systems whose failure may cause injury or death to human beings. In addition to real-time requirements, including proper control over timing and scheduling, such systems have absolute demands regarding correctness of behavior. Please refer to Kevin Dankwardt's excellent article "Real Time and Linux" for more on real-time systems.
Strict formal methods are applied in developing safety-critical software. Counted among these methods are various forms of testing. Testing is performed to eliminate possible bugs and to ensure correctness of behavior. The requirements for developing safety-critical systems are so strict that even tools used in the development process must comply with minimum requirements for formal methodology.
One such testing tool is AdaTEST, from the British company IPL. AdaTEST is, of course, a tool for testing Ada software. It has been audited and found qualified for use on projects complying with the RTCA's DO-178B, an international safety standard for the avionics industry. AdaTEST therefore can be used for developing safety-critical systems. However, a pertinent question arises: AdaTEST is designed for testing software written in Ada; with the power of C at hand, why bother with programming Ada for the Linux platform?
Ada and Linux aren't necessarily an obvious combination. As several free and/or commercial real-time Linux implementations already are available on the market, the infrastructure for developing safety-critical Linux systems is in place. Unlike general-purpose languages such as C and Java, Ada has support for hard real-time requirements built into the core language's tasking model. The task is an Ada-language construct equivalent to the operating system's thread. Because of Ada's strong typing, we can be confident that Ada programs contain few surprises, which makes the language a perfect match for developing safety-critical software. Ada has therefore become a de facto standard for industries like avionics and railway signaling.
As for embedded platforms, Ada was originally developed by the US Department of Defense for use in embedded system applications. It is therefore a perfect match for the future's embedded, safety-critical Linux solutions.
But how does Ada mix with Linux? In fact, it mixes quite well. The GNU Ada tool chain (GNAT) is an Ada front end to gcc, tying Ada closely to the operating system. Ada's standard facilities for importing C functions allow close-to-the-metal programming, including direct use of system calls if need be.
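As a minimal sketch of the import facility mentioned above, the following program binds the C library's getpid call with pragma Import. The procedure name Show_Pid is ours, and the binding assumes that pid_t is a C int, which holds on common Linux targets but should be checked for yours.

```ada
with Ada.Text_IO;
with Interfaces.C;

procedure Show_Pid is

   --  Binding to getpid(2) from the C library.
   --  Assumption: pid_t has the same representation as a C int.
   function Getpid return Interfaces.C.int;
   pragma Import (C, Getpid, "getpid");

begin
   Ada.Text_IO.Put_Line
     ("Running as process" & Interfaces.C.int'Image (Getpid));
end Show_Pid;
```

With GNAT installed, saving this as show_pid.adb and running gnatmake show_pid.adb should be enough to build it.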
Despite its commercial license, AdaTEST comes with out-of-the-box support for GNAT, which makes it interesting for developing Linux software.
AdaTEST provides facilities for dynamic testing, coverage analysis and static analysis. Dynamic testing is what most of us know by the general term "testing". Its purpose is to make sure the software does what it should. Coverage analysis produces metrics to evaluate whether the tests are sufficiently thorough. Static analysis assesses the software's complexity and use of language constructs. Although important parts of the AdaTEST suite, coverage and static analysis are largely outside the scope of this article.
AdaTEST consists of a test harness and a library. The harness provides facilities to run, verify the results of and document dynamic tests. It consists of a set of library directives that are accessed from the test script. The test script is the basis for all your testing; it is simply an Ada procedure that exercises the software being tested.
To make sure the software does what it is supposed to do, the output is verified. Verification is handled with a CHECK function. The CHECK call compares an actual output value with an expected value, and it returns a true or false response, depending on the result. AdaTEST ships with CHECK functions for all of the types defined in Ada. AdaTEST also comes with CHECKs to compare memory blocks and check for external events, as well as a set of generic CHECK functions for instantiation to verify your own types.
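To illustrate the idea of a generic check that is instantiated for your own types, here is a self-contained sketch in plain Ada. It is not AdaTEST's interface: Generic_Check, Signal_Aspect and the message layout are names and conventions we made up for the example, and only the ">>" marker for unexpected events is borrowed from the report format described in this article.

```ada
with Ada.Text_IO;

procedure Check_Sketch is

   --  Hypothetical stand-in for a generic check function.
   --  It compares an actual value with an expected one and
   --  returns True when they are equal.
   generic
      type Item is private;
      with function Image (Value : Item) return String;
   function Generic_Check
     (Label    : String;
      Actual   : Item;
      Expected : Item) return Boolean;

   function Generic_Check
     (Label    : String;
      Actual   : Item;
      Expected : Item) return Boolean
   is
      Passed : constant Boolean := Actual = Expected;
   begin
      if Passed then
         Ada.Text_IO.Put_Line ("   " & Label & ": passed");
      else
         Ada.Text_IO.Put_Line
           (">> " & Label & ": expected " & Image (Expected)
            & ", got " & Image (Actual));
      end if;
      return Passed;
   end Generic_Check;

   --  A user-defined type to verify (illustrative only).
   type Signal_Aspect is (Red, Yellow, Green);

   function Aspect_Image (Value : Signal_Aspect) return String is
   begin
      return Signal_Aspect'Image (Value);
   end Aspect_Image;

   function Check_Aspect is
     new Generic_Check (Signal_Aspect, Aspect_Image);

   Failures : Natural := 0;

begin
   if not Check_Aspect ("Aspect after reset", Red, Red) then
      Failures := Failures + 1;
   end if;
   if not Check_Aspect ("Aspect after clear", Yellow, Green) then
      Failures := Failures + 1;
   end if;
   Ada.Text_IO.Put_Line ("Failed checks:" & Natural'Image (Failures));
end Check_Sketch;
```

The first call passes and the second is deliberately made to fail, so the run shows both kinds of report line. Passing an Image function as a generic parameter keeps the check independent of how a given type is printed.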
The test harness allows you to compile the test script into an executable. Once the executable is run, a test report is written to an ASCII file. Events classed as unexpected are marked with >>, followed by an appropriate error message. A typical example of an unexpected event is a CHECK that returns false. The report ends with a test summary that prints the number of passed CHECKs, the number of failed CHECKs, the number of unexpected errors and any script errors (i.e., errors in the test script itself). At the very end of the report, an overall test result is recorded. The test script fails if one or more unexpected events have occurred.
A test case consists of two parts: the software being tested and the test script. In our current project we use CRC to ensure our files aren't corrupted. To this end, we have written the procedure Verify_File_CRC.
with CRC_Checker;
with File_IO; use type File_IO.File_Handle;
with Error_Logger;

procedure Verify_File_CRC (File_Name : in  String;
                           Success   : out Boolean) is
   Fh : File_IO.File_Handle;
begin
   Fh := File_IO.Open (File_Name);
   if Fh > 0 then
      --  A positive handle means the file was opened.
      CRC_Checker.Check_CRC (Fh, Success);
      if not Success then
         Error_Logger.Log_Error ("Error in file " & File_Name);
      end if;
      File_IO.Close (Fh);
   else
      --  Give the out parameter a defined value on this path, too.
      Success := False;
      Error_Logger.Log_Error ("Could not open file " & File_Name);
   end if;
end Verify_File_CRC;
This procedure is the software being tested. It makes use of three external packages, all of which are available on the Web.
The corresponding test script is shown in Listing 1.
For ease of demonstration, AdaTEST library directives are written in all caps. The script itself starts with the START_SCRIPT function and ends with END_SCRIPT. The name of the report file generated is passed as a parameter to START_SCRIPT. If AdaTEST is unable to write to this file, say in the event that it lacks write permissions, the report is written to stdout. Both absolute and relative paths are allowed for report files. AdaTEST appends the .atr suffix when the report is written to file.
A test script consists of one or more tests. Each test is delimited by a pair of directives, START_TEST and END_TEST. START_TEST takes the test number as a parameter. Test numbers are used by the test harness to identify tests in the report. Two identical test numbers will result in a script error.
EXECUTE is used to start the test. Variables used by the test are initialized between START_TEST and EXECUTE. EXECUTE takes three parameters. The first is simply a text string to describe what is being executed. It is an arbitrary string and is used only in the report file. The second parameter is a list of stub calls that must be exercised for the test to pass. More on stubs a bit later. The third parameter tells AdaTEST whether you expect the software being tested to raise an exception. Testing exceptions is simple when combining this parameter with the appropriate exception handlers within the test.
The COMMENT directive is well worth mentioning, too. Upon executing the directive, a string is written to the report file. We've found COMMENT to be an excellent way of adding the desired verbosity to our test reports.
Although AdaTEST enforces no coding conventions, we have found it beneficial to add the E_ and O_ prefixes to variables holding, respectively, expected output and actual output values. We have also learned the necessity of isolating the tests from each other within a script. Writing tests that depend on the correct execution of previous tests is a sure path to extra work. If you choose this path, you'll quickly find your tests falling apart when you update the test script. Our recommendation is to spend the extra effort of always reinitializing the software being tested and the variables to be used when starting a new test.
Another piece of advice is to always set the variables that hold actual output values to something completely different from what is expected. If a programming error in the software being tested leaves an output value unset, your tests will, at best, hang when stumbling over the uninitialized variable. At worst, an arbitrary value will be supplied by the operating system, providing you with the perfect red herring when trying to track down the bug.
Unit testing is aimed at verifying the correctness of one single unit, typically an Ada package. Units therefore need to be tested in isolation. AdaTEST provides an elegant way of doing this with its stubbing functionality.
A stub is a piece of code that simulates expected outputs. It is usually a package that simulates the output of external code, but it may be created for any subprogram. It may even be a subprogram body within the software being tested, as long as the subprogram is defined in its own file using Ada's separation mechanism. Stubbing is a simple way of exercising all possible execution paths through the software being tested. The tester is at all times in complete control of what the external software produces as input to the software being tested. Hence, the correctness of only the software being tested is measured. Stubs, combined with the EXECUTE directive's second parameter, the stub list, ensure that external subprograms are exercised in the right order.
Here, we use a stub to simulate the output of our File_IO package.
with ADATEST_HARNESS_COMMANDS;        use ADATEST_HARNESS_COMMANDS;
with ADATEST_HARNESS_STUB_SIMULATION; use ADATEST_HARNESS_STUB_SIMULATION;

package body File_IO is

   function Open (File_Name : in String) return File_Handle is
      Ret : File_Handle := File_Handle (0);
   begin
      START_STUB ("File_IO.Open");
      case ACTION is
         when 1      => Ret := File_Handle (1);   --  simulate a valid handle
         when 2      => Ret := File_Handle (-1);  --  simulate a failed open
         when others => ILLEGAL_ACTION;
      end case;
      END_STUB;
      return Ret;
   end Open;

   procedure Close (Fh : in File_Handle) is
   begin
      START_STUB ("File_IO.Close");
      --  The caller must never close an invalid handle.
      CHECK ("File handle is valid", Fh > 0, True);
      case ACTION is
         when others => null;
      end case;
      END_STUB;
   end Close;

end File_IO;
The stub is an ordinary Ada package that implements the same functions and procedures as the software it simulates. You will notice that, as with scripts and tests, your stub code is delimited by a pair of start and end directives--respectively START_STUB and END_STUB. The code within a stub controls the output.
Looking back at the call list parameter of the test script's EXECUTE call,
EXECUTE ("Verify_File_CRC",
         "File_IO.Open:1;" &
         "CRC_Checker.Check_CRC:1;" &
         "File_IO.Close:1",
         EXCEPTION_NOT_EXPECTED);
you'll see that the test is expected to run the stubs File_IO.Open, CRC_Checker.Check_CRC and File_IO.Close during its execution. The call list is used by AdaTEST to make sure the correct stubs have been called, in the correct order, and to confirm what action is to be taken by each stub when called.
You'll notice that every stub in the call list is appended with a colon and a number. This number is passed to the stubbed subprogram as its ACTION variable. ACTION is used in a case statement to determine the correct output, as shown in the first subprogram in the stub above.
When executing a stubbed subprogram, AdaTEST makes only a simple pattern match between an entry in the EXECUTE call's stub list and the stub name passed as a parameter by the subprogram's START_STUB call. The only requirement here is that a list entry match the stub name. You are free to choose your own naming for stubs.
Stubs called in the wrong order, or called but not specified in the call list, are reported as errors and lead to test failure.
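The call list mechanism described above can be modeled in a few lines of plain Ada. The sketch below is our own illustration of the principle, not AdaTEST code: Expect, Enter_Stub and Done are hypothetical names, the error messages are invented, and the real harness may well work differently inside. Each stub call is matched against the next entry in the list; a mismatch or a leftover entry counts as an error.

```ada
with Ada.Text_IO;
with Ada.Strings.Fixed;
with Ada.Strings.Unbounded; use Ada.Strings.Unbounded;

procedure Call_List_Sketch is

   Remaining : Unbounded_String;  --  entries not yet consumed
   Errors    : Natural := 0;

   --  Register the call list, e.g. "A:1;B:2".
   procedure Expect (Call_List : String) is
   begin
      Remaining := To_Unbounded_String (Call_List);
   end Expect;

   --  Called when a stub is entered. Returns the action number,
   --  or 0 if the call does not match the next list entry.
   function Enter_Stub (Name : String) return Natural is
      List  : constant String  := To_String (Remaining);
      Semi  : constant Natural := Ada.Strings.Fixed.Index (List, ";");
      Last  : Natural := List'Last;
      Colon : Natural;
   begin
      if Semi > 0 then
         Last := Semi - 1;
      end if;
      Colon := Ada.Strings.Fixed.Index (List (1 .. Last), ":");
      if Colon = 0 or else List (1 .. Colon - 1) /= Name then
         Ada.Text_IO.Put_Line (">> Unexpected call to " & Name);
         Errors := Errors + 1;
         return 0;
      end if;
      --  Consume the matched entry and its trailing semicolon.
      if Semi > 0 then
         Remaining := To_Unbounded_String (List (Semi + 1 .. List'Last));
      else
         Remaining := Null_Unbounded_String;
      end if;
      return Natural'Value (List (Colon + 1 .. Last));
   end Enter_Stub;

   --  Called when the test is finished: every entry must be used up.
   procedure Done is
   begin
      if Length (Remaining) > 0 then
         Ada.Text_IO.Put_Line
           (">> Stubs never called: " & To_String (Remaining));
         Errors := Errors + 1;
      end if;
   end Done;

begin
   Expect ("File_IO.Open:1;CRC_Checker.Check_CRC:1;File_IO.Close:1");
   Ada.Text_IO.Put_Line
     ("File_IO.Open -> action"
      & Natural'Image (Enter_Stub ("File_IO.Open")));
   --  Deliberately out of order: Check_CRC was expected next.
   Ada.Text_IO.Put_Line
     ("File_IO.Close -> action"
      & Natural'Image (Enter_Stub ("File_IO.Close")));
   Done;
   Ada.Text_IO.Put_Line ("Errors:" & Natural'Image (Errors));
end Call_List_Sketch;
```

In this run the first call matches and receives action 1, the second call arrives out of order, and Done finds entries that were never consumed, so two errors are counted.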
After having put AdaTEST to the, ahem, test by using it for functionality and requirements testing of safety-critical, real-time systems, we must admit that we are rather impressed. In addition to its extensive functionality, AdaTEST is practically wrinkle free. Its design, functionality and behavior are consistent and predictable.
When testing, AdaTEST has to instrument the software being tested to provide statement and decision coverage. You can, though, run the code uninstrumented when these features aren't required. All tools are implemented as command-line utilities, which simplifies automation and use. The instrumenting does produce a slight performance hit, especially when running recursive and/or many small subprogram calls. Normally this isn't a problem, but when testing both statement coverage and performance you'll need one test for each.
We haven't come across any insurmountable problems while working with AdaTEST. The only thing we've really been missing is the possibility for fixtures that other testing tools, such as JUnit, provide. A fixture is a set of variables shared by all tests in a test script. It is set up before each test is run and torn down afterward. By reinitializing a fixture in the setup stage and freeing up resources in the tear-down, tests always start with a clean slate. We could solve this by restructuring our test scripts, but we've yet to find a technique that completely satisfies us.
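One way to approximate a fixture in plain Ada is a generic procedure that wraps each test body in set-up and tear-down calls. The sketch below is only one possible restructuring, and every name in it (Run_With_Fixture, Set_Up, Tear_Down, Buffer_Count) is hypothetical. It contains no AdaTEST directives; in a real script we assume the START_TEST to END_TEST section would sit inside the test body procedures.

```ada
with Ada.Text_IO;

procedure Fixture_Sketch is

   --  State shared by all tests; reset before each one.
   Buffer_Count : Natural := 0;

   procedure Set_Up is
   begin
      Buffer_Count := 0;
   end Set_Up;

   procedure Tear_Down is
   begin
      Ada.Text_IO.Put_Line ("   fixture torn down");
   end Tear_Down;

   generic
      with procedure Test_Body;
   procedure Run_With_Fixture;

   procedure Run_With_Fixture is
   begin
      Set_Up;
      begin
         Test_Body;
      exception
         when others =>
            --  Release resources even if the test body fails.
            Tear_Down;
            raise;
      end;
      Tear_Down;
   end Run_With_Fixture;

   procedure First_Test is
   begin
      Buffer_Count := Buffer_Count + 1;
      Ada.Text_IO.Put_Line
        ("First test sees" & Natural'Image (Buffer_Count));
   end First_Test;

   procedure Second_Test is
   begin
      Buffer_Count := Buffer_Count + 1;
      Ada.Text_IO.Put_Line
        ("Second test sees" & Natural'Image (Buffer_Count));
   end Second_Test;

   procedure Run_First  is new Run_With_Fixture (First_Test);
   procedure Run_Second is new Run_With_Fixture (Second_Test);

begin
   Run_First;
   Run_Second;
end Fixture_Sketch;
```

Both tests see a count of 1, because the shared state is reset before each body runs. The price is one instantiation per test, which adds some boilerplate to the script.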
AdaTEST is implemented as a state machine. The state changes when START_TEST/END_TEST and EXECUTE/DONE procedures are called. The most common error is forgetting an end directive, which leaves the state machine in an invalid state. Luckily, the test harness does not loop indefinitely when this occurs. Instead, it times out, logs the error and continues executing the test. When the tests are done, such errors are easily spotted in the test report. Likewise, if instrumented code is executed outside START_TEST/END_TEST blocks, the state machine enters an invalid state.
Testing of tasks is another issue that requires careful consideration. The AdaTEST harness runs in two distinct modes: concurrent and sequential. Concurrent mode is the default. In this mode the test harness runs in a separate task that is parallel to the test script and the software being tested. In sequential mode, on the other hand, the test harness runs within the same task. Running several instrumented tasks at the same time may lead to problems in sequential mode, as the test harness does not protect itself against re-entrance. Concurrent mode ensures that only one directive is executed at a time. It is the preferred mode for testing tasks, but with separate tasks the possibility for race conditions and deadlocks increases. Generally, we have not experienced any unsolvable problems in testing tasks, but it has caused some head-scratching on occasions.
Instrumenting code produces two files: an instrumented code file with the .ati suffix and a code listing file with the .lst suffix. The code listing prints each line of code with its line number and, optionally, the metrics at the end. The metrics are fairly comprehensive. Various incarnations of McCabe and Halstead are produced, in addition to some basic code metrics, such as number of code lines, number of declarations and comment count, to name a few. The metrics in the listing are clearly labeled and easy to extract for reporting purposes.
Because AdaTEST scripts are compiled as ordinary executables and the test reports are plain text, we found it simple to set up a system for regression testing. We are using GLIDE, the Ada IDE that ships with GNAT. GLIDE makes use of a project file to save building parameters. Using Perl, we simply extracted these parameters to compile our tests, run them, extract their results and write a summary to file. The test reports are another of AdaTEST's strong points--the output and results of the checks are easily parsed and extracted. The entire regression test system was constructed in a matter of hours.
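Our own scripts were written in Perl, but the idea of scanning a report is simple enough to sketch in Ada as well. The program below counts the lines marked as unexpected events in a report file named on the command line. It assumes that the >> marker appears at the very start of a line, which is our simplification and should be checked against a real report; the program name Count_Unexpected is likewise made up for this example.

```ada
with Ada.Command_Line;
with Ada.Text_IO; use Ada.Text_IO;

procedure Count_Unexpected is
   Report     : File_Type;
   Unexpected : Natural := 0;
begin
   if Ada.Command_Line.Argument_Count /= 1 then
      Put_Line ("usage: count_unexpected <report file>");
      Ada.Command_Line.Set_Exit_Status (Ada.Command_Line.Failure);
      return;
   end if;

   Open (Report, In_File, Ada.Command_Line.Argument (1));
   while not End_Of_File (Report) loop
      declare
         Line : constant String := Get_Line (Report);
      begin
         --  Assumption: unexpected events start the line with ">>".
         if Line'Length >= 2
           and then Line (Line'First .. Line'First + 1) = ">>"
         then
            Unexpected := Unexpected + 1;
         end if;
      end;
   end loop;
   Close (Report);

   Put_Line ("Unexpected events:" & Natural'Image (Unexpected));

   --  A non-zero exit status lets a calling script detect failure.
   if Unexpected > 0 then
      Ada.Command_Line.Set_Exit_Status (Ada.Command_Line.Failure);
   end if;
end Count_Unexpected;
```

Run against each report after a build, a tool like this gives a regression driver a simple pass or fail signal through its exit status.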
One could argue that good open-source alternatives for testing already exist (like AUnit for Ada), so why buy something when you can get roughly the same functionality for free? Well, sometimes you can't. If you are required to document full statement coverage and/or full decision coverage (as in DO-178B), you may be out of luck with other programs. In addition, open-source projects have a tendency to fall a bit short of the formal methodology requirements laid out by safety-critical standards. The kind of formalism required sort of defeats the whole purpose of the typical open-source development project.
Our experience with AdaTEST has been nothing but positive. As mentioned earlier, we're greatly impressed by the quality of the software. We've found IPL's on-line support to be top notch, both swift and professional. The few times we've had a need for support, they have replied within hours of our submitting the problem. All issues have been closed the same day. IPL provides on-site teaching in addition to their on-line support, and free evaluation licenses for AdaTEST are available on request.
However, all of this comes at a price. AdaTEST is, like all professional testing tools, expensive. IPL sells licenses in units of floating users for host-native use and platform-specific licenses for target use. They consider one floating user adequate for three people working roughly in parallel. The first floating user license is priced at 9000 EUR. Additional floating user licenses cost 6300 EUR. Targets are priced the same as the first floating user. All prices include 12 months initial support. It's not cheap, but we've found it to be worth every Euro.
Ståle Dahl is a consultant with the Norwegian company ConsultIT A/S. He has been using Linux since 1994. Ståle can be reached at firstname.lastname@example.org.
Thomas Østerlie is also a consultant with ConsultIT A/S. He works mainly with server-side systems development for UNIX platforms and with computer security. Thomas can be reached at email@example.com.