Could OSS have saved the '97 Mars Pathfinder?

by Kaiwan N Billimoria

I recently read some interesting e-mails discussing the 1997 Mars Pathfinder mission. Yes, it's an old topic now, but the e-mails are still well worth reading.

The first message, written by Mike Jones, describes the so-called software glitch that occurred on the Mars Pathfinder vehicle while it was on the surface of Mars and how the problem was solved. An authoritative follow-up message was penned by Glenn E Reeves, the software team leader from JPL for the Mars Pathfinder mission.

To briefly summarize, a software priority inversion problem caused repeated resets of the spacecraft's software. JPL's engineers finally tracked down and solved the problem with support from the software provider, Wind River. Yes, it ran VxWorks.
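If you haven't met priority inversion before, a toy tick-by-tick simulation may help. The sketch below is hypothetical. The three tasks L, M and H, along with their priorities and timings, are invented for illustration and do not model the actual Pathfinder tasks. L (low priority) holds a shared lock. H (high priority) then blocks waiting for that lock. M (medium priority, no lock) preempts L, so H ends up waiting on M as well. With priority inheritance, L temporarily runs at H's priority until it releases the lock.

```c
#include <stdbool.h>

/* Hypothetical toy model: names, priorities and timings are invented
 * for illustration; this is not a model of the Pathfinder software. */
typedef struct {
    const char *name;
    int prio;        /* base priority: higher number = more urgent */
    int arrival;     /* tick at which the task becomes ready */
    int remaining;   /* ticks of CPU work left */
    bool uses_lock;  /* whole job runs inside the shared critical section */
    int finish;      /* tick at which the task completed, -1 if not yet */
} task_t;

#define NTASKS 3

/* Runs a fixed-priority, one-tick-at-a-time scheduler and returns the
 * tick at which the high-priority task "H" finishes. */
int simulate(bool inherit)
{
    task_t t[NTASKS] = {
        { "L", 1, 0, 3, true,  -1 },
        { "H", 3, 1, 1, true,  -1 },
        { "M", 2, 2, 4, false, -1 },
    };
    int holder = -1;   /* index of the task holding the lock, -1 if free */
    int done = 0;

    for (int now = 0; done < NTASKS; now++) {
        /* Highest priority among tasks currently blocked on the lock. */
        int boost = 0;
        for (int i = 0; i < NTASKS; i++)
            if (holder >= 0 && i != holder && t[i].uses_lock &&
                t[i].arrival <= now && t[i].remaining > 0 && t[i].prio > boost)
                boost = t[i].prio;

        /* Pick the runnable task with the highest effective priority. */
        int run = -1, best = -1;
        for (int i = 0; i < NTASKS; i++) {
            if (t[i].arrival > now || t[i].remaining == 0)
                continue;                       /* not ready, or finished */
            if (t[i].uses_lock && holder >= 0 && holder != i)
                continue;                       /* blocked on the lock */
            int eff = t[i].prio;
            if (inherit && i == holder && boost > eff)
                eff = boost;                    /* priority inheritance */
            if (eff > best) { best = eff; run = i; }
        }
        if (run < 0)
            continue;                           /* idle tick */

        if (t[run].uses_lock)
            holder = run;                       /* acquire (or keep) the lock */
        if (--t[run].remaining == 0) {
            t[run].finish = now + 1;
            done++;
            if (holder == run)
                holder = -1;                    /* release on completion */
        }
    }
    return t[1].finish;
}
```

With these invented numbers, `simulate(false)` returns 8 and `simulate(true)` returns 4. Without inheritance, H's wait includes all of M's unrelated work. With inheritance, the wait is bounded by the remainder of L's critical section.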

At first glance, this e-mail exchange is merely interesting, and I think every hardware and software engineer or tinkerer should read it. On deeper reflection, however, I was struck by something more. Although I assume it was not their intention, the authors quite clearly demonstrate how open-source software (OSS) and the OSS development model would have helped this project enormously. OSS would have helped not only in finding the bug but also, in all probability, in preventing it in the first place. The extracts from these e-mails and my comments below will make more sense once you've read the original postings. In his well-thought-out reply, Glenn makes the following statements:

1. In the section entitled "HOW WAS THE PROBLEM CORRECTED":

Once we understood the problem the fix appeared obvious: change the creation flags for the semaphore so as to enable the priority inheritance. The Wind River folks, for many of their services, supply global configuration variables for parameters such as the "options" parameter for the semMCreate used by the select service (although this is not documented and those who do not have vxWorks source code or have not studied the source code might be unaware of this feature).

2. In the section entitled "ANALYSIS AND LESSONS":

Did we (the JPL team) make an error in assuming how the select/pipe mechanism would work? Yes, probably. But there was no conscious decision to not have the priority inversion enabled. We just missed it. There are several other places in the flight software where similar protection is required for critical data structures, and the semaphores do have priority inversion protection. A good lesson when you fly COTS stuff--make sure you know how it works....
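The fix Glenn describes in the first extract, enabling priority inheritance when the lock is created, looks like this on Linux and other POSIX systems. There it is a mutex attribute rather than a semaphore flag. In VxWorks, the corresponding option is `SEM_INVERSION_SAFE` for `semMCreate`. The following is a minimal sketch that assumes a system supporting the POSIX priority-inheritance option, as glibc on Linux does. The helper name `init_pi_mutex` is hypothetical.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* expose the POSIX mutex-protocol API */
#endif
#include <pthread.h>

/* Hypothetical helper: initialise *m as a priority-inheritance mutex.
 * Returns 0 on success or a pthreads error number on failure. */
int init_pi_mutex(pthread_mutex_t *m)
{
    pthread_mutexattr_t attr;
    int err = pthread_mutexattr_init(&attr);
    if (err != 0)
        return err;

    /* The key step: while a low-priority thread holds this mutex, it is
     * raised to the priority of the highest-priority thread waiting on it. */
    err = pthread_mutexattr_setprotocol(&attr, PTHREAD_PRIO_INHERIT);
    if (err == 0)
        err = pthread_mutex_init(m, &attr);

    pthread_mutexattr_destroy(&attr);   /* attr is no longer needed */
    return err;
}
```

Because the source is open, anyone can check exactly what this protocol does in their C library and kernel. That is the kind of knowledge Glenn notes was available only to those with VxWorks source.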

Both statements clearly show how using OSS (such as Linux, of course) could have provided the very documentation and source code that most projects surely miss out on, unless you're JPL or the government and can swing getting the source. In my opinion, OSS could, and very likely would, reduce the project's risk while improving its chances of success.

Kaiwan N Billimoria works in corporate training at Designer Graphix in Bangalore, India.
