Open Source for the Space Age
NASA has started a rather ambitious project: to provide open-source everything. The main site is located at http://open.nasa.gov. From here, there is access to data, code and applications, among other things. This is a great launching point for anyone interested in space science and NASA work. In this article, I look at what kind of code is being made available that you might want to explore.
The available software covers several genres. Some are low-level, systems-layer software. You can go ahead and do some really long-distance transfers with the Interplanetary Overlay Network (ION). This is an implementation of the Delay-Tolerant Networking architecture (DTN) as described in RFC 4838. This software is physically hosted at SourceForge, and you can use this code to communicate with your next interplanetary probe.
A bit more down to earth is a middleware package hosted by the Apache Foundation. You can download and use the Object-Oriented Data Technology (OODT) middleware. OODT is component-based, so you can pick and choose which parts you want to use. There are components to handle transparent access to distributed resources, data discovery and query optimization, and distributed processing. There are also components to handle workflow and resource management. Groups that are using it include the Children's Hospital of Los Angeles and NASA's Planetary Data System. If you're managing data systems, this might be worth taking a look at.
Getting back to actual science processing, you might want to download the Data Productivity Toolkit (DPT). This package is a collection of command-line tools, written in Python, that lets you work on text data files. These utilities follow the UNIX design method of having small utilities that do one task well, and then chaining them together to do more complicated processing. There are tools for massaging and manipulating your data, tools for doing statistics on that data and even tools for visualizing the data and the results. Many of the tools even provide an API to basic Python and numpy/scipy/matplotlib routines.
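To make the "small tools chained together" idea concrete, here is a minimal sketch of what one such text-data utility could look like. It is not taken from DPT; the function names (read_column, summarize, main) and the input layout (whitespace-separated columns with '#' comment lines) are assumptions made for illustration.

```python
import statistics
import sys


def read_column(lines, column=0):
    """Yield floats from one whitespace-separated column.

    Blank lines and lines starting with '#' are skipped.
    """
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        yield float(line.split()[column])


def summarize(values):
    """Return the count, mean and sample standard deviation of the values."""
    values = list(values)
    if not values:
        return {"n": 0, "mean": float("nan"), "stdev": float("nan")}
    stdev = statistics.stdev(values) if len(values) > 1 else 0.0
    return {"n": len(values), "mean": statistics.mean(values), "stdev": stdev}


def main(stream, column=0):
    """Read text data from a stream and print one summary line."""
    result = summarize(read_column(stream, column))
    print("n={n} mean={mean:.6g} stdev={stdev:.6g}".format(**result))
```

Calling main(sys.stdin, 1) at the bottom of a script would turn this into a pipeline stage that summarizes the second column of whatever is piped into it, so it can sit after other small tools that select or clean the data.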
And, while I'm talking about Python and science, you also can look at SunPy. SunPy aims to provide a library of routines that are useful in studying solar physics. With it, you can query the Virtual Solar Observatory (VSO) and grab data that you can process. Many routines are available that allow you to plot this data using various color maps and processing filters. There is a Sun object that contains physical constants useful in solar physics, along with the sun's position and numerous other solar attributes.
A lot of the computational work done at NASA involves clusters of machines and massively parallel code. This means the NASA folks have needed to put together lots of tools to manage these machines. They also have been nice enough to release a lot of this code for public consumption. The first of these is mutil (Multi-Threaded Multi-Node Utilities). In the standard GNU file tools, cp and md5sum each run as a single-threaded process on a single machine. The mutil tools provide drop-in replacements called mcp and msum. These utilities use multithreading to make sure each node is kept as busy as possible. Read and write parallelism allows the individual operations of a single copy to be interleaved through asynchronous I/O. Split file processing allows different threads to operate on different portions of a file in parallel.
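The split-file idea can be sketched in a few lines of Python. This is only an illustration of the general technique, not how msum is implemented: the function names, the chunk size and the way the per-chunk digests are combined (an MD5 over the ordered chunk digests) are all assumptions. Because of that combining step, the result is not the same value that plain md5sum would print for the file.

```python
import hashlib
import os
from concurrent.futures import ThreadPoolExecutor


def hash_chunk(path, offset, length):
    """Return the MD5 digest of up to `length` bytes of `path` starting at `offset`."""
    with open(path, "rb") as f:
        f.seek(offset)
        return hashlib.md5(f.read(length)).digest()


def split_checksum(path, chunk_size=1 << 20, workers=4):
    """Hash fixed-size chunks of a file in parallel threads.

    The ordered chunk digests are concatenated and hashed once more
    to give a single checksum for the whole file.
    """
    size = os.path.getsize(path)
    offsets = range(0, size, chunk_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in the order of the offsets,
        # regardless of which thread finishes first.
        digests = list(pool.map(lambda off: hash_chunk(path, off, chunk_size), offsets))
    return hashlib.md5(b"".join(digests)).hexdigest()
```

Each thread opens the file separately and seeks to its own offset, so no file position is shared between threads. The checksum depends on the chunk size but not on the number of workers.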
NASA also provides a utility to give SSH access to your cluster. There is a middleware utility called mesh (Middleware Using Existing SSH Hosts) that provides single sign-on capability. Mesh sits on top of SSH, and instead of using the local authorized_keys file, it loads a file from a dedicated server at runtime. Mesh also has its own shell (called mash) that restricts what applications are available to the user. Using this system, you can dynamically add and remove the SSH hosts that are available for use. Also, because the authentication is handled by a library that is preloaded when SSH first starts up, the restrictions are sure to be enforced on the user.
Now that you have a connection mechanism, you may need to handle load balancing across all of these machines. Again, NASA comes to your aid. It has a software package called ballast (Balancing Load Across Systems) that might help. This package handles load balancing for SSH connections specifically. Each available host runs a ballast client, and there are one or more ballast servers. The servers maintain system load information gathered from the clients and use it to make decisions about where to send SSH connection requests. Because all of this is handled over SSH, the policy deciding which host to connect to also can take into account the user name. This way, you can have policies that are specific to each user and tune them individually, rather than trying to find a common policy that everyone is forced to use.
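The decision the servers make can be pictured with a small sketch. This is not ballast's code or its policy language; the pick_host function, the shape of the loads dictionary and the idea of expressing a per-user policy as a set of allowed hosts are all assumptions made for illustration.

```python
def pick_host(loads, user, policies=None):
    """Return the least-loaded host that the user's policy allows.

    loads:    dict mapping host name -> most recently reported load
    policies: dict mapping user name -> set of hosts that user may be sent to
              (users without an entry may be sent to any host)
    """
    policies = policies or {}
    allowed = policies.get(user)
    candidates = {
        host: load
        for host, load in loads.items()
        if allowed is None or host in allowed
    }
    if not candidates:
        raise LookupError("no eligible host for user %r" % user)
    # Break ties by host name so the choice is deterministic.
    return min(candidates, key=lambda host: (candidates[host], host))
```

With loads of {"node1": 2.5, "node2": 0.3, "node3": 0.9}, a user with no policy would be sent to node2, while a user restricted to node1 and node3 would be sent to node3.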
Going back to doing science, another important task is visualization, and NASA has released several tools to help. The first one I look at here is World Wind. This is an Earth visualization system. You can use it to get a 3-D look at Earth and to see data projected onto the globe. It is a Java application, so it works on any desktop that has a Java virtual machine, as well as in most browsers. It is a full development kit, and it has several example applications that you can use as jumping-off points for your own code.
Taking visualization further from the surface of the Earth, there is ViSBARD (Visual System for Browsing, Analysis and Retrieval of Data). This application allows you to pull data from multiple satellites and display it concurrently. It also allows for 3-D viewing of all of this data. Much of it is vector field information, which is very difficult to analyze in 2-D plots, hence the need for this kind of tool. The latest version also allows you to visualize MHD (Magneto-Hydro-Dynamic) models. This way, you can compare results from model calculations to actual satellite measurements.
More extensive image processing can be done with the Vision Workbench. This is an application and a full library of imaging and computer vision algorithms. It isn't meant to be a complete, cutting-edge library though. Rather, it provides solid implementations of standard algorithms you can use as starting points in developing your own algorithms.
When you're ready to go and launch your own satellite, you can download and use the Core Flight Executive (cFE). This software is used as the basis for flight data systems and instrumentation. It is written in C and based on OSAL (Operating System Abstraction Layer). It has an executive, along with time and event services. You can track your satellite with the ODTBX (Orbit Determination Toolbox). The ODTBX package handles orbit determination analysis and early mission analysis. It's available as both MATLAB code and Java.
The last piece of code I cover here is S4PM (Simple, Scalable, Script-based Science Processor for Measurements). This is used at the Goddard Earth Sciences Data and Information Services Center to do data processing. It is built out of a processing engine, a toolkit and a graphical monitor. S4PM allows a single person to manage hundreds of jobs simultaneously. It also is designed to make setting up new processing strings relatively easy.
The open-source project at NASA doesn't cover only code. NASA has been releasing data as well. There is data from the Kepler Project, which is looking for exoplanets. You can download solar data from the Solar Dynamics Observatory. You can work on climate data by checking out information from the Tropical Rainfall Measuring Mission, and other measurements of Earth are available too. You can look up tons of data from the various moon missions, from Apollo on up. There also is data from the various planetary missions.
I've touched on only a few of the items NASA has provided for the public. Hopefully, you have seen enough to go and check out the rest in more detail. There is a lot of science that regular citizens can do, and NASA is doing its part to try to put the tools into your hands.