IBM InfoSphere Streams and the Uppsala University Space Weather Project

Using IBM InfoSphere Streams, Uppsala University can analyze massive amounts of data to help it model and predict the behavior of the uppermost part of our atmosphere and its reaction to events in surrounding space and on the Sun.

In 2006, the International Astronomical Union (IAU) decided Pluto was no longer a planet, but rather that it was a dwarf planet. Then in 2008, the same group decided Pluto was a plutoid instead of a dwarf planet. This year, the IAU met August 3–14 in Brazil, and while at the time of this writing, that hasn't happened yet, Pluto's official title is expected to come up once again.

One of the big problems with Pluto is that we just don't have enough information about it. Apart from some very distant images and behavioral observations, much of our Plutonian information is mathematical guesswork. If we turn our focus to the opposite side of the solar system, however, the dilemma reverses. The amount of information we can gather about the Sun is so great, it's difficult to capture it, much less do anything useful with the data.

On January 8, 2008, Solar Cycle 24 started. Although that might seem insignificant to most people, in about three years, it will be reaching its peak (Figure 1). Solar storms, or space weather, can have a very significant effect on modern society. These invisible outbursts can take out satellites, disrupt electrical grids and shut down radio communications. There is nothing we can do to avoid solar storms; however, early detection would make it possible to minimize the effects. And, that's what researchers at Uppsala University in Sweden are trying to do.

Figure 1. We are just beginning this solar cycle, which makes early detection particularly important. (Graphic Credit: National Oceanic and Atmospheric Administration, www.noaa.org)

The problem is the amount of data being collected by the digital radio receivers—to be precise, about 6GB of raw data per second. There is no way to store all the data to analyze later, so Uppsala teamed up with IBM and its InfoSphere Streams software to analyze the data in real time.

LJ Associate Editor Mitch Frazier and I had an opportunity to speak with both IBM and Uppsala, and we asked them for more information on how such a feat is accomplished. We weren't surprised to hear, “using Linux”. Here's our Q&A session, with some of my commentary sprinkled in.

Shawn & Mitch: What hardware does it run on?

IBM & Uppsala: InfoSphere Streams is designed to work on a variety of platforms, including IBM hardware. It runs clusters of up to 125 multicore x86 servers with Red Hat Enterprise Linux (RHEL). The ongoing IBM research project, called System S, is the basis for InfoSphere Streams and has run on many platforms, including Blue Gene supercomputers and System P.

S&M: Will it run on commodity hardware?

I&U: Yes, x86 blades.

S&M: What operating system(s) does it run on?

I&U: InfoSphere Streams runs on RHEL 4.4 for 32-bit x86 hardware and RHEL 5.2 for 64-bit x86 hardware.

S&M: Are these operating systems standard versions or custom?

I&U: They are standard operating systems.

S&M: What language(s) is it written in?

I&U: InfoSphere Streams is written in C and C++.

S&M: How does a programmer interact with it? Via a normal programming language or some custom language?

I&U: Applications for InfoSphere Streams are written in a language called SPADE (Stream Processing Application Declarative Engine). Developed by IBM Research, SPADE is a programming language and a compilation infrastructure, specifically built for streaming systems. It is designed to facilitate the programming of large streaming applications, as well as their efficient and effective mapping to a wide variety of target architectures, including clusters, multicore architectures and special processors, such as the Cell processor. The SPADE programming language allows stream processing applications to be written with the finest granularity of operators that is meaningful to the application, and the SPADE compiler appropriately fuses operators and generates a stream processing graph to be run on the Streams Runtime.

[See Listing 1 for a sample of SPADE. Listing 1 is an excerpt from the “IBM Research Report—SPADE Language Specification” by Martin Hirzel, Henrique Andrade, Bugra Gedik, Vibhore Kumar, Giuliano Losa, Robert Soulé and Kun-Lung Wu, at the IBM Research Division, Thomas J. Watson Research Center.]

______________________

Shawn Powers is an Associate Editor for Linux Journal. You might find him chatting on the IRC channel, or Twitter

Webinar
One Click, Universal Protection: Implementing Centralized Security Policies on Linux Systems

As Linux continues to play an ever increasing role in corporate data centers and institutions, ensuring the integrity and protection of these systems must be a priority. With 60% of the world's websites and an increasing share of organization's mission-critical workloads running on Linux, failing to stop malware and other advanced threats on Linux can increasingly impact an organization's reputation and bottom line.

Learn More

Sponsored by Bit9

Webinar
Linux Backup and Recovery Webinar

Most companies incorporate backup procedures for critical data, which can be restored quickly if a loss occurs. However, fewer companies are prepared for catastrophic system failures, in which they lose all data, the entire operating system, applications, settings, patches and more, reducing their system(s) to “bare metal.” After all, before data can be restored to a system, there must be a system to restore it to.

In this one hour webinar, learn how to enhance your existing backup strategies for better disaster recovery preparedness using Storix System Backup Administrator (SBAdmin), a highly flexible bare-metal recovery solution for UNIX and Linux systems.

Learn More

Sponsored by Storix