IBM InfoSphere Streams and the Uppsala University Space Weather Project

Using IBM InfoSphere Streams, Uppsala University can analyze massive amounts of data to help it model and predict the behavior of the uppermost part of our atmosphere and its reaction to events in surrounding space and on the Sun.

In 2006, the International Astronomical Union (IAU) decided Pluto was no longer a planet, but rather a dwarf planet. Then in 2008, the same group decided Pluto was a plutoid instead of a dwarf planet. This year, the IAU meets August 3–14 in Brazil, and although that hasn't happened yet at the time of this writing, Pluto's official title is expected to come up once again.

One of the big problems with Pluto is that we just don't have enough information about it. Apart from some very distant images and behavioral observations, much of our Plutonian information is mathematical guesswork. If we turn our focus to the opposite side of the solar system, however, the dilemma reverses. The amount of information we can gather about the Sun is so great, it's difficult to capture it, much less do anything useful with the data.

On January 8, 2008, Solar Cycle 24 started. Although that might seem insignificant to most people, the cycle will reach its peak in about three years (Figure 1). Solar storms, or space weather, can have a very significant effect on modern society. These invisible outbursts can take out satellites, disrupt electrical grids and shut down radio communications. There is nothing we can do to avoid solar storms; however, early detection would make it possible to minimize the effects. And that's what researchers at Uppsala University in Sweden are trying to do.

Figure 1. We are just beginning this solar cycle, which makes early detection particularly important. (Graphic Credit: National Oceanic and Atmospheric Administration, www.noaa.org)

The problem is the amount of data being collected by the digital radio receivers—to be precise, about 6GB of raw data per second. There is no way to store all the data to analyze later, so Uppsala teamed up with IBM and its InfoSphere Streams software to analyze the data in real time.
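To put that figure in perspective, here is a quick back-of-the-envelope calculation in Python. The 6GB-per-second rate comes from the text above; the decimal units (1TB = 1,000GB) and the function name are assumptions made purely for illustration.

```python
# Rough storage estimate for a continuous raw data stream.
# The 6 GB/s rate is from the article; decimal units (1 TB = 1000 GB)
# are an assumption for this sketch.
RATE_GB_PER_S = 6
SECONDS_PER_HOUR = 60 * 60
SECONDS_PER_DAY = 24 * SECONDS_PER_HOUR


def storage_needed_tb(seconds, rate_gb_per_s=RATE_GB_PER_S):
    """Return the terabytes needed to keep `seconds` worth of raw data."""
    return seconds * rate_gb_per_s / 1000


per_hour_tb = storage_needed_tb(SECONDS_PER_HOUR)
per_day_tb = storage_needed_tb(SECONDS_PER_DAY)

print(f"One hour of raw data: {per_hour_tb:.1f} TB")
print(f"One day of raw data:  {per_day_tb:.1f} TB")
```

At that rate, a single day of recording would take more than 500TB, which is why analyzing the data as it arrives is more practical than storing it for later.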

LJ Associate Editor Mitch Frazier and I had an opportunity to speak with both IBM and Uppsala, and we asked them for more information on how such a feat is accomplished. We weren't surprised to hear, “using Linux”. Here's our Q&A session, with some of my commentary sprinkled in.

Shawn & Mitch: What hardware does it run on?

IBM & Uppsala: InfoSphere Streams is designed to work on a variety of platforms, including IBM hardware. It runs on clusters of up to 125 multicore x86 servers with Red Hat Enterprise Linux (RHEL). The ongoing IBM research project, called System S, is the basis for InfoSphere Streams and has run on many platforms, including Blue Gene supercomputers and System P.

S&M: Will it run on commodity hardware?

I&U: Yes, x86 blades.

S&M: What operating system(s) does it run on?

I&U: InfoSphere Streams runs on RHEL 4.4 for 32-bit x86 hardware and RHEL 5.2 for 64-bit x86 hardware.

S&M: Are these operating systems standard versions or custom?

I&U: They are standard operating systems.

S&M: What language(s) is it written in?

I&U: InfoSphere Streams is written in C and C++.

S&M: How does a programmer interact with it? Via a normal programming language or some custom language?

I&U: Applications for InfoSphere Streams are written in a language called SPADE (Stream Processing Application Declarative Engine). Developed by IBM Research, SPADE is a programming language and a compilation infrastructure, specifically built for streaming systems. It is designed to facilitate the programming of large streaming applications, as well as their efficient and effective mapping to a wide variety of target architectures, including clusters, multicore architectures and special processors, such as the Cell processor. The SPADE programming language allows stream processing applications to be written with the finest granularity of operators that is meaningful to the application, and the SPADE compiler appropriately fuses operators and generates a stream processing graph to be run on the Streams Runtime.
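To make the idea of writing small operators and letting the system fuse them a little more concrete, here is a minimal Python sketch. It is not SPADE, and it does not use any InfoSphere Streams API; the operator names, the `fuse` helper and the sample readings are all hypothetical, chosen only to illustrate the general pattern of combining fine-grained stream operators into a single processing unit.

```python
from functools import reduce


def scale(factor):
    """Fine-grained operator: multiply every item in the stream by factor."""
    def op(stream):
        for item in stream:
            yield item * factor
    return op


def threshold(limit):
    """Fine-grained operator: pass along only items greater than limit."""
    def op(stream):
        for item in stream:
            if item > limit:
                yield item
    return op


def fuse(*operators):
    """Chain several operators into one callable that runs them in order."""
    def fused(stream):
        return reduce(lambda s, op: op(s), operators, stream)
    return fused


# Made-up sample readings; real input would arrive continuously.
readings = [1, 5, 2, 9]

# Two small operators fused into one pipeline stage.
pipeline = fuse(scale(2), threshold(5))
alerts = list(pipeline(iter(readings)))
print(alerts)
```

Because each operator is a generator, items flow through the fused pipeline one at a time rather than being collected between stages, which loosely mirrors how a streaming system processes data in motion.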

[See Listing 1 for a sample of SPADE. Listing 1 is an excerpt from the “IBM Research Report—SPADE Language Specification” by Martin Hirzel, Henrique Andrade, Bugra Gedik, Vibhore Kumar, Giuliano Losa, Robert Soulé and Kun-Lung Wu, at the IBM Research Division, Thomas J. Watson Research Center.]

______________________

Shawn Powers is an Associate Editor for Linux Journal. You might find him chatting on the IRC channel or on Twitter.
