Build It And They Will Come

by Doug.Roberts

Well, actually, no they won't.

I'm talking about purchasing and installing a brand new Linux cluster in a pure Windows shop and having any expectations that it will be used. Your co-workers will probably look at you funny, and they might stand way over on the other side of the elevator during that ride up to the fourth floor, but don't count on them knocking your door down begging for access to your shiny new Linux resource.

What it takes to get Windows users to move work to a Linux cluster is advocacy, opportunity, and a little luck. In this case the luck part came about via a conversation overheard at lunch about a 10-year-old legacy application. The code needed to be run for several days by each of four people on their XP boxes. Afterwards the results were manually collated. Rinse, wash, repeat as necessary to cover all the parameter sweep cases. Even luckier: the application was written in C++, albeit using MS Visual Studio.

So we have opportunity and luck, now for the advocacy. Think about it for a bit: a 10-year-old Windows application that we now want to run on a 64-bit Linux cluster. Our resources: two Windows computer scientists who know very little about Linux, and me, who (and I freely admit this) hates Micro$oft. What could possibly go wrong?

Well, lots, as it turns out. Windows system header files sprinkled all throughout the code. File name case sensitivity, or, rather, the lack of it in Windows. Teaching Windows users how to use scp, bash, g++, VNC, xfce, cluster job control utilities, etc. This project, and any other like it, was going to require a strong advocate to keep it moving in the right direction.
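To make the case sensitivity problem concrete: on Windows, #include "solver.h" happily finds Solver.h, while g++ on Linux reports a missing file. The following is only a rough sketch of how one might hunt for such mismatches before the first compile, not what we actually ran. The function name find_case_mismatches and the list of source extensions are my own assumptions for illustration.

```python
import re
from pathlib import Path

# Matches quoted includes only; angle-bracket system headers are ignored.
INCLUDE_RE = re.compile(r'^\s*#\s*include\s*"([^"]+)"', re.MULTILINE)
SOURCE_SUFFIXES = {".h", ".hpp", ".c", ".cc", ".cpp"}  # assumed extensions


def find_case_mismatches(root):
    """Return sorted (source file, include as written, name on disk) triples.

    An include is reported when a file with the same name exists somewhere
    under root, but only if letter case is ignored.
    """
    root = Path(root)
    files = [p for p in root.rglob("*") if p.is_file()]

    # Map lower-cased file name -> the actual spellings present on disk.
    by_lower = {}
    for p in files:
        by_lower.setdefault(p.name.lower(), set()).add(p.name)

    mismatches = []
    for src in files:
        if src.suffix.lower() not in SOURCE_SUFFIXES:
            continue
        text = src.read_text(errors="replace")
        for inc in INCLUDE_RE.findall(text):
            # Windows code may use backslashes as path separators.
            name = inc.replace("\\", "/").rsplit("/", 1)[-1]
            on_disk = by_lower.get(name.lower(), set())
            if on_disk and name not in on_disk:
                mismatches.append((src.name, inc, sorted(on_disk)[0]))
    return sorted(mismatches)


if __name__ == "__main__":
    for src, written, actual in find_case_mismatches("."):
        print(f'{src}: #include "{written}" but the file on disk is {actual}')
```

Run from the top of the source tree, it lists each include whose spelling differs from the file on disk only by case. It compares file names only, not directory names, so treat the output as a starting point.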

As a long-time Linux supporter/developer/evangelist I was happy to be the advocate. Once the code was eventually ported and tested, I wrote a series of scripts that fully automated the process of running the thousands of parameter sweep run cases, and then collated all of the individual output files into a single results file.
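For readers wondering what "fully automated" looks like, here is a minimal sketch of the sweep-then-collate pattern. It is not the original scripts: the parameter names, the CSV file layout, and the run_case placeholder (which fakes a result instead of submitting a cluster job) are all assumptions made for illustration.

```python
import csv
import itertools
import tempfile
from pathlib import Path


def sweep_cases(grid):
    """Yield one dict per combination of parameter values in the grid."""
    names = sorted(grid)
    for values in itertools.product(*(grid[n] for n in names)):
        yield dict(zip(names, values))


def run_case(case, out_dir):
    """Hypothetical stand-in for one run of the ported application.

    A real script would submit a cluster job here. This placeholder just
    writes a small result file so the collation step has something to read.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    tag = "_".join(f"{k}{v}" for k, v in sorted(case.items()))
    path = out_dir / f"run_{tag}.csv"
    with path.open("w", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow(list(case) + ["result"])
        writer.writerow(list(case.values()) + [sum(case.values())])
    return path


def collate(out_dir, combined):
    """Merge every per-run CSV into one results file with a single header."""
    rows, header = [], None
    for path in sorted(out_dir.glob("run_*.csv")):
        with path.open(newline="") as fh:
            reader = csv.reader(fh)
            file_header = next(reader)
            header = header or file_header
            rows.extend(reader)
    with combined.open("w", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow(header)
        writer.writerows(rows)
    return len(rows)


if __name__ == "__main__":
    grid = {"alpha": [1, 2, 3], "beta": [10, 20]}  # made-up parameters
    with tempfile.TemporaryDirectory() as tmp:
        out = Path(tmp) / "sweep_output"
        for case in sweep_cases(grid):
            run_case(case, out)
        print(collate(out, out / "results.csv"), "runs collated")
```

The point of the design is that nobody touches the individual output files by hand: every case is generated from the grid, and the single results file is rebuilt from whatever runs have finished.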

Finally, it was time for our first full production run. Voila! In just two hours our little 176-core 64-bit cluster ate up data and spit out the results for 1,500 runs -- previously a task that took three to four people three days. Yesterday I checked on the cluster and noticed that our new users had recently finished their 4,400th run. I called on them to pass on my congratulations and was told that their P/I was thrilled at the increase in productivity the cluster was providing.

Now that word of this is out, we have new application porting activities identified and in the works. A Linux success story!