Build It And They Will Come

Well, actually, no they won't.

I'm talking about purchasing and installing a brand new Linux cluster in a pure Windows shop and expecting it to be used. Your co-workers will probably look at you funny, and they might stand way over on the other side of the elevator during that ride up to the fourth floor, but don't count on them knocking down your door begging for access to your shiny new Linux resource.

What it takes to get Windows users to move work to a Linux cluster is advocacy, opportunity, and a little luck. In this case the luck came via a conversation overheard at lunch about a 10-year-old legacy application. The code needed to be run for several days each by four people on their XP boxes, after which the results were manually collated. Rinse, wash, repeat as necessary to cover all the parameter sweep cases. Even luckier: the application was written in C++, albeit using MS Visual Studio.

So we had opportunity and luck; now for the advocacy. Think about it for a bit: a 10-year-old Windows application that we now want to run on a 64-bit Linux cluster. Our resources: two Windows computer scientists who knew very little about Linux, and me, who (and I freely admit this) hates Micro$oft. What could possibly go wrong?

Well, lots, as it turns out. Windows system header files sprinkled throughout the code. File name case sensitivity, or, rather, the lack of it in Windows. Teaching Windows users how to use scp, bash, g++, VNC, Xfce, cluster job control utilities, etc. This project, and any other like it, was going to require a strong advocate to keep it moving in the right direction.

As a long-time Linux supporter/developer/evangelist I was happy to be the advocate. Once the code was eventually ported and tested, I wrote a series of scripts that fully automated the process of running the thousands of parameter sweep run cases, and then collated all of the individual output files into a single results file.
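The article doesn't show the actual run scripts, but a driver for this kind of parameter sweep can be quite short. Below is a minimal, hypothetical sketch: `sweep_app` and the `params/` file layout are invented for illustration (here a stand-in program is generated inline so the script is self-contained), and on a real cluster the per-case runs would be submitted to the job scheduler rather than backgrounded locally.

```shell
#!/bin/bash
# Minimal parameter-sweep driver sketch. "sweep_app" and the params/ layout
# are invented for illustration; the real application is not shown here.
set -euo pipefail

workdir=$(mktemp -d)
cd "$workdir"

# Stand-in for the ported application: reads one parameter file,
# writes one line of results.
cat > sweep_app <<'EOF'
#!/bin/bash
echo "result for $1: $(cat "$1")"
EOF
chmod +x sweep_app

# Generate the parameter files covering the sweep cases.
mkdir -p params results
for i in 1 2 3; do
    echo "alpha=$i" > "params/case_$i.txt"
done

# Run one case per parameter file. On a real cluster each of these
# would be a qsub/sbatch submission instead of a background job.
for p in params/case_*.txt; do
    ./sweep_app "$p" > "results/$(basename "${p%.txt}").out" &
done
wait

# Collate the individual per-case outputs into a single results file.
cat results/*.out > all_results.txt
```

The same two-phase shape (fan out one job per case, then concatenate the outputs) scales from three toy cases to the thousands of runs described above; only the job-submission line changes.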

Finally, it was time for our first full production run. Voila! In just two hours our little 176-core 64-bit cluster ate up data and spit out the results for 1,500 runs -- previously a task that took three or four people three days. Yesterday I checked on the cluster and noticed that our new users had recently finished their 4,400th run. I called them to pass on my congratulations and was told that their P.I. was thrilled at the increase in productivity the cluster was providing.

Now that word of this is out we have new application porting activities identified and in the works. A Linux success story!

Comments


Introducing change...

Charles Barnard

Nearly my entire career was spent doing manufacturing software conversions.

The introduction of ANY change to how people do things--even a substantial improvement in speed and accuracy--is incredibly difficult to implement.

To me, an OS, like a programming language or a user interface, should be "invisible" - the end user shouldn't have to even know what the OS is on their machine--or if they run things locally or remotely. Ideally, the end-user should neither know nor care about the infrastructure.

Were I to be in the position, I wouldn't bother trying to get users to use the cluster--I'd change the way their machines worked so that they automatically used the cluster when appropriate.

***

There is an old conversion rule that says that in order to get people to change over, you need to remove ALL of their prior options for the task--including the manual records, which may be the actual operational documentation.

***

I too remember the days when everyone bought IBM, not because they were the best, but because "nobody ever got fired for buying IBM."

In those days, the following was common.

The University wanted to purchase a new computer for student use in classwork.

For various reasons, the person in charge of deciding went with a Univac system.

IBM repeatedly wrangled with him, then his department manager and all the way up to the Governor of the State--all of whom backed the original decision. At each level they were informed by IBM of what a terrible decision had been made which the University would regret later.

Micro$oft Clusters

Not Bob (liar!)

There seems to be a fair amount of press these days dedicated to the acceptance of MSC-based clustering. They seem to be pushing the idea that it is required in order to parallelize MSC applications, whether legacy or otherwise.

There is subtle beauty in the model Doug has implemented in his company:

a) Linux is the way to stable results
b) MSC apps can be ported rather easily regardless of the compiler's heritage

Shouldn't we all work to keep MSC at bay regarding this rather large bastion of LINUX supremacy?

M$C

Doug.Roberts

Thanks, Bob. As you probably suspect, I have just one word for Micro$oft Clustering: Yuck.

Heard someone from CERN talk about clusters a few years ago.

Anonymous

They tried to implement a Windows cluster once. When they first tried to run it, they made a mistake in the program and had to click "OK" on a Windows popup on every screen in the cluster -- several thousand of them -- to get the cluster operational again. Evidently when you divide by zero you get a popup and have to click OK to continue. So they started running a program in the background to just click OK all the time. Eventually they converted the cluster over to Linux, like the many other clusters at CERN.

Nice however

Cluster Dude

It sounds like this is the sort of application that is run occasionally for days at a time. I really do not understand the reasoning behind buying a cluster (hardware) for a workload such as this. Unless the data is sensitive, it would have been a lot cheaper to fire up a bunch of 64-bit nodes on EC2 once in a while to grind the data. I guess now that the app is ported to a real platform, that still remains an option.

The cloud is an option

Doug.Roberts

Since the application does not require a high-bandwidth memory fabric to run (it is a serial application), it could run on Amazon EC2. However, as was pointed out in the article and subsequent comments, we already owned the cluster and it frequently has unused cycles, so we chose it for this application. We might also migrate the app to the cloud for those days when the cluster is over-subscribed.

As to run length, we are actually only looking at an hour or two to run the three to four thousand jobs that used to take days on Windows workstations.

--Doug

Nice work

Nick Anderson

Funny, I used to work in HPC. As I was exiting HPC, Microsoft was making significant efforts to invade the market. I had a Windows cluster and a Perceus/Warewulf Linux cluster at Supercomputing (I think it was the first public demo of Perceus). The Perceus cluster worked flawlessly; the Windows cluster didn't work at all. In fact, I brought the Windows cluster guys over to the booth, and they spent two days at the show trying to get it working with no success.

As far as bringing Linux to Windows users goes: good job. After I left clustering I migrated a small office/call center over to Linux workstations (about 40 of them). By far the most difficult part was getting the buy-in. Upper management buy-in was not difficult, but getting the users to embrace the new environment was a challenge. I spent much time training and evangelizing for Linux to help win the users over. After the workstations had been in place for about a month, the majority of the users liked the changes. I loved it; support calls plummeted for the Linux users.

Windows

ajboesch71

Failure is not an option, it comes bundled with Windows...

Misc Responses

Doug.Roberts

Thanks for all the feedback everybody! Here are some answers to your questions, in no particular order:

1. Cost/benefit and level of effort to port the code and write the run scripts.

We purchased the cluster for $85,000 last year; it was bought with project funds for another initiative. However, we (in my group) have a strategic plan to "bring the company up to speed" regarding HPC and Linux in general. The cluster was not being fully utilized by its parent project, so we are donating unused cycles as a resource to the rest of the company. The hardware costs associated with this porting effort were therefore $0 as far as the new users were concerned. Time to port the code to Linux: perhaps one person-week. Time to write the run scripts and test them: three hours of my time.

2. Strategic planning, cluster operational costs, "upper management" buy-in.

Fortunately, my colleagues and I in the HPC group we founded inside the company are responsible for HPC strategic decisions and directions. The one rule we need to obey to get and retain upper management buy-in is that we produce an HPC business plan that makes sense. We're constantly working on that. Recruiting in-house users is just one component of our strategic planning. Bringing in more outside HPC work is another, of course.

The person who pointed out that there were operational costs associated with running a cluster was completely correct. The IT group in my company charges $10,000 per year per rack to house hardware in their machine room. A bit steep perhaps, but it is one of those facts of life that we have to deal with. At present we are giving any and all new cluster users that we recruit from within the company a free ride. Then, once they are hooked, we will start charging them. See? There are some lessons to be learned from studying Micro$oft's business model...

3. Posix Standards.

Yes! Absolutely. Always!

4. Windows Clusters

Barf.

businesss model

Charles

"Then, once they are hooked, we will start charging them. See? There are some lessons to be learned from studying Micro$oft's business model..."

The model is far, far older than that!

Goes back to the original mating rituals of 'free' samples.

Personally, I'm currently trying to sell a small law firm on a Linux-based server-workstation model, rather than the tiny Windoz net they're using.

But even with individual systems, when I offer them either Linux OR Windoz, they act insulted that I'd try to sell them a system with something other than Windoz -- and that's just when it's listed as an option, not when I'm actually trying to sell them on a Linux system.

In some ways, it reminds me of the '70s, when, if it wasn't IBM, it was difficult to get a buyer to even look at a system seriously.

Mostly, I make progress by pointing out that they really don't spend much time seeing the OS, they spend all their time in the applications.

Now if the scanner-makers would just standardise their interface and/or provide Linux drivers, I'd be happier.

I would love to move these law firms to 'paperless.'

Good job!

jf3

Nice job bringing a new platform into your organization, and congratulations on the success of your first implementation. But this is obviously not some stealth project where you were sneaking Linux into the data center. Somebody had to sign off on the purchase of that "little 176-core 64-bit Linux cluster" -- as well as rack space for 11 nodes. It does seem like a little overkill just to replace four desktops.

That investment was obviously part of a strategic decision to move that app, and the ones you are porting now, off the Windows platform. In my experience getting upper management to try anything new (even if it is the right thing) can sometimes be much harder than the implementation. (In my old shop just getting the legal department to sign off on the Linux license from Oracle took months)

Isn't it a little (nope, terribly) misleading?

prasooncc

I don't take this as a comparison of Linux and Windows capabilities. It is a comparison of run times on a cluster versus four PCs. Obviously the cluster must win against four odd PC boxes (whether it runs XP, Linux, Solaris, or whatever). Otherwise you would not get another chance to build one more cluster in the same job!

Heh, I guess one could always

turn_self_off

Heh, I guess one could always have run a cluster of Windows boxes to handle all the parameters, but I wonder what the cost in licenses would have been.

Sour grapes.

Anonymous

Sour grapes.

What's truly wonderful now,

wise0wl

What's truly wonderful now, though, is that if your software engineers re-wrote the application using the POSIX standard libraries (libc) and ANSI, most of the code is probably much more portable BACK to Windows. Nice how that works, isn't it? Coding standards -- yum.

cost/benefit

JSF

Great effort Doug but can you give us a clearer idea of the cost/benefit?
How long did it take to port & test the code and write the scripts?

As you mention, the financial

Smithy_

As you mention, the financial cost isn't given. However, reducing the time cost of "3 - 4 people" over "three days" to just a two-hour automated process is a pretty big benefit in anyone's books. I imagine any time costs incurred by the "two Windows computer scientists" plus the author are being quickly eaten up and will be well recouped within the year.

But yes, good idea, it would be great to see the new customer quantify the costs vs benefits to really drive it home!

Linux to the rescue

DJ

Another success story about the power of Linux. Way to seize the opportunity and run with it!

Congrats!

Gene Liverman

Congrats on the success of putting Linux to use in a Windows world! It is always refreshing to hear stories like this.

Gene Liverman is a Systems Administrator of *nix and VMware at a university.
