System Administration of the IBM Watson Supercomputer

System administrators at the USENIX LISA 2011 conference in Boston in December (LISA is a great system administration conference, by the way) got to hear Michael Perrone's presentation "What Is Watson?"

Michael Perrone is the Manager of Multicore Computing at the IBM T.J. Watson Research Center. The entire presentation (slides, video and MP3) is available on the USENIX Web site, and if you really want to understand how Watson works under the hood, take an hour to listen to Michael's talk (and the sysadmin Q&A at the end).

I approached Michael after his talk and asked whether a sysadmin on his team would be willing to answer some questions about handling Watson's system administration. After a brief introduction to Watson, our conversation appears below.

What Is Watson?

In a nutshell, Watson is an impressive demonstration of the current state of the art in artificial intelligence: a computer's ability to correctly answer questions posed in natural language (text or speech).

Watson came out of the IBM DeepQA Project and is an application of DeepQA tuned specifically to Jeopardy (a US TV trivia game show). The "QA" in DeepQA stands for Question Answering, meaning the computer can answer questions posed in a human language (starting with English). The "Deep" refers to the depth of analysis required: because natural language is unstructured, deep analysis is needed to interpret it correctly.

It demonstrates (in a popular format) a computer's capability to interface with us in natural language: to "understand" a question, quickly search a vast sea of data, pick out the vital facts and answer correctly.

Watson is thousands of algorithms running on thousands of cores, using terabytes of memory and tens of teraflops of compute to deliver an answer to a natural-language question in less than five seconds. It is an exciting feat of technology, and it's just a taste of what's to come.

IBM's goal for the DeepQA Project is to drive automatic Question Answering technology to a point where it clearly and consistently rivals the best human performance.

Watson's Vital Statistics

  • 90 IBM Power 750 servers (plus additional I/O, network and cluster controller nodes).

  • 80 teraflops (80 trillion floating-point operations per second).

  • Watson's corpus size was 400 terabytes of data—encyclopedias, databases and so on. Watson was disconnected from the Internet. Everything it knows about the world came from the corpus.

  • Average time to handle a question: three seconds.

  • 2880 POWER7 cores (3.55GHz chips), four hardware threads per core (a rough per-node breakdown is sketched after this list).

  • 500GB per sec on-chip bandwidth (between the cores on a chip).

  • 10Gb Ethernet network.

  • 15TB of RAM.

  • 20TB of disk, clustered. (Watson built its semantic net from the 400TB corpus. It keeps the semantic net, but not the corpus.)

  • Runs IBM DeepQA software, which has open-source components: Apache Hadoop for distributed processing and storage, and Apache UIMA for natural language processing.

  • SUSE Linux.

  • One full-time sysadmin on staff.

  • Ten compute racks, 80kW of power, 20 tons of cooling (for comparison, a human has one brain, which fits in a shoebox, can run on a tuna-fish sandwich and can be cooled with a handheld paper fan).
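To put those figures in perspective, here's a quick back-of-the-envelope calculation in Python. It assumes, purely for illustration, that the cores, RAM and disk are spread evenly across the 90 Power 750 servers; the real layout, with its separate I/O and controller nodes, is more complicated:

    # Rough per-node figures for the Watson cluster, assuming an even
    # spread across the 90 Power 750 servers (a simplification).
    servers = 90
    cores = 2880
    threads_per_core = 4
    ram_tb = 15
    disk_tb = 20
    teraflops = 80

    cores_per_server = cores // servers              # 32 cores per server
    threads_total = cores * threads_per_core         # 11,520 hardware threads
    ram_gb_per_server = ram_tb * 1024 / servers      # ~171GB of RAM per server
    disk_gb_per_server = disk_tb * 1024 / servers    # ~228GB of disk per server
    gflops_per_core = teraflops * 1000 / cores       # ~28 GFLOPS per core

    print(f"{cores_per_server} cores ({cores_per_server * threads_per_core} "
          f"threads) per server")
    print(f"{ram_gb_per_server:.0f}GB RAM, {disk_gb_per_server:.0f}GB disk per server")
    print(f"{gflops_per_core:.1f} GFLOPS per core, {threads_total} threads in all")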

How Does Watson Work?

First, Watson develops a semantic net. Watson takes a large volume of text (the corpus) and parses it with natural language processing to create "syntactic frames" (subject→verb→object). It then uses the syntactic frames to create "semantic frames", each of which carries a probability. Here are some examples of semantic frames:

  • Inventors patent inventions (.8).

  • Fluid is a liquid (.6).

  • Liquid is a fluid (.5).

Why isn't the probability 1 in any of these examples? Because natural language is messy: phrases like "I speak English fluently" use fluid-related words with no connection to liquids, and they skew the statistics extracted from the text.
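Here is a toy sketch of that idea in Python. The triples and counts are invented for illustration (Watson's actual frame extraction works over terabytes of parsed text and is far more sophisticated), but it shows how noisy usages keep the probabilities below 1:

    from collections import Counter

    # Hypothetical syntactic frames (subject, verb, object) from a parser.
    # In Watson these come from NLP over the corpus; here they are made up.
    syntactic_frames = [
        ("inventor", "patent", "invention"),
        ("inventor", "patent", "invention"),
        ("inventor", "patent", "invention"),
        ("inventor", "patent", "invention"),
        ("inventor", "patent", "idea"),     # occasional other usage
        ("fluid", "be", "liquid"),
        ("fluid", "be", "liquid"),
        ("fluid", "be", "liquid"),
        ("fluid", "be", "speech"),          # noise from "speak fluently"
        ("fluid", "be", "speech"),
    ]

    def semantic_frames(frames):
        """Map each (subject, verb, object) frame to a probability.

        The probability is the relative frequency of the object given the
        (subject, verb) pair, which is why a frame ends up at 0.8 or 0.6
        rather than 1.0: noisy usages dilute the counts.
        """
        counts = Counter(frames)
        pair_totals = Counter((s, v) for s, v, _ in frames)
        return {f: counts[f] / pair_totals[(f[0], f[1])] for f in counts}

    for frame, p in semantic_frames(syntactic_frames).items():
        print(frame, round(p, 2))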

To answer questions, Watson uses a Massively Parallel Probabilistic Evidence-Based Architecture. Watch the video of Michael's presentation and look at the slides, because there is far too much going on under the hood to cover in a short article, but in a nutshell, Watson generates a huge number of hypotheses (potential answers) and uses evidence from its semantic net to assign a probability to each one, picking the most likely as its answer.
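Here is a deliberately simplified sketch of that flow. The candidate answers, evidence scorers and weights are all invented; the point is only the shape of the computation: generate many hypotheses, score each one against independent pieces of evidence in parallel, combine the scores and take the most probable answer:

    from concurrent.futures import ThreadPoolExecutor

    # Hypothetical candidate answers for a clue, plus toy evidence scorers.
    # Watson runs hundreds of scoring algorithms across thousands of cores;
    # two hard-coded scorers stand in for them here.
    candidates = ["Toronto", "Chicago", "Springfield"]

    def type_match_score(answer):
        # Does the answer look like the right type of thing (a US city)?
        return {"Toronto": 0.2, "Chicago": 0.9, "Springfield": 0.8}[answer]

    def passage_support_score(answer):
        # How strongly do retrieved passages support this answer?
        return {"Toronto": 0.4, "Chicago": 0.85, "Springfield": 0.5}[answer]

    scorers = [(type_match_score, 0.6), (passage_support_score, 0.4)]

    def confidence(answer):
        """Weighted combination of evidence scores for one hypothesis."""
        return sum(weight * scorer(answer) for scorer, weight in scorers)

    # Score all hypotheses in parallel and pick the most likely one.
    with ThreadPoolExecutor() as pool:
        scored = list(zip(candidates, pool.map(confidence, candidates)))

    best, best_score = max(scored, key=lambda pair: pair[1])
    print(f"best answer: {best} (confidence {best_score:.2f})")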

There are many algorithms at play in Watson, which can even learn from its mistakes and change its Jeopardy strategy.

Watson Is Built on Open Source

Watson is built on the Apache UIMA framework, uses Apache Hadoop, runs on Linux, and uses xCAT and Ganglia for configuration management and monitoring—all open-source tools.
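Apache UIMA's real API is Java, but its core pattern (a pipeline of annotators, each adding structured annotations to a shared analysis of a document) can be sketched in a few lines of Python. The classes below are hypothetical stand-ins for illustration, not UIMA's actual interfaces:

    # Conceptual sketch of UIMA's annotator-pipeline pattern. These classes
    # are hypothetical illustrations, not Apache UIMA's real (Java) API.

    class Document:
        """Shared analysis structure: the text plus annotations added so far."""
        def __init__(self, text):
            self.text = text
            self.annotations = []   # (type, value) pairs

    class Tokenizer:
        def process(self, doc):
            for word in doc.text.split():
                doc.annotations.append(("token", word))

    class CapitalizedNameSpotter:
        def process(self, doc):
            # Crude stand-in for a named-entity annotator.
            for kind, word in list(doc.annotations):
                if kind == "token" and word.istitle():
                    doc.annotations.append(("name", word))

    def run_pipeline(text, annotators):
        doc = Document(text)
        for annotator in annotators:    # each stage enriches the same document
            annotator.process(doc)
        return doc

    doc = run_pipeline("Watson was built at IBM Research",
                       [Tokenizer(), CapitalizedNameSpotter()])
    print(doc.annotations)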

Interview with Eddie Epstein on System Administration of the Watson Supercomputer

Eddie Epstein is the IBM researcher responsible for scaling out Watson's computation over thousands of compute cores to achieve the speed needed to be competitive in a live Jeopardy game. For the past seven years, Eddie has managed the IBM team doing ongoing development of Apache UIMA. Eddie was kind enough to answer my questions about system administration of the Watson cluster.

AT: Why did you decide to use Linux?

EE: The project started with x86-based blades, and the researchers responsible for admin were very familiar with Linux.

AT: What configuration management tools did you use? How did you handle updating the Watson software on thousands of Linux servers?

EE: We had only hundreds of servers. The servers ranged from 4- to 32-core machines. We started with CSM to manage OS installs, then switched to xCAT.

______________________

Aleksey Tsalolikhin has been a UNIX/Linux system administrator for 14 years.
