Linux Helps Bring Titanic to Life
The Linux distribution used was Red Hat 4.1. At that time Red Hat was shipping Linux 2.0.18, which didn't support the PC164 mainboard, so the first thing we had to do was upgrade the kernel. During our testing we tracked down a number of problems with devices and kept up with both the 2.0 and 2.1 series of kernels. We ended up sticking with 2.1.42 with a few patches. We also decided on the NCR 810 SCSI card with the BSD-based driver and the SMC 100MB Ethernet card with the de4x5 driver. It turned out to be a very stable configuration, but there was one serious floating-point problem that caused our water-rendering software to die with an unexpected floating-point exception.
This turned out to be a tricky problem to fix and didn't make it into the kernel sources until 2.0.31-pre5 and 2.1.43. The Alpha kernel contains code to catch floating-point exceptions and to handle them according to the IEEE standard. That code failed to handle one of the floating-point instructions that could generate an exception. As a result, when that case occurred, the application would exit with a floating-point exception. Once fixed, our applications ran quite smoothly on the Alpha systems.
At this point, the decision was made to purchase 160 433MHz DEC Alpha systems from Carrera Computers of Newport Beach, California. Of those 160 machines, 105 of the machines are running Linux, the other 55 are running NT. The machines are connected with 100Mbps Ethernet to each other and to the rest of our facility.
The staff at Carrera was extraordinarily helpful and provided inestimable support for our project. This support began at the factory, with follow-up support through delivery, support and repair.
We created a master disk, which we provided to Carrera, along with a single initialization script that would configure the generic master disk to one of the 160 unique personalities by setting up parameters such as the system name and IP address. Carrera built, configured and burned-in the machine, then logged in as a special user causing the setup script to execute. When the script completed, the machine automatically shut down.
This process made configuring the machines easy for both Carrera and us. When the hosts arrived, we just plugged them in and flipped the switch, and they came up on the network. All 160 machines are housed in a small room at Digital Domain in ten 19 inch racks. They are all connected to a central screen, keyboard and mouse via a switching system to allow an operator to sit in the middle of the room and work on the console of any machine in the room.
Figure 2. Digital Domain Computer Room
The room was assembled in a time period of two weeks including the installation of the electrical, computing and networking. The time spent creating the initialization script was extremely well spent as it allowed the machines to be dropped in place with relatively little trouble. At that point we began running the Titanic work through the “Render Ranch” of Alphas.
The first part of this work partition was to simulate and render the water elements. We knew that the water elements were computationally very expensive, so this process was one of the major reasons for purchasing the Alphas.
These jobs computed for approximately 45 minutes and then generated several hundred megabytes of image data to be stored on central storage servers. Intermediate data was stored on the local SCSI disk of the Alpha. The floating-point power of the DEC Alpha made jobs run about 3.5 times faster than on our old SGI systems.
As the water rendering completed, the task load then switched to compositing. These jobs were more I/O bound, because they had to read elements from disks on servers spread around the facility and combine them into frames to be stored centrally. Even so, we still saw improvements of a factor of two for these tasks.
We were extremely pleased with the results. Between the beginning of June and the end of August, the Alpha Linux systems processed over three hundred thousand frames. The systems were up and running 24 hours a day, seven days a week. There were no extended downtimes, and many of the machines were up for more than a month at a time.
We addressed a number of different problems using a variety of techniques. Some of the problems were Alpha specific, and some were issues for the Linux community at large. Hopefully, these issues will help others in the same position and provide feedback for the Linux community.
Hardware compatibility, particularly with Alpha Linux, is still a problem. Carrera was very cooperative about sending us multiple card varieties, so that we could do extensive testing. The range of choices was large enough that we were able to find a combination that worked. We had to pay careful attention to which products we were using, as the particular chip revision made a difference in one case.
The floating-point problem (discussed above) was the toughest problem we had to address. We didn't expect to find this kind of problem when we started the project. This was a long-standing bug that had never been tracked down—we attribute this fact to the relatively small Alpha Linux community.
Linux software for Alpha seems to be less tested than the equivalent software for the Intel processors—again, a function of the user-base size. It was exacerbated by the fact that Alpha Linux uses glibc instead of libc5, which introduced problems in our code and, we suspect, in other packages.
We had a number of small configuration issues with respect to the size of our facility. Most of these were just parameter changes in the kernel, but they took some effort to track down. For example, we had to increase the number of simultaneously mounted file systems (64 was not sufficient). Also, NFS directory reads were expected to fit within one page (4K on Intel, 8K on Alpha); we had to double this number to support the average number of frames stored in a single directory.
Boot management under Linux Alpha was more difficult than we would have liked. We felt the documentation needed improvements to make it more useful. Boot management required extensive knowledge of ARC, MILO and Linux to make it work. ARC requires entering a reasonably large amount of data to get MILO to boot. MILO worked well and provided a good set of options, but we never managed to get soft reboots to operate correctly. We've been working with the engineers at DEC to improve some of these issues.
The weakest link in the current Linux kernel appeared to be the NFS implementation, resulting in most of our system crashes. We generally had a large number of file systems mounted simultaneously, and those file systems were often under heavy load. When central servers died or had problems, the Linux systems didn't recover. The common symptoms of these problems were stale NFS handles and kernel hangs. When all the servers were running, the Linux boxes worked correctly. Overall, the NFS implementation worked, but it should be more robust.
Realizing the promise of Apache® Hadoop® requires the effective deployment of compute, memory, storage and networking to achieve optimal results. With its flexibility and multitude of options, it is easy to over or under provision the server infrastructure, resulting in poor performance and high TCO. Join us for an in depth, technical discussion with industry experts from leading Hadoop and server companies who will provide insights into the key considerations for designing and deploying an optimal Hadoop cluster.
Sponsored by AMD
If you already use virtualized infrastructure, you are well on your way to leveraging the power of the cloud. Virtualization offers the promise of limitless resources, but how do you manage that scalability when your DevOps team doesn’t scale? In today’s hypercompetitive markets, fast results can make a difference between leading the pack vs. obsolescence. Organizations need more benefits from cloud computing than just raw resources. They need agility, flexibility, convenience, ROI, and control.
Stackato private Platform-as-a-Service technology from ActiveState extends your private cloud infrastructure by creating a private PaaS to provide on-demand availability, flexibility, control, and ultimately, faster time-to-market for your enterprise.
Sponsored by ActiveState
| Containers—Not Virtual Machines—Are the Future Cloud | Jun 17, 2013 |
| Lock-Free Multi-Producer Multi-Consumer Queue on Ring Buffer | Jun 12, 2013 |
| Weechat, Irssi's Little Brother | Jun 11, 2013 |
| One Tail Just Isn't Enough | Jun 07, 2013 |
| Introduction to MapReduce with Hadoop on Linux | Jun 05, 2013 |
| Android's Limits | Jun 04, 2013 |
- Containers—Not Virtual Machines—Are the Future Cloud
- Lock-Free Multi-Producer Multi-Consumer Queue on Ring Buffer
- Linux Systems Administrator
- Introduction to MapReduce with Hadoop on Linux
- Senior Perl Developer
- Technical Support Rep
- Weechat, Irssi's Little Brother
- UX Designer
- One Tail Just Isn't Enough
- Android's Limits
- Replica Watches
2 hours 9 min ago - Reply to comment | Linux Journal
6 hours 19 min ago - on the path to understanding
6 hours 23 min ago - As a fisher,we know that a
1 day 1 hour ago - All I Say Is Worth Share!
1 day 2 hours ago - GeekSays
1 day 3 hours ago - thanks
1 day 6 hours ago - You should consider visiting
1 day 7 hours ago - You should consider visiting
1 day 7 hours ago - You should consider visiting
1 day 7 hours ago
Featured Jobs
| Linux Systems Administrator | Houston and Austin, Texas | Host Gator |
| Senior Perl Developer | Austin, Texas | Host Gator |
| Technical Support Rep | Houston and Austin, Texas | Host Gator |
| UX Designer | Austin, Texas | Host Gator |
| Web & UI Developer (JavaScript & j Query) | Austin, Texas | Host Gator |
Free Webinar: Hadoop
How to Build an Optimal Hadoop Cluster to Store and Maintain Unlimited Amounts of Data Using Microservers
Realizing the promise of Apache® Hadoop® requires the effective deployment of compute, memory, storage and networking to achieve optimal results. With its flexibility and multitude of options, it is easy to over or under provision the server infrastructure, resulting in poor performance and high TCO. Join us for an in depth, technical discussion with industry experts from leading Hadoop and server companies who will provide insights into the key considerations for designing and deploying an optimal Hadoop cluster.
Some of key questions to be discussed are:
- What is the “typical” Hadoop cluster and what should be installed on the different machine types?
- Why should you consider the typical workload patterns when making your hardware decisions?
- Are all microservers created equal for Hadoop deployments?
- How do I plan for expansion if I require more compute, memory, storage or networking?




Comments
vps
forever linux.
VPS
I think, linux is still hard for home users. So they need better solutions for them.
great
linux is powerful.
But I just know centos a bit, I have a lots to learn.
Computing Center capacity for Titanic Movie in 1997
This article by Daryll Strauss has so much historic value because today (2008) we laugh about capacities they had in 1997. I don't laugh about the creativity and effectiveness they build up with the equipment of that time but these days you go to another ALDI-store, buy 5 Home Computers and have 5 Terabytes of disk storage available. This is the amount of disk capacity, which Daryll and his team had available to produce this All-Time-Historic-Movie "Titanic" by James Cameron with so much inspiration to computer development. And... LINUX was the major player in the business then!!! Well, what could I say more, I like it...
All the Best from Berlin, Germany - Wolfgang
(By the way, some years ago Linux Journal had a German translation of Daryll's article (mmmhhh... I did it) online. It's gone. Is it still in any of those archives...? ;)
re
+1 Guys !
re
yep useful information.thnx
Re: Linux Helps Bring Titanic to Life
i love tony
Does somebody know where
Does somebody know where Tony is working, in these days.