Linux Helps Bring Titanic to Life

Digital Domain uses Linux to create high-tech visual effects for the movies.
Numerology and Fate

The Linux distribution used was Red Hat 4.1. At that time Red Hat was shipping Linux 2.0.18, which didn't support the PC164 mainboard, so the first thing we had to do was upgrade the kernel. During our testing we tracked down a number of problems with devices and kept up with both the 2.0 and 2.1 series of kernels. We ended up sticking with 2.1.42 with a few patches. We also decided on the NCR 810 SCSI card with the BSD-based driver and the SMC 100MB Ethernet card with the de4x5 driver. It turned out to be a very stable configuration, but there was one serious floating-point problem that caused our water-rendering software to die with an unexpected floating-point exception.

This turned out to be a tricky problem to fix and didn't make it into the kernel sources until 2.0.31-pre5 and 2.1.43. The Alpha kernel contains code to catch floating-point exceptions and to handle them according to the IEEE standard. That code failed to handle one of the floating-point instructions that could generate an exception. As a result, when that case occurred, the application would exit with a floating-point exception. Once fixed, our applications ran quite smoothly on the Alpha systems.

The Implementation

At this point, the decision was made to purchase 160 433MHz DEC Alpha systems from Carrera Computers of Newport Beach, California. Of those 160 machines, 105 of the machines are running Linux, the other 55 are running NT. The machines are connected with 100Mbps Ethernet to each other and to the rest of our facility.

The staff at Carrera was extraordinarily helpful and provided inestimable support for our project. This support began at the factory, with follow-up support through delivery, support and repair.

We created a master disk, which we provided to Carrera, along with a single initialization script that would configure the generic master disk to one of the 160 unique personalities by setting up parameters such as the system name and IP address. Carrera built, configured and burned-in the machine, then logged in as a special user causing the setup script to execute. When the script completed, the machine automatically shut down.

This process made configuring the machines easy for both Carrera and us. When the hosts arrived, we just plugged them in and flipped the switch, and they came up on the network. All 160 machines are housed in a small room at Digital Domain in ten 19 inch racks. They are all connected to a central screen, keyboard and mouse via a switching system to allow an operator to sit in the middle of the room and work on the console of any machine in the room.

Figure 2. Digital Domain Computer Room

The room was assembled in a time period of two weeks including the installation of the electrical, computing and networking. The time spent creating the initialization script was extremely well spent as it allowed the machines to be dropped in place with relatively little trouble. At that point we began running the Titanic work through the “Render Ranch” of Alphas.

The first part of this work partition was to simulate and render the water elements. We knew that the water elements were computationally very expensive, so this process was one of the major reasons for purchasing the Alphas.

These jobs computed for approximately 45 minutes and then generated several hundred megabytes of image data to be stored on central storage servers. Intermediate data was stored on the local SCSI disk of the Alpha. The floating-point power of the DEC Alpha made jobs run about 3.5 times faster than on our old SGI systems.

As the water rendering completed, the task load then switched to compositing. These jobs were more I/O bound, because they had to read elements from disks on servers spread around the facility and combine them into frames to be stored centrally. Even so, we still saw improvements of a factor of two for these tasks.

We were extremely pleased with the results. Between the beginning of June and the end of August, the Alpha Linux systems processed over three hundred thousand frames. The systems were up and running 24 hours a day, seven days a week. There were no extended downtimes, and many of the machines were up for more than a month at a time.

The Problems

We addressed a number of different problems using a variety of techniques. Some of the problems were Alpha specific, and some were issues for the Linux community at large. Hopefully, these issues will help others in the same position and provide feedback for the Linux community.

Hardware compatibility, particularly with Alpha Linux, is still a problem. Carrera was very cooperative about sending us multiple card varieties, so that we could do extensive testing. The range of choices was large enough that we were able to find a combination that worked. We had to pay careful attention to which products we were using, as the particular chip revision made a difference in one case.

The floating-point problem (discussed above) was the toughest problem we had to address. We didn't expect to find this kind of problem when we started the project. This was a long-standing bug that had never been tracked down—we attribute this fact to the relatively small Alpha Linux community.

Linux software for Alpha seems to be less tested than the equivalent software for the Intel processors—again, a function of the user-base size. It was exacerbated by the fact that Alpha Linux uses glibc instead of libc5, which introduced problems in our code and, we suspect, in other packages.

We had a number of small configuration issues with respect to the size of our facility. Most of these were just parameter changes in the kernel, but they took some effort to track down. For example, we had to increase the number of simultaneously mounted file systems (64 was not sufficient). Also, NFS directory reads were expected to fit within one page (4K on Intel, 8K on Alpha); we had to double this number to support the average number of frames stored in a single directory.

Boot management under Linux Alpha was more difficult than we would have liked. We felt the documentation needed improvements to make it more useful. Boot management required extensive knowledge of ARC, MILO and Linux to make it work. ARC requires entering a reasonably large amount of data to get MILO to boot. MILO worked well and provided a good set of options, but we never managed to get soft reboots to operate correctly. We've been working with the engineers at DEC to improve some of these issues.

The weakest link in the current Linux kernel appeared to be the NFS implementation, resulting in most of our system crashes. We generally had a large number of file systems mounted simultaneously, and those file systems were often under heavy load. When central servers died or had problems, the Linux systems didn't recover. The common symptoms of these problems were stale NFS handles and kernel hangs. When all the servers were running, the Linux boxes worked correctly. Overall, the NFS implementation worked, but it should be more robust.

______________________

Comments

Comment viewing options

Select your preferred way to display the comments and click "Save settings" to activate your changes.

vps

Anonymous's picture

forever linux.

VPS

vps's picture

I think, linux is still hard for home users. So they need better solutions for them.

great

software for cellphone's picture

linux is powerful.
But I just know centos a bit, I have a lots to learn.

Computing Center capacity for Titanic Movie in 1997

Wolfgang Pietsch's picture

This article by Daryll Strauss has so much historic value because today (2008) we laugh about capacities they had in 1997. I don't laugh about the creativity and effectiveness they build up with the equipment of that time but these days you go to another ALDI-store, buy 5 Home Computers and have 5 Terabytes of disk storage available. This is the amount of disk capacity, which Daryll and his team had available to produce this All-Time-Historic-Movie "Titanic" by James Cameron with so much inspiration to computer development. And... LINUX was the major player in the business then!!! Well, what could I say more, I like it...

All the Best from Berlin, Germany - Wolfgang

(By the way, some years ago Linux Journal had a German translation of Daryll's article (mmmhhh... I did it) online. It's gone. Is it still in any of those archives...? ;)

re

Medikal's picture

+1 Guys !

re

fragman's picture

yep useful information.thnx

Re: Linux Helps Bring Titanic to Life

Anonymous's picture

i love tony

Does somebody know where

Marco Basali's picture

Does somebody know where Tony is working, in these days.

White Paper
Linux Management with Red Hat Satellite: Measuring Business Impact and ROI

Linux has become a key foundation for supporting today's rapidly growing IT environments. Linux is being used to deploy business applications and databases, trading on its reputation as a low-cost operating environment. For many IT organizations, Linux is a mainstay for deploying Web servers and has evolved from handling basic file, print, and utility workloads to running mission-critical applications and databases, physically, virtually, and in the cloud. As Linux grows in importance in terms of value to the business, managing Linux environments to high standards of service quality — availability, security, and performance — becomes an essential requirement for business success.

Learn More

Sponsored by Red Hat

White Paper
Private PaaS for the Agile Enterprise

If you already use virtualized infrastructure, you are well on your way to leveraging the power of the cloud. Virtualization offers the promise of limitless resources, but how do you manage that scalability when your DevOps team doesn’t scale? In today’s hypercompetitive markets, fast results can make a difference between leading the pack vs. obsolescence. Organizations need more benefits from cloud computing than just raw resources. They need agility, flexibility, convenience, ROI, and control.

Stackato private Platform-as-a-Service technology from ActiveState extends your private cloud infrastructure by creating a private PaaS to provide on-demand availability, flexibility, control, and ultimately, faster time-to-market for your enterprise.

Learn More

Sponsored by ActiveState