The Power of the Incredible Hulk—the ILM Linux Death Star
The Star Wars Death Star exists! But, not as a menacing planet-killing weapon. ILM's Death Star renderfarm is the computing power behind the many motion pictures produced at Industrial Light & Magic. “Linux will just continue to grow”, says ILM CTO Cliff Plumer. “Our renderfarm has over 1,500 processors currently, and almost 1,000 more are added every evening with desktops.” The renderfarm utilizes both dedicated CPUs and the computing power of idle desktops.
“Our core renderfarm is comprised almost entirely of Linux boxes since we switched over from SGI machines”, says ILM Systems Developer Mike Thompson. “We have about 750 nodes—1,500 CPUs.” The renderfarm exists as a row of computing towers made up of 1U rackmount dual-processor PCs. However, this isn't a supercomputer in the classic sense. Each machine operates semi-independently in a grid configuration, not bound together as a supercomputer running a single job. At ILM, a proprietary batch-scheduling program called ObaQ manages the workload across machines.
Figure 2. ILM RackSaver Linux renderfarm has 1,500 processors now, which will double for Star Wars Episode III.
As the raw horsepower of systems increased, so did the demand for electricity and cooling in ILM's machine room. “It's important to reduce power and heat”, says Thompson. “We went with AMD Athlon 1600 CPUs, a low-power variant that can be a little difficult to find.” Each node has 2GB of memory, expandable to 4GB. A node usually runs two jobs at once to keep both processors busy, but this is sometimes reduced to a single job so that the job can use all of the node's memory.
The AMD-powered nodes are RS-1100 units produced by RackSaver. RackSaver caught ILM's attention at a National Association of Broadcasters (NAB) convention and was given the opportunity to bid on building the renderfarm. RackSaver competes mostly with heavyweights IBM and Dell. RackSaver CEO David Diggers says RackSaver's advantage is servers with double the density of competitors. “We're very strong in this vertical market with sales to ILM, Pixar and Warner Brothers”, he said.
The RackSaver renderfarm servers are connected via 100BASE-TX into a Foundry 8000 switch that aggregates network traffic into a gigabit link into the network core. “Just recently we added a 10-gigabit link into our core which helped a lot”, says Thompson. “We have a file server core and 2,500 rendering cores with total aggregate traffic of about 70TB a day.”
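As a rough check on those numbers, the quoted 70TB a day can be turned into an average line rate. The sketch below assumes decimal terabytes and a load spread evenly over 24 hours, neither of which Thompson states; render traffic is bursty, so real peaks would run higher.

```python
# Back-of-the-envelope average throughput for the quoted aggregate traffic.
# Assumptions (not from ILM): 1TB = 10**12 bytes, load spread evenly over a day.
TB = 10**12
SECONDS_PER_DAY = 24 * 60 * 60

def average_rate(bytes_per_day):
    """Return (megabytes per second, gigabits per second)."""
    bytes_per_second = bytes_per_day / SECONDS_PER_DAY
    return bytes_per_second / 10**6, bytes_per_second * 8 / 10**9

mb_per_s, gbit_per_s = average_rate(70 * TB)
print(f"{mb_per_s:.0f} MB/s, {gbit_per_s:.2f} Gbit/s")
```

Under those assumptions the average alone is several times what a single gigabit link can carry, which fits Thompson's remark that the 10-gigabit link helped.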
“The Hulk is not a typical comic book movie”, says Technical Director Doug Sutton. “It is some of the most challenging work we've done in years. Making a 15-foot-tall green guy look real is an incredible challenge!” The Hulk is a computer-generated (CG) digital actor with emotions and complex green skin.
“Film is designed to make people not look green, to push green away”, says Principal Software Engineer Rod Bogart. “We're using Kodak Premiere print stock that has deeper color than Kodak Vision. If the character is green, as with Hulk, it's hard to make it look like green skin. The green dinosaurs of Jurassic Park are not as hard. Dinosaurs don't have the same sort of skin highlights as people.” The audience is less forgiving about human faces—even green ones. “Green passes through yellow as it goes to white”, adds Sutton. “You need to see that on a monitor in order to counteract or accept it.”
Live action was shot in the streets of San Francisco, California. Jennifer Connelly would be in the middle of a street acting to a big green guy who isn't there. A grip holding a pole with a green head on top would be her only cue. “They did a few really cool practical effects, explosions, but mainly it is CG”, says Sutton, “a combination of Maya and SOFTIMAGE”.
Capturing film images demands higher dynamic range than the typical JPEG or PNG supports. Kodak Cineon has long been the standard for digitized film. Cineon is a 10-bit logarithmic format. Compared to JPEG, which is 8-bit linear (that is, 24-bit RGB), a Cineon image has more dynamic range and is especially rich in colors near black. As computing power has increased, most computation has switched to floating point, and working in 10-bit log has become a limitation. OpenEXR is a new floating-point image format.
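To make the logarithmic encoding concrete, here is a minimal sketch of decoding a 10-bit Cineon code value to linear light. The constants are the commonly cited Kodak defaults (reference white at code 685, 0.002 density per code value, a negative gamma of 0.6); they are assumptions for illustration, not values taken from ILM's pipeline.

```python
# Sketch: decode a 10-bit Cineon code value to scene-linear light.
# Assumed constants (commonly cited Kodak defaults, not from the article):
REF_WHITE = 685           # code value treated as linear 1.0
DENSITY_PER_CODE = 0.002  # film density represented by one code value
NEGATIVE_GAMMA = 0.6      # slope of the negative's density curve

def cineon_to_linear(code):
    """Map a 10-bit log code value (0..1023) to a linear value, 1.0 at REF_WHITE."""
    if not 0 <= code <= 1023:
        raise ValueError("Cineon code values are 10-bit (0..1023)")
    return 10 ** ((code - REF_WHITE) * DENSITY_PER_CODE / NEGATIVE_GAMMA)

for code in (95, 470, 685, 1023):
    print(code, round(cineon_to_linear(code), 4))
```

With these constants, every 90 code values doubles the light, so equal steps in code value are equal steps in f-stops. That is why the format keeps so much detail near black, where a linear integer encoding has only a handful of codes to spend.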
“The OpenEXR file format is a better digital representation of film because it has a dynamic range of over 30 f-stops without loss of precision”, says Plumer. “Previous 8-bit file formats have the dynamic range of only around seven to ten f-stops and cannot accurately reproduce images with extreme contrast.” ILM created the EXR format in the summer of 2000 and has used it on Harry Potter and the Sorcerer's Stone, Men in Black II, Gangs of New York, Signs, Dreamcatcher, The Hulk, Van Helsing, Peter Pan, Timeline and Pirates of the Caribbean.
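The f-stop figures can be sanity-checked with a little arithmetic, since each stop is a doubling of light. The sketch below assumes the 16-bit float type follows the usual half-precision layout (largest finite value 65504, smallest normalized value 2 to the power -14) and, as a simplification, treats 8-bit code values as linear light.

```python
import math

def stops(brightest, darkest):
    """Dynamic range in f-stops: each stop is a doubling of light."""
    return math.log2(brightest / darkest)

# 16-bit float: largest finite value and smallest normalized value (assumed layout).
HALF_MAX = 65504.0
HALF_MIN_NORMAL = 2.0 ** -14

print(f"16-bit float: {stops(HALF_MAX, HALF_MIN_NORMAL):.1f} stops")
# 8-bit integer code values, read naively as linear light (1..255).
print(f"8-bit linear: {stops(255, 1):.1f} stops")
```

The results land at about 30 stops and about 8 stops, in line with Plumer's figures. Gamma-encoded 8-bit formats stretch the usable range somewhat, which may account for the "seven to ten" spread he quotes.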
OpenEXR uses lossless compression like PNG, not lossy compression like JPEG. Actually, there is an unused Piz12 lossy compression option. Unlike PNG or JPEG, OpenEXR uses wavelet encoding—basically a tree structure containing the signed differences between pixels. Because the magnitude of the numbers is smaller and there are fewer unique values, Huffman encoding can compress the result more efficiently. EXR supports 32-bit float, 32-bit integer and 16-bit float in any number of channels. Channels can be different depths; for instance, RGBAZ images need more precision in Z depth—with 16-16-16-16-32 typical. The Z channel is physical depth, not a color or alpha mask—think of Z sort of like sonar. In January 2003, ILM released OpenEXR as open source. The first open-source application to support it was CinePaint.
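Why storing signed differences helps an entropy coder can be seen with a toy example. This is not the OpenEXR codec: it replaces the wavelet tree with plain one-dimensional differencing, and the scanline is made up. It only illustrates that differences of a smooth signal have fewer unique values, which is what Huffman coding exploits.

```python
import math
from collections import Counter

def entropy_bits(values):
    """Shannon entropy in bits per sample: a floor for Huffman-style coding."""
    counts = Counter(values)
    total = len(values)
    return -sum(n / total * math.log2(n / total) for n in counts.values())

def signed_differences(values):
    """Keep the first sample, then store each sample minus its predecessor."""
    return [values[0]] + [b - a for a, b in zip(values, values[1:])]

def undo_differences(diffs):
    """Invert signed_differences exactly, so the scheme stays lossless."""
    out = [diffs[0]]
    for d in diffs[1:]:
        out.append(out[-1] + d)
    return out

# A hypothetical scanline: a smooth gradient with a small repeating ripple.
scanline = [1000 + 3 * x + (x % 4) for x in range(256)]
diffs = signed_differences(scanline)

assert undo_differences(diffs) == scanline  # nothing was lost
print(f"raw:   {entropy_bits(scanline):.2f} bits/sample")
print(f"diffs: {entropy_bits(diffs):.2f} bits/sample")
```

The differenced data needs far fewer bits per sample than the raw values, yet the original scanline can be rebuilt exactly.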
CinePaint, which until recently was called Film Gimp, is a frame-by-frame motion picture retouching system that branched from GIMP in 1998. I became the CinePaint project leader serendipitously after I wrote some articles about Film Gimp for Linux Journal. In addition to the usual still-image formats, CinePaint supports file formats popular in the motion picture industry. Those formats include Cineon, RnH 16-bit float (a format created by studio Rhythm & Hues that chops off half of a 32-bit float), Radiance HDR, LogLuv TIFF and now OpenEXR. ILM wrote the OpenEXR plugin for CinePaint.
“We never want to look at an image without an appropriate SDev”, says Sutton. “It's important that what you see is what you get. An SDev—a simulation device—is how we make monitors look like film. Since switching to OpenEXR we don't use LUTs much anymore, but we use SDevs all the time.” A LUT is a lookup table used to adjust an image to correct gamma. Monitor brightness is not proportional to the input voltage, but rather to the input voltage raised to a power. This exponent is called gamma and varies depending on the display. Macs are usually about 1.8 and PCs about 2.2.
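A gamma-correcting LUT of the kind described here takes only a few lines. The sketch below is a generic illustration, not ILM's code: it builds a 256-entry table that pre-compensates 8-bit values for a display of a given gamma and applies it to each channel independently.

```python
def build_gamma_lut(gamma, size=256):
    """One-dimensional table that pre-compensates for a display with this gamma."""
    top = size - 1
    return [round(top * (i / top) ** (1.0 / gamma)) for i in range(size)]

def apply_lut(pixel, lut):
    """Look up each 8-bit channel independently, as a per-channel LUT does."""
    return tuple(lut[c] for c in pixel)

pc_lut = build_gamma_lut(2.2)   # typical PC display, per the article
mac_lut = build_gamma_lut(1.8)  # typical Mac display, per the article
print(apply_lut((64, 128, 255), pc_lut))
print(apply_lut((64, 128, 255), mac_lut))
```

Because each channel is looked up on its own, a table like this can bend the tone curve but cannot make one channel's output depend on the others.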
“The way LUTs work is mainly to change contrast”, says Bogart. “What a LUT can't do is increase or decrease saturation.” Instead of LUTs, a more complex lattice computation is used at ILM. Think of a lattice as a 13 × 13 × 13 cube in space—a 3-D indexed array. An odd number is used so the center is gray. In a lattice, each RGB triple is mapped as a whole, unlike a LUT, which maps each channel separately. Using three independent lookup tables is not sufficient for getting the look of film—especially with saturated green. The lattice adjusts to a 12-point film curve using a calculated table with 64k entries. An RGB index into the lattice, with each component between 0 and 1, returns three values computed by trilinear or tetrahedral interpolation, which are then gamma-corrected. The process is slower, because the three channels of each pixel must be handled together, but it is more accurate than the typical per-channel RGB lookup. “Lattices are not just a Hulk thing”, points out Bogart. “For Minority Report that was a bleach process print—very desaturated. We simulated that look with lattices. You can't do desaturation with LUT either.”
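The lattice idea can be sketched as a 13 × 13 × 13 table with trilinear interpolation. This is a minimal illustration under stated assumptions, not ILM's SDev code: the desaturate function, its Rec. 709 luma weights and the 0.5 amount are all hypothetical stand-ins for a measured film look, and the film curve and final gamma step are left out.

```python
N = 13  # lattice points per axis, as in the article

def build_lattice(transform):
    """Sample an RGB->RGB function at N*N*N evenly spaced grid points."""
    step = 1.0 / (N - 1)
    return [[[transform(r * step, g * step, b * step)
              for b in range(N)] for g in range(N)] for r in range(N)]

def _cell(x):
    """Return the lower grid index and fractional offset for x in [0, 1]."""
    pos = min(max(x, 0.0), 1.0) * (N - 1)
    i = min(int(pos), N - 2)
    return i, pos - i

def lookup(lattice, r, g, b):
    """Trilinear interpolation between the 8 lattice points around (r, g, b)."""
    (ri, rf), (gi, gf), (bi, bf) = _cell(r), _cell(g), _cell(b)
    out = [0.0, 0.0, 0.0]
    for dr in (0, 1):
        for dg in (0, 1):
            for db in (0, 1):
                weight = ((rf if dr else 1 - rf) *
                          (gf if dg else 1 - gf) *
                          (bf if db else 1 - bf))
                corner = lattice[ri + dr][gi + dg][bi + db]
                for c in range(3):
                    out[c] += weight * corner[c]
    return tuple(out)

def desaturate(r, g, b, amount=0.5):
    """Hypothetical look: pull each channel toward luma (Rec. 709 weights)."""
    luma = 0.2126 * r + 0.7152 * g + 0.0722 * b
    return tuple(luma + (c - luma) * (1 - amount) for c in (r, g, b))

lattice = build_lattice(desaturate)
print(lookup(lattice, 0.1, 0.9, 0.2))  # a saturated green, pulled toward gray
```

Each output channel here depends on all three inputs, which is exactly what three independent per-channel tables cannot express. Tetrahedral interpolation, the other method the article mentions, would replace the eight-corner blend with a four-corner one.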
The raw 16-bit OpenEXR data format is called Half, as in half of a 32-bit floating-point number. The Half data format is an internal format of NVIDIA graphics cards. It would be nice if the lattice calculation, which consumes CPU cycles, could instead be run directly on the graphics processing unit (GPU) on the graphics card. In fact, that's becoming possible due to advances in graphics cards. “We're looking forward to that”, says Bogart. “We intend to offload image calculations to the GPU running a pixel shader.” NVIDIA offers a new C-like compiler/library called Cg to run bits of pixel code, commonly called shaders, on the GPU. Microsoft offers a similar technology for Direct3D called High Level Shading Language, which runs on ATI cards among others, and 3Dlabs has OpenGL Shading Language.
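For a feel of what a 16-bit float holds, the sketch below packs a few values into half precision and prints the bit fields. It assumes Half uses the standard layout of 1 sign bit, 5 exponent bits and 10 mantissa bits, and relies on the "e" format code of Python's struct module (Python 3.6 or later) rather than on the OpenEXR library itself.

```python
import struct

def float_to_half_bits(value):
    """Round a Python float to 16-bit half precision and return the raw bits."""
    return struct.unpack("<H", struct.pack("<e", value))[0]

def half_bits_to_float(bits):
    """Expand 16 raw half-precision bits back to a Python float."""
    return struct.unpack("<e", struct.pack("<H", bits))[0]

def describe(value):
    """Show the sign, exponent and mantissa fields of the half encoding."""
    bits = float_to_half_bits(value)
    sign, exponent, mantissa = bits >> 15, (bits >> 10) & 0x1F, bits & 0x3FF
    return f"{value!r}: sign={sign} exponent={exponent:05b} mantissa={mantissa:010b}"

for v in (1.0, 0.18, 65504.0):
    print(describe(v), "->", half_bits_to_float(float_to_half_bits(v)))
```

A value such as 0.18 does not survive the round trip exactly, but the error stays small relative to the value, which is the property that makes a floating-point pixel format attractive for film work.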
GPU programming is something like embedded systems programming, where code is compiled on a host platform then downloaded to the embedded system. GPU programs can be compiled and downloaded to the graphics card at runtime. The compiler is part of the runtime library.
Some 3-D packages, such as SOFTIMAGE and Maya, already are beginning to use Cg to improve rendering performance.
Alias|Wavefront Maya was used for particles and some of the character animation models. Pixar RenderMan and Mental Images Mental Ray software were used for rendering. Raytracing renders reflective surfaces better but takes longer. Raytracing is becoming more practical, thanks to the faster, cheaper Linux systems. Both RenderMan and Mental Ray support shader programming to give images a custom look. RenderMan provides its own shader language, which is considered easy to learn. Mental Ray uses C, which is considered more challenging but more powerful. Which software to use is decided on a scene-by-scene basis. Each scene is rendered under the control of a batch scheduler.
“Florian Kainz, who was also behind OpenEXR, wrote our batch scheduler along with a couple others a long time ago for SGI Irix”, says Hess. “That made use of big iron and desktops. When we started our move to Linux we wanted better resource management.” The first version of ObaQ divided machines by show—not a very efficient utilization of resources.
“Our attempt to replace ObaQ with a centralized resource management system called the IMP Project didn't work out”, says Hess. “We went back to ObaQ, and the Linux port of that took about two weeks. Three or four months ago, Florian decided he was going to fix that so any show could use any machine.” ObaQ is a peer-to-peer (P2P) scheduler system. The advantage of a P2P scheduler is that a scheduler server failure won't knock the entire system off-line. ObaQ2 uses a single machine for global scheduling, but that machine only advises the independent machines running ObaQ. Losing the ObaQ2 server won't bring the entire facility down. There are scheduler system alternatives, such as the popular proprietary product Platform LSF or the open-source Condor and OpenPBS schedulers, but ILM plans to continue to use ObaQ.
SGI had added functionality in the IRIX kernel for process monitoring, such as CPU time and temp space. Those values determine how ILM machine time gets charged back by central accounting to projects. ILM discovered the Linux /proc filesystem didn't provide all those statistics or created excessive overhead, and that it couldn't support ObaQ without changes.
“Florian asked me to address some of the Linux kernel issues”, says Hess. “For one thing, Linux provides no way to tell if something is a thread or a process. In ps, every thread shows as a separate process.” Some jobs, such as Mental Ray, can run multiple threads per frame in parallel. Linux top or ps shows each thread using 1GB RAM, but that's shared memory being counted twice. Linux also couldn't tell which job is opening temporary files. ObaQ needs to know that in order to clean up temporary files if it kills a job.
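The double-counting problem is easy to reproduce on paper. In the sketch below the rows, task IDs and job names are all invented, and it presumes the scheduler already knows which tasks belong to which job, which is precisely the information Hess says stock Linux did not expose.

```python
from collections import defaultdict

# Hypothetical ps-style rows: (task id, job name, resident memory in MB).
# Three threads of one render share the same 1GB of memory.
rows = [
    (4101, "mentalray_frame_0042", 1024),
    (4102, "mentalray_frame_0042", 1024),
    (4103, "mentalray_frame_0042", 1024),
    (4200, "comp_frame_0007", 300),
]

def naive_total(rows):
    """Sum every row, which counts shared memory once per thread."""
    return sum(mb for _, _, mb in rows)

def per_job_total(rows):
    """Count each job's shared memory once by keeping one figure per job."""
    by_job = defaultdict(int)
    for _, job, mb in rows:
        by_job[job] = max(by_job[job], mb)
    return sum(by_job.values())

print(naive_total(rows), per_job_total(rows))
```

The naive sum charges the threaded render for three times the memory it really uses, which would distort both scheduling decisions and the chargeback accounting.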
Hess created a Linux kernel module to trap opens, forks, clones, vforks, exits and renames, to make accurate statistics possible. The kernel module does most of the work, but the hooked calls should ignore any job not being run by ObaQ. To do that required hacking the kernel. “I used one of the unused bits in the ptrace flag”, says Hess. “Every x86 job has a 32-bit ptrace vector. As of 2.4.20, 10 bits are used to indicate ptrace modes, such as single step. Sometime last year Linux or glibc changed how the ptrace flag works so it clears on fork. I found all places the kernel clears those bits and keep bit 32.” Hess says the OPROFILE feature in the 2.5 kernel has enhanced accounting facilities, so his hack might not be needed in 2.5. Commandeering an unused bit in the ptrace flag was a quick hack to mark jobs as being ObaQ tasks. “This is one of the great things about Linux”, says Hess. “Because we had the source, we could make this change ourselves, and very quickly. No third-party vendor had to be involved to do custom engineering, as in the IRIX case.”
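The bit trick Hess describes can be modeled in a few lines. This is a user-space illustration only: the constant and function names are hypothetical, the flag word is a plain integer rather than a kernel structure, and "bit 32" is taken to mean the top bit of a 32-bit word.

```python
# Hypothetical name; the real kernel identifiers are not given in the article.
OBAQ_MARK = 1 << 31  # top bit of the 32-bit flag word ("bit 32", counting from 1)

def mark_as_obaq(flags):
    """Tag a task as belonging to the scheduler."""
    return flags | OBAQ_MARK

def is_obaq(flags):
    """True if the task carries the scheduler's mark."""
    return bool(flags & OBAQ_MARK)

def stock_fork(parent_flags):
    """Stock behavior described by Hess: the flag word is cleared on fork."""
    return 0

def patched_fork(parent_flags):
    """Patched behavior: clear the ptrace modes but carry the mark to the child."""
    return parent_flags & OBAQ_MARK

parent = mark_as_obaq(0b1)              # a marked task with one mode bit set
print(is_obaq(stock_fork(parent)))      # False: the mark is lost
print(is_obaq(patched_fork(parent)))    # True: children stay identifiable
```

Carrying the mark across fork is what lets the hooked calls tell an ObaQ job's descendants from everything else on the machine and ignore the rest.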
“Now that we have all this firepower in the renderfarm it can overwhelm any file server”, says Thompson. “In The Hulk we have these nuclear explosion renders that are really crunchy—causing major grief for us lately. It is easy for an artist to proc-up a render [add more processors to a task] to the point that it brings a file server to its knees. We're doling out 700 times the data we used to!”
ILM uses a Sun T3 disk array to serve NFS. Adopting Linux as an NFS client presented a number of problems when ILM brought it on-line a year and a half ago. Due to a Linux NFS UDP-packets-out-of-order bug (fixed in 2.4.18), after a couple of hours the Sun Solaris server would spin up to 100% and be dragged down. Sun came to the rescue with a proprietary Solaris kernel module and IP stack patch to work around the Linux bug.
A nagging issue with running NFS over UDP on Linux is the lack of flow control. “When we get into hot spot problems on file servers, the renderfarm makes a denial-of-service attack on our file servers”, says Thompson. “We're going to try TCP NFS on a Linux client again, now that it's a year and a half later. We'll start testing that next week.” TCP adds about 5% overhead.
ILM is scaling up, from about 20TB of file server storage now to double that next year. “For Star Wars Episode III we're going to double the size of our renderfarm”, says Thompson. “We can do that by ordering another 3,000 nodes from RackSaver—but that could destroy our file servers.” Thompson plans to head off that NFS server meltdown by going to a clustered file server—Sistina GFS or something like that. File serving also extends beyond the ILM facility.
ILM ships dailies worldwide over the ILM Conduit, a proprietary file-transfer system that uses an encrypted-SSL transport. Everything is doubly encrypted with Blowfish. ILM has playback software for Windows and Macintosh, plus a Web-based Java applet version that works everywhere. “MJPEG-A QuickTime is our core movie container format”, says Thompson, “but Conduit can carry anything—match-move data, digital pictures, dailies. People can play back dailies on Linux desktops across regular network connections. That's pretty impressive. It used to be you could do that only on SGI equipment”. For dailies, ILM has 20TB in EMC CLARiiON FC4700 arrays, fronted by 4-proc Sun E420R servers with 4GB memory and gigabit Ethernet. “Shot disks” are arranged in quarter-terabyte chunks of storage.
“Linux means having incredible amounts of processing power to solve any problem”, says Sutton. “In The Hulk, we had I don't know how many layers of textures for skin and hair. We can do incredibly complex scenes using Linux.” What movies a studio makes is influenced by cost and schedule. Faster, cheaper Linux means more movies.
Thanks to Jimmy Perry, marketing coordinator of RackSaver, Inc., for his conception and aid in the development of this feature.
Robin Rowe is a partner in the motion picture technology company MovieEditor.com, the release manager of Film Gimp and the leader of LinuxMovies.org and OpenSourceProgrammers.org.