I'm Not Going to Pay a Lot for This Supercomputer!
Because of its role in astrophysics, Loki has been used to compute results for an N-body gravitational-interaction problem using a parallelized hashed oct-tree library. (An oct-tree is a three-dimensional tree data structure in which each cubical cell is recursively divided into eight daughter cells. A treecode is a numerical algorithm that uses such tree structures to increase the efficiency of N-body simulations. For details on the treecode, see the URL listed in Resources.) The code is not machine-specific, so comparing the performance of the commodity machines to that of traditional supercomputers is free of porting issues (with the exception that the Intel i860 and the Thinking Machines CM-5 versions have an inner loop coded in assembly).
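The hashed oct-tree library itself is beyond the scope of this article, but the structure it exploits is easy to sketch. The toy C program below (hypothetical names throughout, with no relation to the library's actual code) builds an oct-tree over a handful of particles by recursively splitting each cubical cell into eight daughters, accumulating the mass and center of mass a treecode would use to approximate the pull of distant particle groups:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef struct { double pos[3]; double mass; } Particle;

typedef struct Cell {
    double center[3];       /* geometric center of this cubical cell  */
    double half;            /* half of the cube's side length         */
    double mass;            /* total mass of particles in the cell    */
    double com[3];          /* their center of mass                   */
    struct Cell *child[8];  /* eight daughter cells; NULL if empty    */
} Cell;

/* Pick the daughter octant for a position: one bit per axis. */
static int octant(const Cell *c, const double p[3])
{
    return (p[0] > c->center[0]) |
          ((p[1] > c->center[1]) << 1) |
          ((p[2] > c->center[2]) << 2);
}

/* Recursively build a tree over n particles inside the given cube.
 * (A sketch only: it assumes distinct particle positions, skips
 * allocation error checks and never frees the tree.) */
static Cell *build(const Particle *p, int n,
                   const double center[3], double half)
{
    Cell *c = calloc(1, sizeof *c);
    memcpy(c->center, center, sizeof c->center);
    c->half = half;
    for (int i = 0; i < n; i++) {
        c->mass += p[i].mass;
        for (int a = 0; a < 3; a++)
            c->com[a] += p[i].mass * p[i].pos[a];
    }
    for (int a = 0; a < 3; a++)
        c->com[a] /= c->mass;
    if (n == 1)
        return c;                    /* leaf: a single particle       */
    for (int oct = 0; oct < 8; oct++) {
        Particle *sub = malloc(n * sizeof *sub);
        int m = 0;
        for (int i = 0; i < n; i++)
            if (octant(c, p[i].pos) == oct)
                sub[m++] = p[i];
        if (m > 0) {                 /* halve the cube toward octant  */
            double ctr[3];
            for (int a = 0; a < 3; a++)
                ctr[a] = center[a] + (((oct >> a) & 1) ? 0.5 : -0.5) * half;
            c->child[oct] = build(sub, m, ctr, half * 0.5);
        }
        free(sub);
    }
    return c;
}

int main(void)
{
    Particle p[4] = {
        {{ 0.5,  0.5,  0.5}, 1.0}, {{-0.5, -0.5, -0.5}, 1.0},
        {{ 0.5, -0.5,  0.5}, 2.0}, {{-0.5,  0.5, -0.5}, 1.5},
    };
    double origin[3] = { 0.0, 0.0, 0.0 };
    Cell *root = build(p, 4, origin, 1.0);
    printf("root: mass %.1f, center of mass (%.2f, %.2f, %.2f)\n",
           root->mass, root->com[0], root->com[1], root->com[2]);
    return 0;
}

The payoff of the structure is that a force evaluation can stop descending whenever a cell is distant enough, replacing thousands of particle-particle interactions with one interaction against the cell's aggregate mass.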
At Supercomputing '96, Loki and Hyglac were connected via $3,000 of additional Ethernet cards and cables to perform as a single 32-node machine with a purchase cost of just over $100,000. Running the N-body benchmark calculation with 10 million particles, Loki+Hyglac achieved 2.19 billion floating-point operations per second (GFLOPS), more than doubling the per-processor performance of a Cray T3D and almost matching that of an IBM SP-2 (see Table 1).
Figure 4. Intermediate stage of a gravitational N-body simulation of galaxy formation using 9.75 million particles. It took about three days to compute on Loki at a sustained rate of about 1 GFLOPS.
As a stand-alone machine at LANL, Loki has performed an N-body calculation with just over 9.75 million particles. This calculation was “real work” and not “proof-of-principle”, so it was tuned to optimize scientific results rather than machine performance. Even with that condition, the performance and results are striking. The total simulation required 10 days (less a few hours) to step through 750 time steps, performed 6.6x10^14 floating-point operations to compute 1.97x10^13 particle interactions and produced just over 10 GB of output data.
For the entire simulation, Loki achieved an average of 879 MFLOPS, yielding a price/performance figure of $58 per MFLOPS. Contemporary machines such as SGI's Origin are capable of price/performance in this range, but scaling an Origin to the memory and disk necessary to perform a calculation of this magnitude quickly becomes prohibitive; at list price, 2GB of Origin's memory alone costs more than the entire Loki assembly.
The nature of the treecode is such that later time steps have greater overhead in traversing the tree than in performing floating-point arithmetic, so the average flop rate steadily decreases the longer the code is run. Over the first 30 time steps alone, 1.15x10^12 particle interactions in 10.25 hours yield a throughput of 1.19 GFLOPS. This figure is actually a better estimate of the amount of useful work than the one given for the total simulation, since the treecode's purpose is to avoid floating-point calculations whenever possible.
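The figures above are easy to sanity-check. The short C program below re-derives a few quantities from the quoted numbers; note that the per-interaction flop count and the roughly $51,000 machine cost are inferences from the article's figures, not numbers the article states:

#include <stdio.h>

int main(void)
{
    /* First 30 time steps: 1.15x10^12 interactions in 10.25 hours
     * at the quoted 1.19 GFLOPS. */
    double interactions = 1.15e12;
    double seconds      = 10.25 * 3600.0;
    double rate         = 1.19e9;
    printf("flops per interaction: ~%.0f\n",
           rate * seconds / interactions);          /* ~38            */
    printf("interaction rate: %.1f million per second\n",
           interactions / seconds / 1e6);           /* ~31 million/s  */

    /* Price/performance over the whole run; the machine cost is
     * back-solved from the $58/MFLOPS figure (an assumption). */
    double mflops = 879.0;                          /* quoted average */
    double cost   = 58.0 * mflops;                  /* ~= $51,000     */
    printf("price/performance: $%.0f / %.0f MFLOPS = $%.0f per MFLOPS\n",
           cost, mflops, cost / mflops);
    return 0;
}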
Loki has also been used to simulate the fusion of two vortex rings. The simulation began with 57,000 vortex particles in two discrete smoke rings, though re-meshing meant that by the final time step the simulation was tracking 360,000 particles. Each processor sustained just over 65 MFLOPS during the simulation, for a total system performance of 950 MFLOPS.
Hyglac has been used to perform photo-realistic rendering using a Monte Carlo implementation of the rendering equation. Some of the rendered images are available at http://www.cacr.caltech.edu/research/beowulf/rendering.html. In a direct comparison, Hyglac completed the renderings anywhere from 12% to 20% faster than an IBM SP-2, a machine with a price tag twenty times that of Hyglac.
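The rendering equation expresses the light leaving a surface as an integral over all incoming directions, and a Monte Carlo renderer estimates that integral by averaging randomly drawn samples. The toy C program below shows the bare estimator on a stand-in one-dimensional integrand; it illustrates the technique only and is not Hyglac's renderer. Because every sample is independent, the workload divides cleanly across the nodes of a cluster:

#include <stdio.h>
#include <stdlib.h>
#include <math.h>

static const double PI = 3.14159265358979323846;

/* Stand-in integrand; in a renderer this would be the product of the
 * BRDF, the incoming radiance and a cosine term for a sampled direction. */
static double integrand(double x)
{
    return sin(PI * x);
}

/* Monte Carlo estimate of the integral over [0,1]: the mean of the
 * integrand at n uniform random samples (sample pdf = 1). */
static double mc_estimate(long n)
{
    double sum = 0.0;
    for (long i = 0; i < n; i++) {
        double x = rand() / (RAND_MAX + 1.0);
        sum += integrand(x);
    }
    return sum / n;
}

int main(void)
{
    /* The error falls as 1/sqrt(n): more samples, less image noise,
     * which is why renderers happily soak up every available CPU. */
    for (long n = 100; n <= 1000000; n *= 100)
        printf("n = %7ld  estimate = %.5f  (exact = %.5f)\n",
               n, mc_estimate(n), 2.0 / PI);
    return 0;
}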
Even the most blazingly fast system is useless if it can't perform without crashing. System reliability is therefore crucial, especially for a machine like Loki, which may need several days without interruption to complete a large-scale calculation. During the burn-in period, a bad SIMM and a handful of bad hard drives were replaced under warranty. The warranties on commodity parts make these commodity supercomputers particularly appealing: warranties on specialty machines like the Origin tend to be 90 days or less, whereas readily available parts such as Loki's innards generally carry warranties ranging from one year to lifetime. In September 1997, most of the Loki nodes had uptimes of over four months without a reboot. The only hardware problems encountered have been three failed ATX power-supply fans, which resulted in node shutdowns due to overheating. Each affected node was easily swapped with a spare, and the fans were replaced in a few minutes.