Resources for “Scientific Visualizations with POV-Ray”



Suzuki's Density File Extension Patch:

Leigh Orf's Research Page:

U. of I. Convective Modeling Group:


mjpeg tools:





Multi-Core CPU and UNIX/Linux

Phil the dill's picture

Hi there,

If you don't like reading reaaalllly long posts, then maybe just print this one out for a rainy day; but if you don't mind long posts, I hope you get as much out of reading it as I have out of writing it, because it's led me to consider certain ideas just in the process of authoring it.

I'm looking at putting together a "cluster" of sorts to improve render times, and in my experience there are several ways to tweak things to improve overall speed. The application will be both 3D animations and very complex single-scene renderings.

I have heard of the Beowulf project for Linux; I believe it is a generic clustering system for improving overall performance regardless of the situation or application. But since 3D render farms are a specific, dedicated application of a cluster, and since nuances in software design can make a huge difference at little cost, I thought creating my own cluster system was the way to go: a render farm for high-end 3D animation for film, scientific visualisations, or single-scene ray-tracing jobs that call for super-photo-realistic production that can't be rendered by a single machine in sufficient time.

I'm not sure whether a multi-threaded version of POV-Ray (which I believe is just being released, both for 64-bit MS Windows and as compilable source code for UNIX/POSIX/Linux) would be significantly better than just writing UNIX shell scripts to farm out render sessions as separate instances of POV-Ray within a single (Unix) shell (i.e. appending "&" to the command line to run each instance in the background). Either way, here are some tips to speed up rendering:

In my experience, if I run two instances of POV-Ray simultaneously in a 32-bit environment (on a single CPU), I take full advantage of the CPU. I noticed this under both Windows XP and Linux. Any extra instances beyond two don't increase CPU utilisation any further. If I run only one instance of POV-Ray, it renders no faster than each of two simultaneous instances would, so half the CPU's capacity is wasted.

So far, then, two instances of single-threaded POV-Ray seems optimal for a standard 32-bit CPU.

So even without a multi-threaded version of POV-Ray, on a good Linux distro that supports multiple processors (and 64 bit), you could, as a rule of thumb, set up a simple shell script that acts as an agent, starting new instances of the POV-Ray renderer depending on how many scenes you have to render.

Given the limitation observed above, such a shell script would read a configuration file storing the number of CPUs in the machine, and launch twice (2x) that many instances of POV-Ray when commencing a new rendering session. Each instance of POV-Ray would be launched from a shell-script wrapper which, on completion, re-runs itself on the next available frame, if one exists. This, however, requires some kind of scheduling, so that two or more scripts don't try to render the same file: a script must "grab a token" (as on a token-ring network or in a multi-user database system) that entitles it to render the next frame, and must update the next-available-frame number for whichever script is ready next.
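A minimal sketch of such a wrapper, under assumptions of my own (the counter file, lock directory, frame range and file names are all made up for illustration): each worker grabs the shared frame counter under a lock, renders that frame, and repeats. `mkdir` serves as the "token" because it is atomic, so two workers can never hold it at once.

```shell
#!/bin/sh
# Hypothetical worker wrapper: next_frame.txt holds the next unrendered
# frame number; frame.lock is the "token" (mkdir is atomic, so only one
# worker can hold it at a time).
COUNTER=next_frame.txt
LOCKDIR=frame.lock
LAST_FRAME=100

next_frame() {
    while ! mkdir "$LOCKDIR" 2>/dev/null; do sleep 1; done  # grab the token
    frame=$(cat "$COUNTER")
    echo $((frame + 1)) > "$COUNTER"    # publish the next frame number
    rmdir "$LOCKDIR"                    # release the token
    echo "$frame"
}

render_frames() {
    while :; do
        frame=$(next_frame)
        if [ "$frame" -gt "$LAST_FRAME" ]; then return 0; fi
        povray +Iframe$frame.pov +Oframe$frame.png -D  # -D: no preview display
    done
}

# Launch two workers per CPU, e.g. on a single-CPU machine:
# echo 1 > next_frame.txt; render_frames & render_frames & wait
```

Because each worker pulls its own next frame, fast and slow frames balance themselves out without any central dispatcher.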

(This assumes you are not using the clock feature in POV-Ray for animations but separate static .POV sources. That situation needs a different solution, which I may deal with in my next post.)

Now, if you're not doing animations but one huge render of a single complex scene, you can still use the same method; instead of launching new instances of POV-Ray for new frames, launch new instances for subsections of the image, whether rectangular regions or every Nth line of the scene.
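For example, a sketch that splits one 960-row render across four instances by row bands. `+SR` and `+ER` are POV-Ray's start-row/end-row options; the scene file name and image dimensions here are made up:

```shell
#!/bin/sh
# Render scene.pov as four horizontal bands, one POV-Ray instance each.
POVRAY=${POVRAY:-povray}   # path to the POV-Ray binary
HEIGHT=960
PARTS=4
ROWS=$((HEIGHT / PARTS))

i=0
while [ "$i" -lt "$PARTS" ]; do
    START=$((i * ROWS + 1))
    END=$(( (i + 1) * ROWS ))
    # +SR/+ER restrict rendering to rows START..END; -D: no preview display
    $POVRAY +Iscene.pov +Oband$i.png +W1280 +H$HEIGHT +SR$START +ER$END -D &
    i=$((i + 1))
done
wait   # block until all four background instances have finished
```

All four instances run concurrently thanks to the trailing `&`, and `wait` holds the script until the last band is done.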

You may ask, "...but how do I then join up these image fragments?"

I recommend the tools in Netpbm, which can also be run from the command line in a shell script. Netpbm is packaged for just about every Linux distribution.
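For instance, a sketch of the join step using Netpbm (the band file names are hypothetical; `pngtopnm`, `pnmcat` and `pnmtopng` are standard Netpbm tools):

```shell
#!/bin/sh
# join_bands OUT.png TOP.png [MORE.png ...]:
# convert each PNG band to PNM, stack them top-to-bottom, convert back.
join_bands() {
    out=$1; shift
    pnms=""
    for f in "$@"; do
        pngtopnm "$f" > "$f.pnm"    # Netpbm tools operate on PNM images
        pnms="$pnms $f.pnm"
    done
    pnmcat -topbottom $pnms | pnmtopng > "$out"
}

# e.g.: join_bands full.png band0.png band1.png band2.png band3.png
```

Because every step is a pipeline-friendly command-line tool, the whole join can run unattended from the same wrapper scripts that did the rendering.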

Whatever method is used, as a general rule of thumb you can expect each processor to work optimally with two instances of single-threaded POV-Ray working it.

(A multi-threaded version of POV-Ray may well work better, but I have yet to get a multi-core system to test this, as I am a poor boy right now. I should get a job as a network-config guy for 3D animation companies; maybe I'll post this article to a magazine and see what happens??? Then maybe I could afford a new computer!! LOL!!)

Using the method previously described, you are basically ensuring that all CPUs are working as optimally as they can, based on the two-processes-per-processor (2:1) ratio we assume for single-threaded POV-Ray.

I will endeavor to flesh such scripts out properly at a later date, once I have a small network set up to simulate an actual cluster. Of course, I could test the idea on a single-CPU machine running two instances of POV-Ray, since two is enough, but the more CPUs, the better the testing environment.

Now, if multi-core is substantially faster, that is the way to go for starters; but you can obviously extend the principle I have used above (shell scripts as rendering agents) and create a network of rendering machines that, based on the number of CPUs in each machine, runs a particular number of POV-Ray instances for each frame or chunk of a frame. The only difference is that each machine must now obtain the "token" for the next available frame from a server on the network.

If you can configure a hardware system in which multiple motherboards share a single bus (as opposed to using Ethernet or a null modem), this will probably only be useful if you store the source .POV files on a single server, to improve access to those files by the different machines. But for reconnecting frame regions, or for copying rendered images back to a server that reassembles them into an animation, a shared bus could obviously be faster than a network, depending on the application.

To speed things up even more, I recommend creating RAM drives that emulate disk drives for each machine but are obviously faster to read and write. If you are rendering a large scene whose resulting image you wish to save, it's quicker to fool POV-Ray (or Linux) into thinking it's saving to a permanent storage device for the several writes it performs during a render session, because several writes to the hard drive take more time than a single write of the entire file once the job is completed. Hope that made sense.
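On Linux, the usual way to get a RAM-backed directory is tmpfs, and many distributions already mount one at /dev/shm, so you may not need to set anything up by hand. A sketch (all paths are illustrative):

```shell
#!/bin/sh
# Use a RAM-backed directory for intermediate render output, then copy
# the finished frame to real disk in a single write.
# (With root you could mount your own: mount -t tmpfs -o size=512m tmpfs /mnt/ram)
RAMDIR=${RAMDIR:-/dev/shm/render}
mkdir -p "$RAMDIR"

if command -v povray >/dev/null 2>&1; then
    povray +Iscene.pov +O"$RAMDIR/frame.png" -D   # all writes go to RAM
    cp "$RAMDIR/frame.png" ./frames/              # one write to the hard drive
fi
```

The render's many incremental writes hit RAM; the hard drive sees only the single final copy.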

A fast bus, fast RAM and a large L2 cache in your machine(s) will also speed up the process.

And finally, run Linux/UNIX entirely in a command-line environment, and disable as many as possible of the background processes that the OS normally runs in parallel.

Disabling the rendering of the graphics preview to the screen also has a significant impact on render times, depending on the complexity of the scene (between 10 and 25%, from memory); POV-Ray's -D option does exactly this. The reason is that writing from memory to the graphics card eats up CPU cycles.

If you use a graphical environment at all, have it on only one of the machines, for checking the quality of your rendered images/animations.

Now, when such a system is expanded from one multi-core machine to a network of them, it will obviously work best if the .POV source files are all preloaded onto each machine's (RAM) disk, so that effectively the only data sent over the network are the filenames/pathnames of the next frames to be rendered, or the parameters defining the region of the frame to be rendered next by a particular instance of the script that launches POV-Ray.

At some stage, the rendered files will need to be copied back to a server for reassembly/compositing (into an animation or composited image); how this fits into the model depends on whether the output needs to be streamed or will be used later. If streaming is required (real-time rendering, say, in a server-side web application), then one might consider a system in which several machines share a bus, allowing very fast transfer of large files so they can be recomposited quickly in the correct order. If the application does not require real-time rendering, then any lag or latency between rendering and post-processing will no doubt be insignificant compared to the time taken to actually render the scene, so fast Ethernet or a fibre-optic switched system is adequate, provided you're happy to drink only one cup of hot chocolate while waiting for it all to complete.

To make such a home-built render farm robust, one would obviously build in housekeeping measures, such as log files that, for example, check the dimensions of rendered frames to make sure they really were fully rendered!! (N.B. A machine may unexpectedly power off halfway through a rendering while you are up at the pub, numbing your own neurons after a hard day in the laboratory or studio.)

This is where Netpbm comes in handy. You can determine the dimensions of a (partially rendered) image file with Netpbm, then use that information to schedule a "repair" or "completion" rendering of a frame that was cut short by the power failure. Netpbm can also stitch two images together, so it serves two purposes here, and it has many other command-line utilities for 2D and 3D-ish post-processing effects, making it a very useful tool to incorporate.
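A sketch of such a check using Netpbm's `pnmfile`, which prints an image's dimensions (the file name and the expected height are hypothetical):

```shell
#!/bin/sh
# check_frame FILE EXPECTED_HEIGHT: succeed only if the PNM image is that
# many rows tall, i.e. the render was not cut short by a power failure.
check_frame() {
    # pnmfile prints e.g. "frame.pgm: PGM plain, 1280 by 960  maxval 255";
    # pick out the number after "by" (the height in rows).
    height=$(pnmfile "$1" | awk '{ for (i = 1; i < NF; i++) if ($i == "by") print $(i + 1) }')
    [ "$height" = "$2" ]
}

# e.g.: check_frame frame0042.pnm 960 || echo "frame0042 needs re-rendering"
```

A scheduling script could run this over every output file after a session and queue any short frames for a repair pass.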

The other trick is to log the render times of given portions of a scene, or given frames of an animated scene. Depending on the circumstances, you can then estimate whether a rendering will be comparatively long or short from the render times of regions or frames close to and far away from it.

On this basis, if you happen to have a mix of machines with varying CPU power, you may wish to allocate quickly-rendered scenes to slower machines and slowly-rendered scenes to faster machines in the cluster. In some situations this will speed up the overall render time for the cluster, depending on the number of rendering instances (frames or frame portions) and the variability of render times across them. It is obviously most useful when there is great variation in the speed of the machines in the cluster.

This is an important factor to consider when building a cluster: if you start out on a limited budget, you will no doubt be buying older technology, and perhaps less of it, but as you upgrade your cluster you may wish to keep legacy equipment in service to maximise the cluster's overall power.

It's also relevant if you happen to do a lot of re-renderings of entire scenes or animations: if the final few scenes rendered happen to be slow renders on slow machines, you can lose valuable minutes, which turn into hours and days depending on how many full renderings you perform over the course of the project.

Alternatively, if your render farm's logic can predict the comparative total render times of different jobs, it can schedule them across machines for the fastest overall rendering, taking into account other factors such as the diminishing number of frames left to render towards the end of a job.

So, for example, suppose you are rendering a complex animation, scheduling jobs by individual frames, and you have, say, 24 CPUs (24 separate, networked machines) in your system (of equal power, for argument's sake). When you get to the final 24 frames, you will have one frame to render per CPU.

Ideally, you would predict the jobs' durations using statistics from a log file, so that the renderings either commence at roughly the same time or commence at times such that they all complete at the same time.

Now, the problem is that when we get down to those 24 frames, starting one frame per CPU could waste CPU power if, for example, one of the frames is unavoidably slower to render than the other 23 because of some extra degree of complexity.

In this example, let's say that this frame is 24 times slower to render than the others. If the system knew this in advance, then rather than allocating 24 frames to 24 CPUs, it would allocate 23 frames to 23 of the CPUs and about 1/25th of the complex frame to the remaining CPU. It would then allocate the remaining 24/25ths of that frame across all 24 CPUs, so that the final renders all finish at a similar time. Obviously, if different parts of that final frame had different render times, known from previous render times stored in log files, the allocation might vary accordingly.
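The arithmetic can be sanity-checked quickly. Counting a normal frame as one unit of work and the slow frame as 24 units, a sketch:

```shell
#!/bin/sh
# Check the load balance for the 24-CPU example: 23 normal frames (1 unit
# each) plus one frame that is 24 times slower (24 units of work).
awk 'BEGIN {
    share      = (24 * 24 / 25) / 24        # everyone'\''s cut of the remaining 24/25
    normal_cpu = 1 + share                  # one whole normal frame, then the shared cut
    slow_cpu   = 24 / 25 + share            # 1/25 of the slow frame, then the shared cut
    printf "normal CPU: %.2f units, slow CPU: %.2f units\n", normal_cpu, slow_cpu
}'
```

The two totals come out at 1.96 and 1.92 units, i.e. every CPU carries close to two units of work, so the final frames finish at nearly the same time.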

What has happened in the above example is that the system has dynamically changed its method to maximise all of its power, in the same way that jet-aircraft fuel systems can redirect fuel between tanks to keep all four engines running at the same power for the same period, or transfer fuel from the tank of a failed engine to an engine that is still working, so the fuel is usable and the aircraft reaches its destination.

The second application in the above analogy applies to a render farm when one particular machine does in fact fail. The logging part of the system could detect that a machine is down (for example, by using the ping command to see whether it is live on the network) and, if so, reschedule its already-scheduled tasks to other machines.
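A sketch of such a liveness check (the host names are made up; `-c 1` sends a single ping and `-W 2` is a two-second timeout on Linux iputils ping):

```shell
#!/bin/sh
# check_hosts HOST...: report each render node as up or down, so a
# scheduler script can reassign the frames belonging to dead nodes.
check_hosts() {
    for host in "$@"; do
        if ping -c 1 -W 2 "$host" >/dev/null 2>&1; then
            echo "$host up"
        else
            echo "$host down"
        fi
    done
}

# e.g.: check_hosts node01 node02 node03 > node_status.txt
```

Run periodically from cron, the status file gives the scheduling script an up-to-date picture of which machines it may assign work to.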

So there would be another script running that does nothing but check log files while the other scripts are busy rendering, and that uses the information in these centrally held log files to generate the data the other scripts use to determine their next job.

For example, say we have a render farm of 50 machines, of which 30 are running tasks for the latest render job (project), 5 are down for maintenance and 15 are still completing another job from the night before. The current allocation for this project might be for each of the 30 machines (all of the same CPU power, say) to render one entire scene apiece over the next 300 scenes, the system having found 300 scenes of similar expected render time for equivalently powered CPUs. But this schedule is based on the number of machines currently available. Suddenly, some of the unavailable machines become available, as their own tasks complete or they are resurrected from downtime. These machines may have faster or slower CPUs, so the simple plan of allocating 300 scenes across 30 machines may no longer be optimal for the new cluster. In such cases, our system should dynamically re-evaluate its scheduling plan and see whether it can improve the allocation of jobs over the existing cluster, optimising the overall time to complete this job and any others scheduled in the meantime.

The general idea, though, is to make the system increasingly efficient over the course of a project by automatically analysing the log files of previous runs.

Over the longer term, a significant amount of time is saved for the people using such a system if they perform several large runs on a project that grows more complex as content is added to a scene (objects, image maps, light sources), not to mention as the number and size of frames increase in the final runs.

How can this be done? That is, how can you know how long a rendering takes?

There are utilities in Unix that can time- and date-stamp events, such as the beginning and end of a process, and POV-Ray itself reports such statistics on standard output, which can then be parsed with programs such as grep.
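For example, a sketch that stamps the start and end of a render in a log and keeps POV-Ray's own timing lines from a captured message stream. POV-Ray's `+GA` option writes its text output to a file; the exact wording of the timing lines varies by version, so the grep pattern here is an assumption, as are the file names:

```shell
#!/bin/sh
# Time-stamp a render and keep POV-Ray's parse/trace statistics, for later
# analysis of which frames are slow or parse-heavy.
LOG=render.log

stamp() {   # stamp LABEL: append "LABEL YYYY-MM-DD HH:MM:SS" to the log
    date "+$1 %Y-%m-%d %H:%M:%S" >> "$LOG"
}

if command -v povray >/dev/null 2>&1; then
    stamp start
    povray +Iscene.pov +Oscene.png -D +GAstats.txt  # +GA: all messages to file
    stamp end
    grep -i 'time' stats.txt >> "$LOG"              # keep the "Time For ..." lines
fi
```

Accumulated over many runs, these log lines are exactly the statistics a scheduling script needs to predict comparative render times.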

This is particularly useful if you want to know how much time is spent doing preprocessing or parsing and how much time is actually spent rendering.

If you have a lot of frames and a huge amount of time is being wasted on preprocessing/parsing because the source code is not optimised, it is useful to know when this is occurring and for which frames. Such functionality can be incorporated into a script-based system built from a whole suite of freely available command-line tools: typical Unix commands, programs such as Netpbm, custom-written binaries and scripting commands.

If you wish to see some of the (very basic) stuff I have done in POV-Ray, my website is:

Thanks to those who were patient enough to read this post.

I hope it was useful and I look forward to any comments of a constructive or appreciative nature !!


Phil (the dill.)

Great Article

Dennis's picture

Hi, just wanted to say thanks for the article about POV-Ray.
But I noted, Leigh Orf, that you have made your own cluster, right?
WELL, HOW DO YOU DO IT??????????
Wouldn't it be a great idea for an upcoming article about network rendering with Linux? Because I REALLY would like to see a tutorial on this subject.
Maybe also some more links and articles about open-source 3D.
I'm SO excited about Sun's Looking Glass project.
PS (say it loud): WHAT DO WE WANT????

Building a cluster

orf's picture


Many vendors will put together a cluster for you (the hardware side). If you are looking to put together a bunch of PCs you have already, this is not too difficult so long as you have a network switch. The software side of things is a bit more involved. Most software does not take advantage of clusters, and you have to specifically write applications that take advantage of the distributed memory architecture.

If you want to do network rendering, you can use Maya, but it will cost you big time. MPI-Povray is an offshoot of POV-Ray 3.1g which works on clusters, but I don't believe development is continuing on it. The whole reason I modified the POV-Ray code was that there simply wasn't an out-of-the-box solution for me.

I read somewhere recently (probably Slashdot) that some video cards can be directly accessed on a cluster to do rendering instead of just the computer's CPU. I have no experience with this, and I believe it is somewhat new technology (and probably requires a lot of pretty spiffy graphics cards).

Leigh Orf

video cards as processors

Anonymous's picture

I read a new-hardware article in Popular Science that reminded me of this comment. The Ageia PhysX processor is a PCI card designed to make games run better: it takes the load of calculating the game's "PhysX" (physics) off the CPU. While this isn't exactly graphics rendering, it looks like it could be adapted to many other operations. Either way, it wouldn't be cheap: $250-$300.
Also, I found this on Google:
From what I can tell, these single-board computers that hook up to PCI could act like a very small cluster if you put several on one motherboard. They may not be very fast, but they're cheap, and PCI bus speeds beat Ethernet hands down.

How does it compare with what Matlab renders?

Ron-Nov8's picture

Hi, Leigh,

Nice article! It made me buy the journal. I'm really going to explore POV-Ray to render my 3D scientific data. However, I'm wondering how it compares with what Matlab renders, besides being free and customizable.

Thank you.
Thanks also to the editor for choosing such a wonderful topic!


orf's picture


Thanks for the nice comments. This is indeed cool stuff!

Unfortunately, I have no recent experience with Matlab and have never used it to render anything. POV-Ray is very powerful and customizable, and if you are doing isosurfaces, I would highly recommend it.

In the scientific world there are myriad data formats, and there is a very good chance that no matter what program you use (free or proprietary), it will involve at least a little coding and/or tweaking.

Leigh Orf