ROOT: An Object-Oriented Data Analysis Framework

A report on a data analysis tool currently being developed at CERN.
Installation

The binaries and sources of ROOT can be downloaded from http://root.cern.ch/root/Version200.html. After downloading, uncompress and unarchive (using tar) the file root_v2.00.Linux.2.0.33.tar.gz in your home directory (or in a system-wide location such as /opt). This procedure will produce the directory /root. This directory contains the following files and subdirectories:

  • AA_README: read this file before starting

  • bin: directory containing executables

  • include: directory containing the ROOT header files

  • lib: directory containing the ROOT libraries (in shared library format)

  • macros: directory containing system macros (e.g., GL.C to load OpenGL libs)

  • icons: directory containing xpm icons

  • test: some ROOT test programs

  • tutorials: example macros that can be executed by the bin/root module

Before using the system, you must set the environment variable ROOTSYS to the root directory, e.g., export ROOTSYS=/home/rdm/root, and you must add $ROOTSYS/bin to your path. Once done, you are all set to start rooting.

First Interactive Session

In this first session, start the ROOT interactive program root. This program gives access via a command-line prompt to all available ROOT classes. By typing C++ statements at the prompt, you can create objects, call functions, execute scripts, etc. Go to the directory $ROOTSYS/tutorials and type:

bash$ root
root [0] 1+sqrt(9)
(double)4.000000000000e+00
root [1] for (int i = 0; i < 5; i++)<\n>
printf("Hello %d\n", i)
Hello 0
Hello 1
Hello 2
Hello 3
Hello 4
root [2] .q

As you can see, if you know C or C++, you can use ROOT. No new command-line or scripting language to learn. To exit, use .q, which is one of the few “raw” interpreter commands. The dot is the interpreter escape symbol. There are also some dot commands to debug scripts (step, step over, set breakpoint, etc.) or to load and execute scripts.

Let's now try something more interesting. Again, start root:

bash$ root
root [0] TF1 f1("func1", "sin(x)/x", 0, 10)
root [1] f1.Draw()
root [2] f1.Dump()
root [3] f1.Inspect()
 // Select File/Close Canvas
root [4] .q

Figure 1. Output of f1.Draw()

Here you create an object of class TF1, a one-dimensional function. In the constructor, you specify a name for the object (which is used if the object is stored in a database), the function and the upper and lower value of x. After having created the function object you can, for example, draw the object by executing the TF1::Draw member function. Figure 1 shows how this function looks. Now, move the mouse over the picture and see how the shape of the cursor changes whenever you cross an object. At any point, you can press the right mouse button to pop-up a context menu showing the available member functions for the current object. For example, move the cursor over the function so that it becomes a pointing finger, and then press the right button. The context menu shows the class and name of the object. Select item SetRange and put -10, 10 in the dialog box fields. (This is equivalent to executing the member function f1.SetRange(-10,10) from the command-line prompt, followed by f1.Draw().) Using the Dump member function (that each ROOT class inherits from the basic ROOT class TObject), you can see the complete state of the current object in memory. The Inspect function shows the same information in a graphics window.

Histogramming and Fitting

Let's start root again and run the following two macros:

bash$ root
root [0] .x hsimple.C
root [1] .x ntuple1.C
 // interact with the pictures in the canvas
root [2] .q

Note: if the above doesn't work, make sure you are in the tutorials directory.

Figure 2. Output of ntuple1.C

Macro hsimple.C (see $ROOTSYS/tutorials/hsimple.C) creates some 1D and 2D histograms and an Ntuple object. (An Ntuple is a collection of tuples; a tuple is a set of numbers.) The histograms and Ntuple are filled with random numbers by executing a loop 25,000 times. During the filling, the 1D histogram is drawn in a canvas and updated each 1,000 fills. At the end of the macro, the histogram and Ntuple objects are stored in a ROOT database.

The ntuple1.C macro uses the database created in the previous macro. It creates a canvas object and four graphics pads. In each of the four pads, a distribution of different Ntuple quantities is drawn. Typically, data analysis is done by drawing in a histogram with one of the tuple quantities when some of the other quantities pass a certain condition. For example, our Ntuple contains the quantities px, py, pz, random and i. The command:

ntuple->Draw("px", "pz < 1")

will fill a histogram containing the distribution of the px values for all tuples for which pz < 1. Substitute for the abstract quantities used in this example quantities such as name, sex, age, length, etc., and you can easily understand that Ntuples can be used in many different ways. An Ntuple of 25,000 tuples is quite small. In typical physics analysis situations, Ntuples can contain many millions of tuples. Besides the simple Ntuple, the ROOT system also provides a Tree. A Tree is an Ntuple generalized to complete objects. That is, instead of sets of tuples, a Tree can store sets of objects. The object attributes can be analyzed in the same way as the tuple quantities. For more information on Trees, see the ROOT HOWTOs at http://root.cern.ch/root/Howto.html.

During data analysis, you often need to test the data with a hypothesis. A hypothesis is a theoretical/empirical function that describes a model. To see if the data matches the model, you use minimization techniques to tune the model parameters so that the function best matches the data; this is called fitting. ROOT allows you to fit standard functions like polynomials, Gaussian exponentials or custom defined functions to your data. In the top right pad in Figure 2, the data has been fit with a polynomial of degree two (red curve). This was done by calling the Fit member function of the histogram object:

hprofs->Fit("pol2")

Moving the cursor over the canvas allows you to interact with the different objects. For example, the 3D plot in the lower-right corner can be rotated by clicking the left mouse button and moving the cursor.

______________________

Comments

Comment viewing options

Select your preferred way to display the comments and click "Save settings" to activate your changes.

Root

Leo Tilson's picture

I am often a little nervous when approaching a software package for the first time. This is exactly the sort of article to help me overcome my fears. From what I hear, ROOT is probably one of the most powerful databases out there - I am impressed for instance by the way in which the data can be highly compressed - but you also seem to have devoted a huge amount of effort into making it as user friendly as possible.

I have been reviewing my use of databases over the last few days because of some concern about MySQL, these concerns being prompted by the legal action being taken by Oracle against Google, and Oracle's reduction in support for OpenSolaris. I had never considered ROOT as an alternative to MySQL, until recently thinking of it as a rather specialised tool. I was pleased to note however that there are bindings for Ruby amongst other languages.

I wonder about your figures for OS shares? I have just installed ROOT, but instead of downloading it from its official site I have installed it from the Debian repository. If many people do this, then the numbers you record for Linux usage may underestimate its Linux usage.

Thank you.

Leo

White Paper
Linux Management with Red Hat Satellite: Measuring Business Impact and ROI

Linux has become a key foundation for supporting today's rapidly growing IT environments. Linux is being used to deploy business applications and databases, trading on its reputation as a low-cost operating environment. For many IT organizations, Linux is a mainstay for deploying Web servers and has evolved from handling basic file, print, and utility workloads to running mission-critical applications and databases, physically, virtually, and in the cloud. As Linux grows in importance in terms of value to the business, managing Linux environments to high standards of service quality — availability, security, and performance — becomes an essential requirement for business success.

Learn More

Sponsored by Red Hat

White Paper
Private PaaS for the Agile Enterprise

If you already use virtualized infrastructure, you are well on your way to leveraging the power of the cloud. Virtualization offers the promise of limitless resources, but how do you manage that scalability when your DevOps team doesn’t scale? In today’s hypercompetitive markets, fast results can make a difference between leading the pack vs. obsolescence. Organizations need more benefits from cloud computing than just raw resources. They need agility, flexibility, convenience, ROI, and control.

Stackato private Platform-as-a-Service technology from ActiveState extends your private cloud infrastructure by creating a private PaaS to provide on-demand availability, flexibility, control, and ultimately, faster time-to-market for your enterprise.

Learn More

Sponsored by ActiveState