ROOT: An Object-Oriented Data Analysis Framework
ROOT is a system for large scale data analysis and data mining. It is being developed for the analysis of Particle Physics data, but can be equally well used in other fields where large amounts of data need to be processed.
After many years of experience in developing interactive data analysis systems like PAW and PIAF (see Resources), we realized that the growth and maintainability of these products, written in FORTRAN and using 20-year-old libraries, had reached its limits. Although still popular in the physics community, these systems do not scale up to the challenges offered by the next generation particle accelerator, the Large Hadron Collider (LHC), currently under construction at CERN, in Geneva, Switzerland. The expected amount of data produced by the LHC will be on the order of several petabytes (1PB = 1,000,000GB) per year. This is two to three orders of magnitude more than what is being produced by the current generation of accelerators.
Therefore, in early 1995, Rene Brun and I started developing a system, intending to overcome the deficiencies of these previous programs. One of the first decisions we made was to follow the object-oriented analysis and design methodology and to use C++ as our implementation language. Although all of our previous programming experience was in FORTRAN, we soon realized the power of OO and C++, and after some initial “throw-away” prototyping, the ROOT system began to take shape.
In November 1995, we gave the first public presentation of ROOT at CERN and, at the same time, version 0.5 was released via the Web. By then, Nenad Buncic and Valery Fine had joined our team.
Since the initial release, there has been a constantly increasing number of users. In response to comments and feedback, we've been regularly releasing new versions containing bug fixes and new features. In January 1997, version 1.0 was released and in March 1998 version 2.0. Since the release of version 1.0, more than 9,300 copies of the ROOT binaries have been downloaded from our web site, about 500 people have registered as ROOT users, and the web site gets up to 100,000 hits per month.
ROOT is currently being used in many different fields such as physics, astronomy, biology, genetics, finance, insurance, pharmaceuticals, etc.
The source and binaries for many different platforms can be downloaded from the ROOT web site (http://root.cern.ch/). The current version can be used and distributed freely as long as proper credit is given and copyright notices are maintained. For commercial use, the authors would like to be notified.
The main components of the ROOT system are:
A hierarchical object-oriented database (machine independent, highly compressed, supporting schema evolution and object versioning)
A C++ interpreter
Advanced statistical analysis tools (classes for multi-dimensional histogramming, fitting and minimization)
Visualization tools (classes for 2D and 3D graphics including an OpenGL interface)
A rich set of container classes that are fully I/O aware (list, sorted list, map, btree, hashtable, object array, etc.)
An extensive set of GUI classes (windows, buttons, combo-box, tabs, menus, item lists, icon box, tool bar, status bar and many others)
An automatic HTML documentation generation facility
Run-time object inspection capabilities
Client/server networking classes
Shared memory support
Remote database access, either via a special daemon or via the Apache web server
Ported to all known UNIX and Linux systems and also to Windows 95 and NT
The complete system consists of about 450,000 lines of C++ and 80,000 lines of C code. There are about 310 classes grouped in 24 different frameworks, each class represented by its own shared library.
One of the key components of the ROOT system is the CINT C/C++ interpreter. CINT, written by Masaharu Goto of Hewlett Packard Japan, covers 95% of ANSI C and about 85% of C++. Template support is being worked on, and exceptions are still missing. CINT is complete enough to be able to interpret its own 70,000 lines of C and to let the interpreted interpreter interpret a small program.
The advantage of a C/C++ interpreter is that it allows for fast prototyping, since it eliminates the typical time consuming edit/compile/link cycle. Once a script or program is finished, you can compile it with a standard C/C++ compiler (gcc) to machine code and enjoy full machine performance. Since CINT is very efficient (for example, for/while loops are byte-code compiled on the fly), it is quite possible to run small programs in the interpreter. In most cases, CINT outperforms other interpreters like Perl and Python.
Existing C and C++ libraries can easily be interfaced to the interpreter. This is done by generating a dictionary from the function and class definitions. The dictionary provides CINT with all necessary information to be able to call functions, create objects and call member functions. A dictionary is easily generated by the program rootcint that uses the library header files as input and produces a C++ file containing the dictionary as output. You compile the dictionary and link it with the library code into a single shared library. At run-time, you dynamically link the shared library, and then you can call the library code via the interpreter. This can be a very convenient way to quickly test some specific library functions. Instead of having to write a small test program, you just call the functions directly from the interpreter prompt.
The CINT interpreter is fully embedded into the ROOT system. It allows the ROOT command line, scripting and programming languages to be identical. The embedded interpreter dictionaries provide the necessary information to automatically create GUI elements like context pop-up menus unique for each class and for the generation of fully hyperized HTML class documentation. Furthermore, the dictionary information provides complete run-time type information (RTTI) and run-time object introspection capabilities.