Physics Analysis Workstation

CERN is the European Laboratory for Particle Physics. It has been in the news quite a bit lately with the discovery of the Higgs boson at the Large Hadron Collider. What many people may not know is that CERN also has a long tradition of developing software for scientific use. The HTML document format and the first Web browser both were developed there as a way of using rich documents that could link to many different sources of information. That work proved so useful, it ended up sparking the World Wide Web. Alongside such widespread software, CERN has been responsible for quite a bit of scientific software, especially physics software.

In this article, I take a look at a fairly large group of modules and libraries called the Physics Analysis Workstation (PAW). PAW contains several thousand subroutines and programs written in FORTRAN, C and even some assembly language, and it is built on top of a library called the CERN Program Library (CERNLIB).

You can download and install the code from the source located at the main Web site if you have any special needs, but considering the long list of required external libraries, I suggest you avoid that if possible. Packages should be available for your distribution. For Debian-based distros, you can install everything you need with the command:


sudo apt-get install paw

PAW also includes a large series of graphing and data visualization routines to help in data analysis. Sometimes you need to see what your data looks like before you can decide which further analysis to pursue.

PAW actually is an interactive system in which you apply commands to your data set. The original interface was a command-line one, but it since has gained several other interfaces that you can try out. If you open a terminal, type the command paw and press Enter, you are asked which terminal type you want to use (Figure 1). The default is type 1, which opens a HIGZ graphics window where your plots will be displayed (Figure 2). If you are using PAW on a remote machine, you probably will want to use a different type. You can get a list by typing ?. For a regular xterm, enter 7879.

Figure 1. You can select the terminal type to use when you start PAW.

Figure 2. The default is to open a graphics window to draw your plots into, along with a command interface.

Once everything has finished loading, you are presented with a prompt that looks like this:


PAW >

Now you can start typing commands and doing data analysis. But, what commands can you use? Luckily, PAW includes a help system within the program that you can access by typing the help command, which pops up a list of topics.

Commands in PAW are grouped together in a tree structure, with the top-most level being the topics that pop up when you start the help system. There is also quite a bit of documentation available on the main Web site, including tutorials and a very large FAQ.

Because PAW is used for data analysis, let's start with what kinds of data you can use. PAW has three main data types: VECTORS, HISTOGRAMS and NTUPLES. VECTORS store arrays of reals or integers. PAW can handle up to three dimensions, or indexes, for these VECTORS. They can be manipulated by the group of VECTOR commands. Commands in PAW are not case-sensitive, but in most documentation, they are shown in uppercase. You also can use abbreviations for commands, as long as they can be matched uniquely to the full command text. So, you can create a new VECTOR of 20 elements with the command:


VECTOR/CREATE vec1(20)

This new VECTOR is named "vec1". Then you can fill your new VECTOR with values using this command:


VECTOR/INPUT vec1 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20

The command takes a VECTOR name and a list of values to assign to its elements. This is fine if you are dealing with just a small set of data. If you have larger data sets stored in files, you can use the command VECTOR/READ. This command takes a filename, along with several optional arguments such as the format of the elements, and loads the data into the given VECTORS.

The optional format string is similar to those used in reading and writing data in FORTRAN code, so a refresher course may be a good idea if it has been some time since you have used FORTRAN.
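If it has been a while since you last read a FORTRAN format, the following Python sketch may help as a reminder of what a fixed-width descriptor such as 3F8.2 means: three real numbers, each occupying eight characters. This is only an illustration of the idea, not PAW's own parser. The function name parse_fixed_width is hypothetical, it handles just the nFw.d form, and it assumes every field contains an explicit decimal point.

```python
import re

def parse_fixed_width(line, fmt):
    """Split one text line according to a FORTRAN-style 'nFw.d' descriptor.

    Illustrative only: handles the 'nFw.d' form and assumes each field
    already contains a decimal point, so 'd' is not needed for scaling.
    """
    match = re.fullmatch(r"(\d*)F(\d+)\.(\d+)", fmt.strip().upper())
    if match is None:
        raise ValueError(f"unsupported format: {fmt!r}")
    count = int(match.group(1) or 1)   # repeat count, 1 if omitted
    width = int(match.group(2))        # characters per field
    values = []
    for i in range(count):
        field = line[i * width:(i + 1) * width]
        values.append(float(field))    # float() ignores the padding blanks
    return values

# Three right-justified fields, each eight characters wide.
row = parse_fixed_width("    1.50   22.25   -3.00", "3F8.2")
print(row)
```

The point to take away is that the columns are defined by character position rather than by separators, so the padding in the data file matters.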

You can output data to a file with the inverse VECTOR/WRITE command.

To visualize your data, use the VECTOR/DRAW command. The options available allow you to select whether to draw a histogram, a smooth curve or a bar chart. You also can draw this visualization over the top of another graph.

You can get a list of all of the VECTORS that have been created with the VECTOR/LIST command, and you can clean up unneeded data with the VECTOR/DELETE command.
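With a handful of VECTOR commands now in view, the abbreviation rule mentioned earlier is easier to picture. The Python sketch below is a simplified model of unique matching, not the actual resolution logic inside PAW. It assumes that each part of a slash-separated command can be shortened independently, and it knows only about the commands named in this article. The function name resolve is hypothetical.

```python
# Only the VECTOR commands discussed in this article; PAW has many more.
COMMANDS = [
    "VECTOR/CREATE", "VECTOR/INPUT", "VECTOR/READ", "VECTOR/WRITE",
    "VECTOR/DRAW", "VECTOR/LIST", "VECTOR/DELETE", "VECTOR/FIT",
]

def resolve(abbrev, commands=COMMANDS):
    """Return the single command that the abbreviation matches.

    Matching is case-insensitive and done element by element: every
    part of the abbreviation must be a prefix of the matching part
    of the full command.
    """
    parts = abbrev.upper().split("/")
    matches = []
    for command in commands:
        elements = command.split("/")
        if len(elements) == len(parts) and all(
            element.startswith(part) for element, part in zip(elements, parts)
        ):
            matches.append(command)
    if len(matches) == 1:
        return matches[0]
    if not matches:
        raise ValueError(f"no command matches {abbrev!r}")
    raise ValueError(f"{abbrev!r} is ambiguous: {', '.join(matches)}")

print(resolve("vec/cr"))   # unique, so it resolves to the full command
print(resolve("V/DR"))     # "V/D" alone would match both DRAW and DELETE
```

In this model, an abbreviation is accepted only when exactly one command survives the prefix test, which is why a shorter form that is fine for one command can be rejected for another.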

Once you have loaded your data and taken a look at it, you may have an idea of how the different parts are related to each other. You can use the VECTOR/FIT command to take a function, defined by you with a subroutine, and try to fit the data to it. You also can include a set of associated errors when issuing the command.
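To make the idea of fitting with associated errors more concrete, here is a minimal Python sketch of a weighted least-squares fit of a straight line. It is not how PAW performs its fits. It simply shows the role the errors play: each point is weighted by one over its error squared, so uncertain points pull less on the result. The function name fit_line and the sample values are made up for illustration.

```python
def fit_line(x, y, errors):
    """Weighted least-squares fit of y = slope * x + intercept.

    Each point gets weight 1 / error**2, so points with large
    errors have less influence on the fitted parameters.
    """
    weights = [1.0 / (e * e) for e in errors]
    s = sum(weights)
    sx = sum(w * xi for w, xi in zip(weights, x))
    sy = sum(w * yi for w, yi in zip(weights, y))
    sxx = sum(w * xi * xi for w, xi in zip(weights, x))
    sxy = sum(w * xi * yi for w, xi, yi in zip(weights, x, y))
    delta = s * sxx - sx * sx
    if delta == 0:
        raise ValueError("x values must not all be identical")
    slope = (s * sxy - sx * sy) / delta
    intercept = (sxx * sy - sx * sxy) / delta
    return slope, intercept

# Made-up data lying exactly on y = 2x + 1, with equal errors.
x = [1.0, 2.0, 3.0, 4.0]
y = [3.0, 5.0, 7.0, 9.0]
errors = [1.0, 1.0, 1.0, 1.0]
slope, intercept = fit_line(x, y, errors)
print(slope, intercept)
```

A straight line has a closed-form solution, which keeps the sketch short. Fitting an arbitrary user-defined function generally needs an iterative minimizer instead.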

The HISTOGRAM group of commands within PAW gives you a larger selection of plotting and analysis tools to apply to your data. The commands are broken down into subgroups that let you create histograms, draw 2D plots and apply operations to existing histograms. You can use the GET_VECT and PUT_VECT command subgroups to move data between histograms and VECTORS, such as the one you created above. You also can use FUNCTION commands to create functions for use in data fitting, among other areas.
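If histograms are new to you, the following Python sketch shows the basic relationship between a list of values and a one-dimensional histogram with equal-width bins. It is a conceptual illustration only and does not reproduce PAW's internals or its handling of underflow and overflow. The function name fill_histogram and the sample values are hypothetical.

```python
def fill_histogram(values, nbins, xmin, xmax):
    """Count how many values fall into each of nbins equal-width bins.

    Bins cover the range [xmin, xmax); values outside that range
    are simply ignored in this sketch.
    """
    width = (xmax - xmin) / nbins
    counts = [0] * nbins
    for value in values:
        if xmin <= value < xmax:
            counts[int((value - xmin) / width)] += 1
    return counts

# Made-up sample; 3.0 sits on the upper edge, so it is not counted.
data = [0.5, 1.5, 1.7, 2.5, 3.0]
contents = fill_histogram(data, nbins=3, xmin=0.0, xmax=3.0)
print(contents)
```

The resulting list of bin contents is itself just an array of numbers, which is why it is natural to copy histogram contents into a VECTOR and back.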

The NTUPLE group of commands is used to manipulate ntuple objects. Ntuples essentially are lists of lists, and you can think of them as matrices. In the PAW documentation, each row is called an event, and each column is called a variable. There are commands to merge data together or to apply cuts that select subsets of events. Ntuples have their own plot commands that allow you to plot different variables against each other in various forms. If you have lots of data to deal with, you can use the CHAIN command to chain together multiple ntuples to create data sets of essentially unlimited size.
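The event and variable vocabulary can be pictured with a short Python sketch. Here each event is a dictionary, each key is a variable, a cut is a condition that keeps only some events, and chaining treats two separate sets of events as one. This only mirrors the concepts and is not PAW's storage format or command syntax. The variable names, the values and the helper functions apply_cut and column are all made up for illustration.

```python
from itertools import chain

# Two small made-up "ntuples": each row is an event, each key a variable.
run1 = [{"energy": 1.2, "charge": 1}, {"energy": 4.8, "charge": -1}]
run2 = [{"energy": 7.5, "charge": 1}, {"energy": 0.3, "charge": -1}]

def apply_cut(events, predicate):
    """Keep only the events for which the cut condition holds."""
    return [event for event in events if predicate(event)]

def column(events, name):
    """Extract one variable from every event, ready for plotting."""
    return [event[name] for event in events]

# Chain the two runs so they behave like a single data set.
all_events = list(chain(run1, run2))

# A cut selecting the events with energy above 2.0.
high_energy = apply_cut(all_events, lambda event: event["energy"] > 2.0)
print(column(high_energy, "energy"))
```

Plotting one variable against another then amounts to extracting two columns from the same selection of events.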

Although PAW is no longer under active development, there still is more than enough really useful code here to keep any scientist busy. If you are doing any work involving data analysis or modeling, especially in C or FORTRAN, it would be well worth your time to do a quick search of the available modules and subroutines in PAW to see if there is anything you can use to make your work progress more quickly. I cover only a very small portion of the functionality available in this article, so be sure to do a bit of a deeper dive to see what you can mine for your own work.

______________________

Joey Bernard has a background in both physics and computer science. This serves him well in his day job as a computational research consultant at the University of New Brunswick. He also teaches computational physics and parallel programming.