Linux in a Scientific Laboratory
We have found the scripting methodology to be very useful for both scientific computing and data acquisition. Scripting is a style of writing software in which, instead of constructing a monolithic program with a hardwired control flow, we restructure the code by dividing it into modules that each perform part of the work. To glue these modules together, we link them with a command-language interpreter.
High-Level Language (HLL) interpreters have been around for a long time (Scheme, Basic, Perl, Tcl, Python), but only recently has there been an emphasis on embedding them within users' programs. Even without such embedding, interpreters are still useful for prototyping, but they tend to run out of steam for larger projects. The key is to put together the flexibility of an interpreted system and the speed and functionality of the compiled HLL code.
For example, let's imagine a program that opens and processes a configuration file, asks the user for input and calculates some results. Traditionally, the control flow of such a program is hardwired in its main routine; each I/O phase is programmed separately with a separate syntax for each phase's data. (The configuration file might be a table of numbers, and the user input might have a form of simple ASCII strings representing commands.)
To rewrite this program in a scripting style, we would recast the configuration and calculation phases as separate modules invoked by a scripting interpreter. The data for the work modules would be kept in the interpreter's variables, while the I/O would be handled by the interpreter's native facilities. To complete the program, we have to write a short interpreter script that reads the configuration file, stores and processes the values, obtains user input, launches the calculation and outputs the result. The important point, and the one that takes a little while to get used to, is that there is no longer a hardwired control flow in the program: when it is started, the interpreter takes over and waits for a script (either from the command line or from a script file) to set the modules in motion.
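The restructuring described above can be sketched as follows. This is an illustration only, written in Python rather than Tcl; the module names `read_config` and `calculate`, the whitespace-separated table format and the `scale` command are all hypothetical and are not taken from any real program.

```python
import io

# --- Work modules: each does one part of the job and decides nothing
# --- about the order in which things happen.

def read_config(stream):
    """Parse a table of numbers (one row per line) into a list of float rows."""
    rows = []
    for line in stream:
        line = line.strip()
        if line and not line.startswith("#"):   # skip blanks and comments
            rows.append([float(field) for field in line.split()])
    return rows

def calculate(rows, scale):
    """Hypothetical calculation: the scaled sum of each row."""
    return [scale * sum(row) for row in rows]

# --- The script: the only place where the control flow is written down.

def run_script(config_stream, user_command):
    table = read_config(config_stream)        # configuration phase
    verb, value = user_command.split()        # user input, e.g. "scale 2"
    if verb != "scale":
        raise ValueError("unknown command: " + verb)
    return calculate(table, float(value))     # calculation phase

if __name__ == "__main__":
    config = io.StringIO("# x y z\n1 2 3\n4 5 6\n")
    print(run_script(config, "scale 2"))      # prints [12.0, 30.0]
```

Changing the order of the phases, or repeating one of them, touches only `run_script`; the work modules stay as they are.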
Of the several modern scripting languages, we have chosen to use Tcl/Tk. Others, such as Python and Perl, are equally good and have similar capabilities. We have written a significant number of Tcl extensions, dealing with abstractions for platform-independent self-describing data files, binary data matrices and image processing, arithmetic expressions and others.
There are several benefits to the scripting approach. First, it provides for more flexibility: it would be trivial to change the interpreted script to perform two rounds of computations instead of one. Also, it is much easier to decouple the user-interface code from the computational code—all that is needed to add GUI data input is to rewrite the user-interface portions of the script so that it uses the interpreter's GUI widgets.
Second, the interpreter usually provides general-purpose linguistic constructs, such as macros/procedures, and looping and conditional statements. This makes it possible to write sophisticated and flexible batch processing scripts.
Note that a properly designed scriptable application reconciles an artificial and unnecessary distinction between command-line and GUI-based programs. The premise behind graphical user interfaces is to provide visual cues for all operations; however, the tradeoff is often that other operations, for which no GUI element was included, are impossible. In other words, a GUI promises a “What You See is What You Get” operation, but it often delivers “What You Get is What You Get”.
With scripting, the GUI is set up to invoke predefined command lists; at the same time, the interpreter can be directed to accept user-typed commands or file input, allowing for arbitrary command sequences. It is nice to be able to select a file using a file selector dialog, but anyone who has had to negotiate such file selection for a hundred files must appreciate the utility of typing process *.dat on the command line.
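One way to picture this arrangement is a single command table that serves both the GUI callbacks and the typed command line. The sketch below is in Python rather than Tcl, and every name in it (`COMMANDS`, `execute`, `on_button_click`, the placeholder `process` command) is an assumption made for illustration. The list of available files is passed in explicitly so that the example does not depend on the contents of any directory.

```python
import fnmatch

COMMANDS = {}   # command name -> function; a hypothetical command table

def command(name):
    """Decorator that registers a function under a command name."""
    def register(func):
        COMMANDS[name] = func
        return func
    return register

@command("process")
def process(*filenames):
    """Placeholder for real work: report which files would be processed."""
    return ["processed " + name for name in filenames]

def execute(line, available_files):
    """Run one typed command line, expanding wildcards against available_files."""
    words = line.split()
    name, patterns = words[0], words[1:]
    args = []
    for pattern in patterns:
        matches = sorted(fnmatch.filter(available_files, pattern))
        # A word that matches nothing is passed through unchanged.
        args.extend(matches if matches else [pattern])
    return COMMANDS[name](*args)

def on_button_click(selected_file, available_files):
    """What a file-selector callback might do: issue the same command."""
    return execute("process " + selected_file, available_files)

if __name__ == "__main__":
    files = ["run1.dat", "run2.dat", "notes.txt"]
    print(on_button_click("run1.dat", files))   # one file, chosen in the "GUI"
    print(execute("process *.dat", files))      # many files, typed by the user
```

Because the button and the typed line both end up in `execute`, anything the GUI can do is also available from the command line, and the command line is not limited to what the GUI designer anticipated.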
The final benefit of an extensible scripting language is that it is well suited to creating abstractions for complex objects or actions. Such abstractions are good for two reasons: they make complex manipulations easier to understand and perform, and at the same time they deliver high performance because they are compiled extensions. A good example is BLT, a Tcl graphing extension we often use. It is a sophisticated graphing tool with dozens of options. Its complex internal structure is simply encapsulated: the advanced options are available, but don't have to be used. All that is needed for a simple plot is to provide values for the X and Y coordinates of the plot. At the same time, because it is a compiled extension to Tcl, BLT enjoys quite good performance, even on large plots, comparable to visualization tools written entirely in C.
Thanks to the dynamic loading of shared libraries and extensions, an existing program can be enhanced with graphing capability by simply loading the BLT package. This creates the new graph command in the Tcl interpreter, which can then be used in the script that constructs the GUI.
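The same idea, a package loaded at run time that adds commands to a running interpreter, can be sketched in Python with the standard `importlib` module. This is an analogy and not the Tcl mechanism itself: the `commands` table and the `load_package` helper are hypothetical, and the standard-library `statistics` module stands in for a compiled extension such as BLT.

```python
import importlib

commands = {}   # the interpreter's command table (hypothetical)

def load_package(module_name, exported):
    """Import a module by name at run time and register some of its
    functions as commands. `exported` maps command name -> attribute name."""
    module = importlib.import_module(module_name)
    for command_name, attribute in exported.items():
        commands[command_name] = getattr(module, attribute)

if __name__ == "__main__":
    print("mean" in commands)                      # not available yet
    load_package("statistics", {"mean": "mean"})   # stand-in for loading BLT
    print("mean" in commands)                      # now it is
    print(commands["mean"]([1, 2, 3, 6]))
```

The program that calls `load_package` needs no knowledge of the package at the time it is written; the new command simply appears in the table once the package is loaded.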
Another example of a useful software abstraction that pops up in several places in our work is the numerical array. Such arrays are extremely important in science: they may contain vectors of data, geometrical coordinates, matrices, etc. The standard HLLs usually have a concept of such an array, but it is usually a second-class object. Arrays provide space for storage of data, but it is not possible to perform infix arithmetic operations on them in the same way as on simple, scalar variables. The array processing in such languages is done one element at a time, which is prohibitively slow for large matrices when each step is interpreted (see example below). (Of course, FORTRAN90 and C++ with appropriate matrix algebra libraries allow writing computations like A*B for matrices as well as for scalar variables, but these environments aren't common yet, either on Linux or on commercial platforms.)
Typical C (HLL) code for doing a matrix multiply is as follows:
for (i = 0; i < N; i++) {
    for (j = 0; j < N; j++) {
        C[i][j] = 0;    /* clear the accumulator */
        for (k = 0; k < N; k++) {
            C[i][j] += A[i][k] * B[k][j];
        }
    }
}
For Matlab/Octave (VHLL), the code looks like this:
C = A * B

The VHLL code is obviously easier to write. Also, when an explicit loop like the one above is written in an interpreted language, its iterations are interpreted one by one; in a VHLL, the whole operation is executed in machine code at full speed.
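The notational half of this point, arrays as first-class objects that accept infix arithmetic, can be sketched in Python. The `Matrix` class below is hypothetical and deliberately tiny. Its loops are still interpreted, so it shows only the notation; the speed advantage described above depends on the operation being implemented in compiled code, as it is in Matlab and Octave.

```python
class Matrix:
    """A tiny first-class matrix type; hypothetical, for illustration only."""

    def __init__(self, rows):
        self.rows = [list(row) for row in rows]

    def __mul__(self, other):
        """Matrix product, so that A * B reads as it does in Matlab/Octave."""
        inner = len(other.rows)
        if any(len(row) != inner for row in self.rows):
            raise ValueError("inner dimensions do not match")
        columns = list(zip(*other.rows))   # columns of the right-hand matrix
        return Matrix(
            [[sum(a * b for a, b in zip(row, col)) for col in columns]
             for row in self.rows]
        )

    def __eq__(self, other):
        return isinstance(other, Matrix) and self.rows == other.rows

    def __repr__(self):
        return "Matrix(%r)" % (self.rows,)

if __name__ == "__main__":
    A = Matrix([[1, 2], [3, 4]])
    B = Matrix([[0, 1], [1, 0]])
    C = A * B      # the whole operation is one infix expression
    print(C)       # prints Matrix([[2, 1], [4, 3]])
```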
The term “Very High-Level Languages” refers to such problem domain-specific languages. For numerical computation, there is a commercial VHLL called Matlab. It provides a sophisticated environment for calculation and display of numerical data, with array variables as first-class objects. It is a very nice toolkit and is supported on Linux. Interestingly, there is a free clone of Matlab, called Octave, that provides a large part of its functionality; simple Matlab code often runs in Octave with little or no change. (See “Octave: A Free, High-Level Language for Mathematics” by Malcolm Murphy, Linux Journal, July 1997.) These systems are addictive; once you use them for a while, it is hard to go back to FORTRAN.
The above remarks are equally relevant on any OS platform, whether it is one of the various flavors of UNIX, Windows or Macintosh. However, Linux provides the most complete software development environment. Various native scripting systems exist on individual platforms, such as Visual Basic on Windows and Hypercard and Metacard on the Macintosh; however, the commercial offerings are never complete. For instance, Visual Basic requires a separate C compiler to create binary extensions. On the other hand, Linux provides all the tools (Tcl/Tk libraries and header files, GCC compiler, etc.) out of the box.
Przemek Klosowski is a physicist working at the National Institute of Standards and Technology. Having stumbled onto the Internet 13 years ago and Linux 6 years ago, and having founded the Washington DC Linux User Group 4 years ago, he is beginning to feel like an old geezer. This feeling is reinforced by his failure to get excited by Java. Still, his youthful enthusiasm is maintained by the success of Linux and other Open Software initiatives that he supports and sometimes contributes to. He can be reached via e-mail at email@example.com.
Nick Maliszewskyj is a physicist at the NIST Center for Neutron Research in Gaithersburg, MD, where he loves to play with the big toys to be found there. His current mission is to write software that will let hordes of other people play with them too. Activities in the non-binary world include Aikido, home repair, watching his 3-year-old son with amazement and preparing for the arrival of his second child. Nick can be reached by e-mail at firstname.lastname@example.org.
Bud Dickerson has always worked for physicists because they let him play with cool toys. He sleeps too well at night, however, to be any better with Linux than he is. He can be reached at email@example.com.