The REDACLE Work-Flow Management System
Subnuclear particles, tiny objects indeed, need to be revealed and measured by huge detectors. This field is known as high-energy physics (HEP), and experimental HEP is a cutting-edge science. It uses and promotes the most recent technologies, invents new tools and encourages knowledge exchange. For all of these reasons, HEP has long been the realm of open-source software.
The bad news is that HEP has become increasingly complicated; what was built in a craftsman-like style yesterday is now an industrial process that requires dedicated management software, usually expensive. We are living this experience in our experiment: a large international collaboration engaged in the construction of a 12,500-ton detector called CMS (Compact Muon Solenoid), scheduled to take data at CERN's Large Hadron Collider in Geneva in 2007. Our group in Rome, funded by the Italian Institute for Nuclear Physics (INFN) and located in the Physics Department of the University La Sapienza, is working on the construction of the electromagnetic calorimeter. The calorimeter is made from about 500,000 parts, including scintillating crystals and photo-detectors. Its construction requires data management, quality control and bookkeeping, all of which rely on work-flow management.
A work-flow management system (WFMS) is “software that enables the automation of a business process, in whole or part, during which documents, information or tasks are passed from one participant to another for action, according to a set of procedural rules” (www.e-workflow.org). Using a WFMS allows a coordinator to establish the flow of operations needed to realize a product. Operators are guided through the construction sequence, and unforeseen deviations from the sequence are avoided. Each operation generates data, such as measurements, comments and tags, that are recorded in a database.
Originally, a WFMS based on proprietary components was used in our production for about two years. It proved to be clumsy, slow, resource-demanding, hard to resume after hang-ups and troublesome to integrate with other tools. When the flow of incoming calorimeter parts increased and the assembly rate could not keep up, we made the decision to develop our own solution, based on open-source components. Our requirements were to avoid the previous inefficiencies, to interface transparently with input and output data and to have a flexible solution. We chose to implement a system based on the LAMP (Linux, Apache, MySQL and Perl/PHP/Python) platform. Each component of LAMP has an important role: Linux and Apache provide the basic infrastructure for services and programming; MySQL is the back end of our WFMS; and Perl/PHP manage the interaction between the operators and the database.
Our WFMS is called REDACLE (Relational ECAL Database at Construction LEvel). In more detail, our requirements for the database design were:
1. High flexibility: the database structure should not change when adding new products or activities.
2. Ability to store quality control (QC) data: quality assurance is an important part of our work and collected data must be available to everyone for statistical analysis.
3. Variety of access: the database should be able to be queried through different methods, including shells, programs, scripts and the Web.
Requirement 3 was automatically satisfied by MySQL, and this fact, together with its simplicity and completeness, was the main reason we adopted LAMP. In order to satisfy the first two requirements, we developed a set of tables following a pattern, a common and standard way of solving a given problem, as in OO programming. We used the pattern called homomorphism, which is a simple representation of a many-to-one relationship. In practice, each part of the specific process with which we are dealing is represented in the database as records in two tables: an object table and an object definition table. Each object definition has an ID, actually a MySQL primary key. Many objects share the same object definition, and the relationship between them is provided by a foreign key in the object table containing the corresponding definition ID.
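As a concrete sketch, such a pair of tables could be created in MySQL along the following lines. The partDefinition_id foreign key is the one discussed above; the remaining column names and types (name, barcode) are illustrative assumptions rather than the actual REDACLE schema:

CREATE TABLE partDefinition (
    id   INT NOT NULL AUTO_INCREMENT,
    name VARCHAR(64),              -- e.g., 'barrel crystal type 1L' (illustrative)
    PRIMARY KEY (id)
);

CREATE TABLE part (
    id                INT NOT NULL AUTO_INCREMENT,
    barcode           VARCHAR(32), -- e.g., '33105000006306' (illustrative)
    partDefinition_id INT,         -- foreign key to partDefinition.id
    INDEX (partDefinition_id),
    PRIMARY KEY (id)
);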
An example might explain this design better. As stated in the introduction, our calorimeter is composed of many parts of different types. Each kind of part, such as a crystal, has many instances. The whole calorimeter has about 75,000 crystals. Parts and part definitions are kept in two separate database tables, as shown in Tables 1 and 2. Different instances of a part share the same part definition through the definition ID stored in the partDefinition_id column. In these two tables, the part 33105000006306 is a type 1L barrel crystal, as shown by its partDefinition_id 195, found in the partDefinition table.
Table 1. The Part Table in REDACLE
Table 2. The partDefinition Table in REDACLE
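With tables laid out as in the sketch above, a single join recovers the type of a given part. The barcode column name is an assumption; the actual REDACLE column holding the 14-digit part code may differ:

SELECT d.name
FROM part p
JOIN partDefinition d ON p.partDefinition_id = d.id
WHERE p.barcode = '33105000006306';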
The real benefit of this approach is flexibility. If, for any reason, new parts enter the game, the REDACLE database structure does not need to be modified. It is enough to add a new record to the definition table and relate it to the new parts. But the REDACLE database is even more flexible; if we were building cars rather than calorimeters, the database structure would be exactly the same.
Activities are represented using the same approach: an Activity table holds instances of records described in the ActivityDescription table. Inserting a new activity within the work flow is a matter of supplying its description to the definition table and relating it to its occurrences in the Activity table. Again, with this design it is possible to describe a completely different business seamlessly. For mail delivery, for instance, the definition records could contain the description of the operations to be done on reception, shunting and delivery, while the Activity table could contain records with information about when a given operation was done on which parcel.
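In MySQL terms, adding a new kind of activity and recording one occurrence of it could look roughly like this. The column names beyond the definition ID (name, description, part_id, startTime) and the use of LAST_INSERT_ID() to relate the occurrence to its fresh definition are illustrative assumptions:

INSERT INTO ActivityDescription (name, description)
VALUES ('CRYSTAL_CLEANING', 'cleaning of the crystal before assembly');

-- record an occurrence of the new activity on one part;
-- 42 stands for the part.id of the crystal being processed (illustrative)
INSERT INTO Activity (activityDescription_id, part_id, startTime)
VALUES (LAST_INSERT_ID(), 42, NOW());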
The work flow is defined by collecting several activity definitions and defining the order in which they should be executed. The interface software then checks that the activity being executed at a given time follows, in the work-flow definition, the last activity completed on the part. Activities can be skipped or repeated as permitted by the interface software.
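The article does not name the table that stores this ordering, so the sketch below assumes a hypothetical workFlow table with a step number per activity definition; the part_id and endTime columns of Activity are likewise assumptions. The query finds the definition of the next step expected for a part, given its last completed activity:

SELECT w.activityDescription_id
FROM workFlow w
WHERE w.step = (
    -- step following the highest completed step for this part
    SELECT COALESCE(MAX(wf.step), 0) + 1
    FROM workFlow wf
    JOIN Activity a ON a.activityDescription_id = wf.activityDescription_id
    WHERE a.part_id = 42            -- illustrative part.id
      AND a.endTime IS NOT NULL     -- only completed activities count
);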
For quality control data we adopted the same homomorphic pattern by adding a further level of abstraction. We defined characteristics as data collected during a given activity performed on a part. The Characteristics table, however, does not store actual values, because they can be of different kinds: strings, numbers or even more complex types. The Characteristics table is simply a collection of keys: one of them links the characteristic to its definition in the charDefinition table. Actual characteristics are kept in separate tables according to their type.
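A rough MySQL rendering of this layer might look as follows. Apart from charDefinition_id, the column names, and in particular the activity_id link back to the operation that produced the data, are assumptions for illustration:

CREATE TABLE charDefinition (
    id          INT NOT NULL AUTO_INCREMENT,
    description VARCHAR(128),  -- e.g., 'result of visual inspection'
    name        VARCHAR(32),   -- e.g., 'VIS_I_OPER'
    type        INT,           -- which value table holds the data (assumed meaning)
    PRIMARY KEY (id)
);

CREATE TABLE Characteristics (
    id                INT NOT NULL AUTO_INCREMENT,
    charDefinition_id INT,     -- link to the definition
    activity_id       INT,     -- assumed link to the activity that produced the data
    INDEX (charDefinition_id),
    INDEX (activity_id),
    PRIMARY KEY (id)
);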
Our process has three data types: single floating-point numbers, triplets of numbers and strings. The length of a crystal, for example, is a single number and is stored in the Value table. Some measurements are taken at different points along the crystal axis and in different conditions. The optical transmission, for one, is measured every 2cm at different wavelengths; it constitutes a triplet, the first number representing the position, the second the wavelength and the third the transmission. Each triplet is stored as a record in the multiValue table. The same is true for strings: operators perform a visual inspection of each crystal before manipulation, and they may provide comments describing possible defects. Tables 3 through 7 show the above-mentioned tables for characteristics representation. The part 33101000018045 has been measured for length and optical transmission (TTO). Its length, 229.7815mm, is stored in the Value table; the corresponding char_id, 134821, points in the Characteristics table to charDefinition_id 6, which corresponds to the crystal length. The TTO is a set of triplets in the multiValue table. The visual inspection of that crystal resulted in the comment "nonhomogeneous" in the charValue table.
Table 3. charDefinition

| 2 | result of visual inspection | VIS_I_OPER | 2 |

Table 4. Characteristics

Table 5. Value

Table 6. multiValue

Table 7. charValue
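Assuming links like those sketched above, the length measurement from this example could be pulled back with a query such as the following. The char_id foreign key in the Value table comes from the description in the text; the value column name is an assumption:

SELECT d.name, v.value
FROM Value v
JOIN Characteristics c ON v.char_id = c.id
JOIN charDefinition d ON c.charDefinition_id = d.id
WHERE c.id = 134821;  -- the char_id cited in the example above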
Again, this method makes REDACLE suitable for different types of businesses; in a dairy, it could be used to record as characteristics the bacterial load for each batch of milk as well as the producer (a character string). In addition, a completely new data type, such as pictures or sounds, could be added to the database without disturbing the schema, simply by defining a new table. Adding pictures, for instance, implies the creation of a table with three fields: a primary key, the picture data as a BLOB and the relation to the Characteristics table, expressed by an integer ID. The MySQL code to create such a table is:
CREATE TABLE picture (
    id      INT NOT NULL AUTO_INCREMENT,
    data    BLOB,
    char_id INT,
    INDEX (char_id),
    PRIMARY KEY (id)
);
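A picture could then be attached to an existing quality-control record with an ordinary INSERT. The file path and the char_id below are purely illustrative, and LOAD_FILE() requires the MySQL FILE privilege and a server-readable path:

INSERT INTO picture (data, char_id)
VALUES (LOAD_FILE('/tmp/defect.png'), 134821);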