The REDACLE Work-Flow Management System
Subnuclear particles, tiny objects indeed, can be revealed and measured only by huge detectors. This field is known as high-energy physics (HEP), and experimental HEP is a cutting-edge science: it uses and promotes the most recent technologies, invents new tools and encourages knowledge exchange. For all of these reasons, HEP has long been the realm of open-source software.
The bad news is that HEP has become increasingly complicated; what was built in a craftsman-like style yesterday is now an industrial process that requires dedicated, and usually expensive, management software. We are experiencing this in our own experiment: a large international collaboration engaged in the construction of a 12,500-ton detector called CMS (Compact Muon Solenoid), scheduled to take data at CERN's Large Hadron Collider in Geneva in 2007. Our group in Rome, funded by the Italian Institute for Nuclear Physics (INFN) and located in the Physics Department of the University of Rome La Sapienza, is working on the construction of the electromagnetic calorimeter. The calorimeter is made from about 500,000 parts, including scintillating crystals and photo-detectors. Building it requires data management, quality control and bookkeeping, all of which rely on work-flow management.
A work-flow management system (WFMS) is “software that enables the automation of a business process, in whole or part, during which documents, information or tasks are passed from one participant to another for action, according to a set of procedural rules” (www.e-workflow.org). Using a WFMS allows a coordinator to establish the flow of operations needed to realize a product. Operators are guided through the construction sequence, and unforeseen deviations from the sequence are avoided. Each operation generates data, such as measurements, comments and tags, that are recorded in a database.
Originally, we used a WFMS based on proprietary components in our production for about two years. It proved to be clumsy, slow, resource-hungry, hard to resume after hang-ups and troublesome to integrate with other tools. When the flow of incoming calorimeter parts increased and the assembly rate could not keep up, we decided to develop our own solution based on open-source components. Our requirements were to avoid the previous inefficiencies, to interface transparently with input and output data and to have a flexible solution. We chose to implement a system based on the LAMP (Linux, Apache, MySQL and Perl/PHP/Python) platform. Each component of LAMP has an important role: Linux and Apache provide the basic infrastructure for services and programming; MySQL is the back end of our WFMS; and Perl/PHP manage the interaction between operators and the database.
Our WFMS is called REDACLE (Relational ECAL Database at Construction LEvel). In more detail, our requirements for the database design were:
1. High flexibility: the database structure should not change when new products or activities are added.
2. Ability to store quality control (QC) data: quality assurance is an important part of our work, and the collected data must be available to everyone for statistical analysis.
3. Variety of access: it should be possible to query the database through different methods, including shells, programs, scripts and the Web.
Requirement 3 was satisfied automatically by MySQL, and this fact, together with its simplicity and completeness, was the main reason we adopted LAMP. To satisfy the first two requirements, we developed a set of tables following a design pattern, that is, a common, standard solution to a recurring problem, as in OO programming. We used the pattern called homomorphism, which is a simple representation of a many-to-one relationship. In practice, each part of the specific process we are dealing with is represented in the database as records in two tables: an object table and an object definition table. Each object definition has an ID, which is a MySQL primary key. Many objects share the same object definition, and the relationship between them is provided by a foreign key in the object table containing the corresponding definition ID.
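A minimal sketch of the homomorphism pattern follows. It uses Python's built-in sqlite3 module as a stand-in for MySQL so that it runs without a server; the table and column names (objectDefinition, object, name) and the sample rows are hypothetical, chosen only to show the many-to-one link through a foreign key.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption for portability).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE objectDefinition (
    id   INTEGER PRIMARY KEY,          -- the definition ID
    name TEXT NOT NULL
);
CREATE TABLE object (
    id                  INTEGER PRIMARY KEY,
    objectDefinition_id INTEGER NOT NULL
        REFERENCES objectDefinition(id) -- many objects -> one definition
);
""")

# One hypothetical definition shared by three objects.
conn.execute("INSERT INTO objectDefinition VALUES (1, 'widget')")
conn.executemany("INSERT INTO object VALUES (?, 1)", [(10,), (11,), (12,)])

# Many-to-one: count how many objects share each definition.
counts = conn.execute("""
    SELECT d.name, COUNT(o.id)
    FROM objectDefinition AS d
    JOIN object AS o ON o.objectDefinition_id = d.id
    GROUP BY d.id
""").fetchall()
print(counts)  # [('widget', 3)]
```

The definition is stored once, and every instance carries only the integer key that points to it.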
An example might explain this design better. As stated in the introduction, our calorimeter is composed of many parts of different types. Each kind of part, such as a crystal, has many instances; the whole calorimeter has about 75,000 crystals. Parts and part definitions are kept in two separate database tables, as shown in Tables 1 and 2. Instances of the same kind of part share one part definition by storing its ID in the partDefinition_id column. In these two tables, part 33105000006306 is a type 1L barrel crystal, as shown by its partDefinition_id, 195, found in the partDefinition table.
Table 1. The Part Table in REDACLE
| ID | partDefinition_id |
|---|---|
| 33105000006306 | 195 |
| 33105000006307 | 196 |
| 33105000006308 | 197 |
| 33105000006309 | 198 |
| 33105000006310 | 196 |
Table 2. The partDefinition Table in REDACLE
| ID | Name | Subname | Type |
|---|---|---|---|
| 195 | crystal | Barrel | 1L |
| 196 | capsule | Barrel | T4 |
| 197 | Alveola | Barrel | 3 |
| 198 | subunit | Barrel | 5 |
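To make the lookup concrete, the sketch below rebuilds Tables 1 and 2 in an in-memory SQLite database (a stand-in for MySQL) and resolves a part to its definition with a join. The table name Part and the helper describe_part are hypothetical; the rows are copied from the tables above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE partDefinition (
    ID INTEGER PRIMARY KEY, Name TEXT, Subname TEXT, Type TEXT
);
CREATE TABLE Part (
    ID                INTEGER PRIMARY KEY,
    partDefinition_id INTEGER REFERENCES partDefinition(ID)
);
""")
conn.executemany("INSERT INTO partDefinition VALUES (?, ?, ?, ?)", [
    (195, "crystal", "Barrel", "1L"),
    (196, "capsule", "Barrel", "T4"),
    (197, "Alveola", "Barrel", "3"),
    (198, "subunit", "Barrel", "5"),
])
conn.executemany("INSERT INTO Part VALUES (?, ?)", [
    (33105000006306, 195), (33105000006307, 196), (33105000006308, 197),
    (33105000006309, 198), (33105000006310, 196),
])

def describe_part(conn, part_id):
    """Return (Name, Subname, Type) of the definition a part points to."""
    return conn.execute("""
        SELECT d.Name, d.Subname, d.Type
        FROM Part AS p
        JOIN partDefinition AS d ON p.partDefinition_id = d.ID
        WHERE p.ID = ?
    """, (part_id,)).fetchone()

print(describe_part(conn, 33105000006306))  # ('crystal', 'Barrel', '1L')
```

Note that the 14-digit part IDs exceed the range of a signed MySQL INT (at most 2,147,483,647), so a MySQL version of these columns would need BIGINT; SQLite's INTEGER is already 64-bit.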
The real benefit of this approach is flexibility. If new kinds of parts are introduced, the REDACLE database structure does not need to be modified: it is enough to add a new record to the definition table and relate the new parts to it. The REDACLE database is even more flexible than that; if we were building cars rather than calorimeters, the database structure would be exactly the same.
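The following sketch checks the flexibility claim directly: it snapshots the table definitions, adds a brand-new kind of part and its instances, and confirms that the schema is unchanged. SQLite stands in for MySQL, and the definition ID 999, the 'wheel' part type (following the car example) and the part IDs are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE partDefinition (ID INTEGER PRIMARY KEY, Name TEXT, Subname TEXT, Type TEXT);
CREATE TABLE Part (ID INTEGER PRIMARY KEY,
                   partDefinition_id INTEGER REFERENCES partDefinition(ID));
INSERT INTO partDefinition VALUES (195, 'crystal', 'Barrel', '1L');
""")

def schema(conn):
    """Snapshot of the DDL of every table, used to show it never changes."""
    return sorted(row[0] for row in conn.execute(
        "SELECT sql FROM sqlite_master WHERE type = 'table'"))

before = schema(conn)

# A new kind of part is only a new definition row plus its instances;
# no ALTER TABLE is issued. All values here are hypothetical.
conn.execute("INSERT INTO partDefinition VALUES (999, 'wheel', 'front', 'A')")
conn.executemany("INSERT INTO Part VALUES (?, 999)", [(1,), (2,), (3,), (4,)])

after = schema(conn)
print(before == after)  # True
```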
Activities are represented using the same approach: an Activity table holds instances of the activities described in the ActivityDescription table. Inserting a new activity into the work flow is a matter of adding its description to the definition table and relating it to its occurrences in the Activity table. Again, with this design it is possible to describe a completely different business seamlessly. For mail delivery, for instance, the definition records could describe the operations to be done on reception, shunting and delivery, while the Activity table could record when a given operation was done on which parcel.
The work flow is defined by collecting several activity definitions and specifying the order in which they should be executed. The interface software then checks that the activity being executed at a given time follows, in the work-flow definition, the last activity completed on the part. The interface software can also allow activities to be skipped or repeated.
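As an illustration of the mail-delivery case, the sketch below stores three activity descriptions and the occurrences recorded for one parcel, then reads back the parcel's history. SQLite stands in for MySQL; the column names activityDescription_id, item_id and done_at, the parcel number and the timestamps are hypothetical, since the article does not show these two tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ActivityDescription (
    ID   INTEGER PRIMARY KEY,
    Name TEXT NOT NULL
);
CREATE TABLE Activity (
    ID                     INTEGER PRIMARY KEY,
    activityDescription_id INTEGER REFERENCES ActivityDescription(ID),
    item_id                INTEGER,  -- the part (or parcel) acted upon
    done_at                TEXT      -- when the operation was performed
);
""")
conn.executemany("INSERT INTO ActivityDescription VALUES (?, ?)",
                 [(1, "reception"), (2, "shunting"), (3, "delivery")])

# Hypothetical parcel 42 has been received and shunted, not yet delivered.
conn.executemany(
    "INSERT INTO Activity (activityDescription_id, item_id, done_at) VALUES (?, ?, ?)",
    [(1, 42, "2004-03-01 09:00"), (2, 42, "2004-03-01 14:30")])

history = conn.execute("""
    SELECT d.Name, a.done_at
    FROM Activity AS a
    JOIN ActivityDescription AS d ON a.activityDescription_id = d.ID
    WHERE a.item_id = ?
    ORDER BY a.done_at
""", (42,)).fetchall()
print(history)  # [('reception', '2004-03-01 09:00'), ('shunting', '2004-03-01 14:30')]
```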
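The ordering check can be sketched as a small function over an ordered list of activity names. This is not REDACLE's interface code: the activity names, the function may_execute and its allow_skip/allow_repeat flags are hypothetical, modeling one plausible way an interface could permit skips and repeats.

```python
# Hypothetical work-flow: activity definitions in their required order.
WORKFLOW = ["visual inspection", "length measurement", "transmission measurement"]

def may_execute(workflow, last_done, proposed, allow_skip=False, allow_repeat=False):
    """Return True if `proposed` may be executed after `last_done`.

    last_done is None for a part on which nothing has been done yet.
    By default only the immediate successor is accepted; the flags model
    the interface software's option to permit skips or repeats.
    """
    nxt = 0 if last_done is None else workflow.index(last_done) + 1
    pos = workflow.index(proposed)
    if pos == nxt:                       # the next activity in the sequence
        return True
    if allow_repeat and last_done is not None and pos == nxt - 1:
        return True                      # redo the last completed activity
    if allow_skip and pos > nxt:
        return True                      # jump forward past some activities
    return False

print(may_execute(WORKFLOW, "visual inspection", "length measurement"))        # True
print(may_execute(WORKFLOW, "visual inspection", "transmission measurement"))  # False
```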
For quality control data, we adopted the same homomorphic pattern, adding a further level of abstraction. We defined characteristics as data collected during a given activity performed on a part. The Characteristics table, however, does not store actual values, because they can be of different kinds: strings, numbers or even more complex types. The Characteristics table is simply a collection of keys: one links each characteristic to its definition in the charDefinition table, and the others link it to the part and to the activity during which it was collected. Actual values are kept in separate tables according to their type.
Our process has three data types: single floating-point numbers, triplets of numbers and strings. The length of a crystal, for example, is a single number and is stored in the Value table. Some measurements are taken at different points along the crystal axis and in different conditions. The optical transmission, for example, is measured every 2 cm at different wavelengths. Each measurement constitutes a triplet: the first number is the position, the second the wavelength and the third the transmission. Each triplet is stored as a record in the multiValue table. Strings are handled the same way: operators perform a visual inspection of each crystal before handling it, and they may add comments describing possible defects. Tables 3 through 7 show the tables used to represent characteristics. Part 33101000018045 has been measured for length and transversal transmission (TTO). Its length, 229.7815 mm, is stored in the Value table; that record's char_id field, 134821, points to the Characteristics record with charDefinition_id=6, which corresponds to crystal length. The TTO is a set of triplets in the multiValue table. The visual inspection of that crystal resulted in the comment "nonhomogeneous" in the charValue table.
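The type-based split can be sketched as a single storage routine: it writes the key record into Characteristics and then routes the data to the table matching its type. SQLite stands in for MySQL; the function store and the column name value in Value and charValue are hypothetical (the article does not show those two tables), while the example keys and data come from Tables 4 and 6 and the text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Characteristics (
    ID INTEGER PRIMARY KEY, charDefinition_id INTEGER,
    part_id INTEGER, activity_id INTEGER
);
-- One table per data type; each row points back to Characteristics.
CREATE TABLE Value      (ID INTEGER PRIMARY KEY, value REAL, char_id INTEGER);
CREATE TABLE multiValue (ID INTEGER PRIMARY KEY, x REAL, y REAL, z REAL, char_id INTEGER);
CREATE TABLE charValue  (ID INTEGER PRIMARY KEY, value TEXT, char_id INTEGER);
""")

def store(conn, char_def_id, part_id, activity_id, data):
    """Insert the key record, then route `data` to the table for its type."""
    cur = conn.execute(
        "INSERT INTO Characteristics (charDefinition_id, part_id, activity_id) "
        "VALUES (?, ?, ?)", (char_def_id, part_id, activity_id))
    char_id = cur.lastrowid
    if isinstance(data, str):                  # free-text comment
        conn.execute("INSERT INTO charValue (value, char_id) VALUES (?, ?)",
                     (data, char_id))
    elif isinstance(data, (int, float)):       # single number
        conn.execute("INSERT INTO Value (value, char_id) VALUES (?, ?)",
                     (data, char_id))
    else:                                      # iterable of (x, y, z) triplets
        conn.executemany(
            "INSERT INTO multiValue (x, y, z, char_id) VALUES (?, ?, ?, ?)",
            [(x, y, z, char_id) for x, y, z in data])
    return char_id

length_id = store(conn, 6, 33101000018045, 16093, 229.7815)
tto_id = store(conn, 26, 33101000018045, 16182, [(15, 700, 76.1), (35, 700, 75.7)])
note_id = store(conn, 2, 33101000018045, 10660, "nonhomogeneous")
```

Adding a new data type would mean adding one more table and one more branch, leaving the existing tables untouched.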
Table 3. charDefinition
| ID | Description | Name | Unit | activityDef_id |
|---|---|---|---|---|
| 2 | result of visual inspection | VIS_I_OPER | | 2 |
| 6 | crystal length | DL | mm | 3 |
| 26 | transversal transmission | TTO | mm#nm#% | 4 |
Table 4. Characteristics
| ID | charDefinition_id | part_id | activity_id |
|---|---|---|---|
| 106035 | 2 | 33101000018045 | 10660 |
| 134821 | 6 | 33101000018045 | 16093 |
| 135252 | 26 | 33101000018045 | 16182 |
Table 6. multiValue
| ID | x | y | z | char_id |
|---|---|---|---|---|
| 748867 | 15 | 700 | 76.1 | 135252 |
| 748907 | 35 | 700 | 75.7 | 135252 |
| 748947 | 55 | 700 | 75.9 | 135252 |
| 748987 | 75 | 700 | 76.1 | 135252 |
| 749027 | 95 | 700 | 76 | 135252 |
| 749067 | 115 | 700 | 75.5 | 135252 |
| 749107 | 135 | 700 | 76 | 135252 |
| 749147 | 155 | 700 | 75.7 | 135252 |
| 749187 | 175 | 700 | 76.3 | 135252 |
| 749227 | 195 | 700 | 76 | 135252 |
| 749267 | 215 | 700 | 74.6 | 135252 |
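The walk-through above can be reproduced as a query sketch: find the Characteristics record for a part and a characteristic name, then fetch the data from the matching value table. SQLite stands in for MySQL; the rows of Tables 3 and 4 and the first three rows of Table 6 are copied from above, while the Value and charValue rows are rebuilt from the text with auto-assigned IDs, and linking the comment to Characteristics record 106035 (the visual-inspection entry) is an assumption. The helper lookup is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE charDefinition (ID INTEGER PRIMARY KEY, Description TEXT, Name TEXT,
                             Unit TEXT, activityDef_id INTEGER);
CREATE TABLE Characteristics (ID INTEGER PRIMARY KEY, charDefinition_id INTEGER,
                              part_id INTEGER, activity_id INTEGER);
CREATE TABLE Value      (ID INTEGER PRIMARY KEY, value REAL, char_id INTEGER);
CREATE TABLE multiValue (ID INTEGER PRIMARY KEY, x REAL, y REAL, z REAL, char_id INTEGER);
CREATE TABLE charValue  (ID INTEGER PRIMARY KEY, value TEXT, char_id INTEGER);

INSERT INTO charDefinition VALUES
  (2,  'result of visual inspection', 'VIS_I_OPER', NULL,      2),
  (6,  'crystal length',              'DL',         'mm',      3),
  (26, 'transversal transmission',    'TTO',        'mm#nm#%', 4);
INSERT INTO Characteristics VALUES
  (106035, 2,  33101000018045, 10660),
  (134821, 6,  33101000018045, 16093),
  (135252, 26, 33101000018045, 16182);
INSERT INTO Value (value, char_id) VALUES (229.7815, 134821);
INSERT INTO charValue (value, char_id) VALUES ('nonhomogeneous', 106035);
INSERT INTO multiValue (x, y, z, char_id) VALUES
  (15, 700, 76.1, 135252), (35, 700, 75.7, 135252), (55, 700, 75.9, 135252);
""")

def lookup(conn, part_id, name):
    """Return the data stored for characteristic `name` (e.g. 'DL') of a part."""
    char_id = conn.execute("""
        SELECT c.ID FROM Characteristics AS c
        JOIN charDefinition AS d ON c.charDefinition_id = d.ID
        WHERE c.part_id = ? AND d.Name = ?
    """, (part_id, name)).fetchone()[0]
    # This sketch simply probes each typed table in turn.
    row = conn.execute("SELECT value FROM Value WHERE char_id = ?", (char_id,)).fetchone()
    if row:
        return row[0]
    row = conn.execute("SELECT value FROM charValue WHERE char_id = ?", (char_id,)).fetchone()
    if row:
        return row[0]
    return conn.execute(
        "SELECT x, y, z FROM multiValue WHERE char_id = ? ORDER BY x", (char_id,)).fetchall()

print(lookup(conn, 33101000018045, "DL"))  # 229.7815
```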
Again, this method makes REDACLE suitable for different types of businesses; in a dairy, for example, the bacterial load of each batch of milk and its producer (a character string) could both be recorded as characteristics. In addition, a completely new data type, such as pictures or sounds, could be added to the database without disturbing the schema, simply by defining a new table. Adding pictures, for instance, requires a table with three fields: a primary key, the picture data as a BLOB and the relation to the Characteristics table, which is expressed by an integer ID. The MySQL code to create such a table is:
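```sql
CREATE TABLE picture (
    id      INT NOT NULL AUTO_INCREMENT,  -- primary key
    data    MEDIUMBLOB,                   -- picture bytes; a plain BLOB holds at most 65,535 bytes
    char_id INT,                          -- ID of the related Characteristics record
    INDEX (char_id),                      -- fast lookup of pictures by characteristic
    PRIMARY KEY (id)
);
```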
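The sketch below shows how a program might store and retrieve pictures through such a table. It uses Python's sqlite3 module as a stand-in for MySQL (AUTO_INCREMENT becomes INTEGER PRIMARY KEY and the index is created separately); the helpers save_picture and load_pictures, the char_id value and the 8-byte PNG signature used as fake image data are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite equivalent of the picture table: key, binary data, link to Characteristics.
conn.executescript("""
CREATE TABLE picture (
    id      INTEGER PRIMARY KEY,
    data    BLOB,
    char_id INTEGER
);
CREATE INDEX picture_char_id ON picture (char_id);
""")

def save_picture(conn, char_id, raw_bytes):
    """Store raw image bytes for a characteristic; return the new row ID."""
    cur = conn.execute("INSERT INTO picture (data, char_id) VALUES (?, ?)",
                       (raw_bytes, char_id))
    return cur.lastrowid

def load_pictures(conn, char_id):
    """Return all images attached to a characteristic, oldest first."""
    return [bytes(row[0]) for row in conn.execute(
        "SELECT data FROM picture WHERE char_id = ? ORDER BY id", (char_id,))]

fake_png = b"\x89PNG\r\n\x1a\n"   # PNG signature only, not a real image
save_picture(conn, 106035, fake_png)
```

Because the picture is linked through char_id, it hangs off a Characteristics record exactly like a number, a triplet or a comment would.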