The REDACLE Work-Flow Management System

A MySQL-based system handles the data management, quality control and bookkeeping for building a new scientific instrument with 500,000 parts. Here's how you can adapt it for your manufacturing process.

Subnuclear particles, tiny objects indeed, need to be revealed and measured by huge detectors. This field is known as high-energy Physics (HEP), and experimental HEP is a cutting-edge science. It uses and promotes the most recent technologies, it invents new tools and it encourages knowledge exchange. For all of these reasons, HEP has long been the realm of open-source software.

The bad news is HEP has become increasingly complicated; what was built in a craftsman-like style yesterday is now an industrial process that requires dedicated management software, usually expensive. We are living this experience in our experiment: a large international collaboration engaged in the construction of a 12,500-ton detector, called CMS (Compact Muon Solenoid), scheduled to take data at the CERN, Geneva, Large Hadron Collider in 2007. Our group in Rome, endowed by the Italian Institute for Nuclear Physics (INFN) and located in the Physics Department of the University La Sapienza, is working on the construction of the electromagnetic calorimeter. The calorimeter is made from about 500,000 parts, including scintillating crystals and photo-detectors. This process requires data management, quality control and bookkeeping, all of which relies on work-flow management.

Figure 1. Today's high-energy Physics is an industrial process.

Work-Flow Management

A work-flow management system (WFMS) is “software that enables the automation of a business process, in whole or part, during which documents, information or tasks are passed from one participant to another for action, according to a set of procedural rules” (www.e-workflow.org). Using a WFMS allows a coordinator to establish the flow of operations needed to realize a product. Operators are guided through the construction sequence, and unforeseen deviations from the sequence are avoided. Each operation generates data, such as measurements, comments and tags, that are recorded in a database.

Originally, a WFMS based on proprietary components was used in our production for about two years. It proved to be clumsy, slow, resource-demanding, hard to resume after hang-ups and troublesome to integrate with other tools. When the flow of incoming calorimeter parts became higher and the assembly rate could not catch up, we made the decision to develop our own solution, based on open-source components. Our requirements were to avoid the previous inefficiencies, to interface transparently with input and output data and to have a flexible solution. We chose to implement a system based on the LAMP (Linux, Apache, MySQL and Perl/PHP/Python) platform. Each component of LAMP has an important role: Linux and Apache provide the basic infrastructure for services and programming; MySQL is the back end of our WFMS; and Perl/PHP manage the interaction operator database.

REDACLE: the Database Design

Our WFMS is called REDACLE (Relational ECAL Database at Construction LEvel). In more detail, our requirements for the database design were:

  1. High flexibility: the database structure should not change when adding new products or activities.

  2. Ability to store quality control (QC) data: quality assurance is an important part of our work and collected data must be available to everyone for statistical analysis.

  3. Variety of access: the database should be able to be queried through different methods, including shells, programs, scripts and the Web.

Requirement 3 automatically was satisfied by MySQL, and this fact, together with its simplicity and completeness, was the main reason we adopted LAMP. In order to satisfy the first two requirements, we developed a set of tables following a pattern, which is a common and standard way to solve a given problem, as in OO programming. We used the pattern called homomorphism, which is a simple representation of a many-to-one relationship. In practice, each part of the specific process with which we are dealing is represented in the database as records in two tables, an object table and an object definition table. Each object definition has an ID, actually a MySQL primary key number. Many objects share the same object definition, and the relationship between them is provided by a foreign key in the object table containing the corresponding definition ID.

An example might explain this design better. As stated in the introduction, our calorimeter is composed of many parts of different types. Each kind of part, such as a crystal, has many instances. The whole calorimeter has about 75,000 crystals. Parts and part definitions are kept in two separate database tables, as shown in Tables 1 and 2. Different instances of a part share the same part definition by the proper part ID in the partDefinition_id column. In these two tables, the part 33105000006306 is a type 1L barrel crystal, as shown by its partDefinition_id 195 found in the partDefinition table.

Table 1. The Part Table in REDACLE

IDpartDefinition_id
33105000006306195
33105000006307196
33105000006308197
33105000006309198
33105000006310196

Table 2. The partDefinition Table in REDACLE

IDNameSubnameType
195crystalBarrel1L
196capsuleBarrelT4
197AlveolaBarrel3
198subunitBarrel5

The real benefit of this approach is flexibility. If, for any reason, new parts enter the game, the REDACLE database structure will not be modified. It is enough to add a new record to the definition table and relate it to new parts. But the REDACLE database is even more flexible; if we were building cars rather than calorimeters, the database structure would be exactly the same.

Activities are represented using the same approach: an Activity table holds instances of records described in the ActivityDescription table. Inserting a new activity within the work flow is a matter of supplying its description to the definition table and relating it to its occurrences in the Activity table. Again, with this design it is possible to describe a completely different business seamlessly. For mail delivery, for instance, the definition records could contain the description of the operations to be done on reception, shunting and delivery, while the Activity table could contain records with information about when a given operation was done on which parcel.

The work flow is defined by collecting several activity definitions and defining the order in which they should be executed. The interface software then checks that the activity being executed at a given time follows, in the work-flow definition, the last completed activity performed on a part. Activities can be skipped or repeated according to the interface software.

For quality control data we adopted the same homomorphic pattern by adding a further level of abstraction. We defined characteristics as data collected during a given activity performed on a part. The Characteristics table, however, does not store actual values, because they can be of a different nature—strings, numbers or even more complex types. The Characteristics table simply is a collection of keys: one of them links the characteristics to its definition in the charDefinition table. Actual characteristics are kept in separate tables according to their type.

Our process has three data types: single floating-point numbers, triplets of numbers and strings. The length of a crystal, for example, is a single number and is stored in the Value table. Some measurements are taken at different points along the crystal axis and in different conditions. The optical transmission, for one, is measured every 2cm at different wavelengths. It constitutes a triplet, the first number representing the position, the second the wavelength and the third the transmission. Each triplet is stored as a record in the multiValue table. The same is true for strings: operators perform a visual inspection of each crystal before manipulation, and they may provide comments to illustrate possible defects. In Tables 3 through 7, we show the above-mentioned tables for characteristics representation. The part 33101000018045 has been measured for length and transmission (TTO). Length is 229.7815mm in table value. The char_id field is 134821 pointing in the Characteristics table to charDefinition_id=6 corresponding to crystal length. The TTO is a set of triplets in the multiValue table. The visual inspection of that crystal resulted in the comment nonhomogeneous in the charValue table.

Table 3. charDefinition

IDDescriptionNameUnitactivityDef_id
2result of visual inspectionVIS_I_OPER 2
6crystal lengthDLmm3
26transversal transmissionTTOmm#nm#%4

Table 4. Characteristics

IDcharDefinition_idpart_idactivity_id
10603523310100001804510660
13482163310100001804516093
135252263310100001804516182

Table 5. Value

IDxchar_id
37614229.7815134821

Table 6. multiValue

IDxyzchar_id
7488671570076.1135252
7489073570075.7135252
7489475570075.9135252
7489877570076.1135252
7490279570076135252
74906711570075.5135252
74910713570076135252
74914715570075.7135252
74918717570076.3135252
74922719570076135252
74926721570074.6135252

Table 7. charValue

IDvaluechar_id
2872nonhomogeneous106035

Again, this method makes REDACLE qualified for different types of businesses; in a dairy it could be used to record the bacterial load for each batch of milk, besides the producer (a character string), as characteristics. In addition, a completely new data type, such as pictures or sounds, could be added to the database without disturbing the schema simply by defining a new table. Adding pictures, for instance, implies the creation of a table with three fields: primary key, picture data as a BLOB and the relation to the Characteristics table, which is expressed by an integer ID. The MySQL code to create such a table is:

CREATE TABLE picture (
        id INT NOT NULL AUTO_INCREMENT,
        data BLOB,
        char_id INT,
        INDEX (char_id),
        PRIMARY KEY (id)
);

______________________

Comments

Comment viewing options

Select your preferred way to display the comments and click "Save settings" to activate your changes.

Re: The REDACLE Work-Flow Management System

Anonymous's picture

Is there a missing Parts table example in the article? Tables 1 and 2 are mentioned in the 2nd paragraph of the REDACLE:the Database Design section. The table 2 example seems to pertain to text found later in the article.

I would like to see the schema for the partDefinition table.

Webinar
One Click, Universal Protection: Implementing Centralized Security Policies on Linux Systems

As Linux continues to play an ever increasing role in corporate data centers and institutions, ensuring the integrity and protection of these systems must be a priority. With 60% of the world's websites and an increasing share of organization's mission-critical workloads running on Linux, failing to stop malware and other advanced threats on Linux can increasingly impact an organization's reputation and bottom line.

Learn More

Sponsored by Bit9

Webinar
Linux Backup and Recovery Webinar

Most companies incorporate backup procedures for critical data, which can be restored quickly if a loss occurs. However, fewer companies are prepared for catastrophic system failures, in which they lose all data, the entire operating system, applications, settings, patches and more, reducing their system(s) to “bare metal.” After all, before data can be restored to a system, there must be a system to restore it to.

In this one hour webinar, learn how to enhance your existing backup strategies for better disaster recovery preparedness using Storix System Backup Administrator (SBAdmin), a highly flexible bare-metal recovery solution for UNIX and Linux systems.

Learn More

Sponsored by Storix