The REDACLE Work-Flow Management System
In our application, humans interact with the database in many ways: through the MySQL client, C++ and Java programs, Perl scripts, and PHP scripts served as Web pages. Using a Web browser to render the graphical user interface (GUI) offers considerable advantages. The GUI is portable and requires no installation of specific components, no development time is spent on graphics, and the browser environment is by now familiar to both operators and customers.
Another significant feature of REDACLE is its interface to other machines. During the calorimeter construction process, automatic machines measure crystals and other parts without human intervention (Figure 2). These machines therefore must be able to interact with REDACLE to learn the right sequence of operations to perform, to report the start and end times of those operations, and to provide the data to be stored as characteristics.
Our goal was to create a system that would allow almost any device to interact with REDACLE. We avoided imposing a particular programming language or providing libraries for every possible device, because that approach does not scale with the variety of devices on the market. Furthermore, some devices are embedded systems running proprietary software.
We developed a dæmon called the Instrument Agent (IA) to act as an interface between REDACLE and instruments. The IA is a process that connects to an Internet port and can read ASCII characters from, and write them to, that port. Instruments are required only to connect to the network and send strings over the connection.
The sequence of operations is as follows:
1. After connecting, an instrument declares the part on which it is operating.
2. The IA queries the REDACLE database and searches for the last completed activity for that part in the work flow. The command the instrument should execute is stored in the database as a description field in the activityDefinition table.
3. The IA sends the instrument the proper command.
4. Upon recognizing the command, the instrument acknowledges and executes it; after receiving the acknowledgement, the IA inserts a new activity in the REDACLE database.
5. At the end of the job, the IA updates the activity just inserted, marks it as FINISHED and gets the data from the instrument as XML-formatted strings.
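The sequence above can be sketched from the instrument's side in a few lines of Python. This is only an illustration under stated assumptions, not the IA's actual protocol: the article does not specify message framing, so the newline-terminated ASCII lines, the `ACK` token, the part ID `XTAL-0001`, the command `MEASURE_LENGTH` and the `fake_agent` stand-in (which replaces the real database lookup with a dictionary) are all hypothetical.

```python
import socket
import threading


def send_line(sock, text):
    """Send one newline-terminated ASCII line (the framing is an assumption)."""
    sock.sendall((text + "\n").encode("ascii"))


def recv_line(sock):
    """Read bytes up to the next newline and return them as ASCII text."""
    buf = bytearray()
    while not buf.endswith(b"\n"):
        chunk = sock.recv(1)
        if not chunk:  # peer closed the connection
            break
        buf += chunk
    return buf.decode("ascii").rstrip("\n")


def run_instrument(sock, part_id, measure):
    """Instrument side of the exchange described in the steps above."""
    send_line(sock, part_id)           # step 1: declare the part
    command = recv_line(sock)          # step 3: receive the command
    send_line(sock, "ACK " + command)  # step 4: acknowledge (hypothetical token)
    result = measure(command)          # step 4: execute the command
    send_line(sock, result)            # step 5: send the result string
    return command


def fake_agent(sock, next_command, log):
    """Stand-in for the IA; the real one would query the REDACLE database."""
    part = recv_line(sock)
    send_line(sock, next_command[part])
    log.append(recv_line(sock))  # acknowledgement
    log.append(recv_line(sock))  # result string


def demo():
    """Run one full exchange over a local socket pair."""
    inst_sock, agent_sock = socket.socketpair()
    log = []
    agent = threading.Thread(
        target=fake_agent,
        args=(agent_sock, {"XTAL-0001": "MEASURE_LENGTH"}, log),
    )
    agent.start()
    command = run_instrument(
        inst_sock,
        "XTAL-0001",
        lambda cmd: "<RE><FI>length<VA>230.1</VA></FI></RE>",
    )
    agent.join()
    inst_sock.close()
    agent_sock.close()
    return command, log
```

Calling `demo()` runs both sides in one process. On a real instrument, only `run_instrument` would be needed, with the socket connected to the IA's host and port.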
The result of the activity may contain both multiValue and charValue fields. Single values are formatted as follows:
<RE><FI>field name<VA>field value</VA></FI>...</RE>
<RE> stands for result, <FI> is field and <VA> is value. From the field name, the instrument agent obtains the characteristics definition ID and fills in the appropriate table according to the field value format (value for numbers and charValue for strings). The multiValue table is populated if the result is of the form:
<RE><NT>ntuple name <FI>field name<VA>field value</VA></FI> ... </NT></RE>
<NT> here stands for n-tuple, a collection of n elements.
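To make the two formats concrete, here is a minimal Python sketch of how a receiver might split a result string into rows and pick a destination table. It is not the IA's code: the function names are hypothetical, the lookup of the characteristics definition ID is omitted, and sending every n-tuple field to multiValue regardless of its type is an assumption.

```python
import xml.etree.ElementTree as ET


def is_number(text):
    """Return True if the text parses as a floating-point number."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def read_field(fi, ntuple=None):
    """Turn one <FI> element into a row description."""
    name = (fi.text or "").strip()            # text before the <VA> child
    value = (fi.findtext("VA") or "").strip()
    if ntuple is not None:
        table = "multiValue"                  # assumption: all n-tuple fields go here
    elif is_number(value):
        table = "value"
    else:
        table = "charValue"
    return {"table": table, "ntuple": ntuple, "field": name, "value": value}


def parse_result(xml_string):
    """Parse an <RE> string holding single fields and/or <NT> n-tuples."""
    root = ET.fromstring(xml_string)
    if root.tag != "RE":
        raise ValueError("expected <RE> root element")
    rows = []
    for child in root:
        if child.tag == "FI":
            rows.append(read_field(child))
        elif child.tag == "NT":
            nt_name = (child.text or "").strip()
            for fi in child.findall("FI"):
                rows.append(read_field(fi, ntuple=nt_name))
    return rows
```

For example, `parse_result("<RE><FI>length<VA>230.1</VA></FI></RE>")` yields a single row destined for the value table, while a non-numeric value such as `anna` would be routed to charValue.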
Instrument software developers need not know the details of the REDACLE database; they simply have to be told which string formats to use. No libraries need to be linked, no files need to be included, and no programming language is imposed. The only requirement is the ability to open a network connection to the IA.
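As an illustration of how little the instrument side needs, the following Python sketch builds both string formats with plain string formatting. The function names are hypothetical, and escaping special characters with `xml.sax.saxutils.escape` is an assumption; the article does not say how the IA treats characters such as `<` or `&` inside names and values.

```python
from xml.sax.saxutils import escape


def format_field(name, value):
    """Build one <FI>name<VA>value</VA></FI> element."""
    return "<FI>{}<VA>{}</VA></FI>".format(escape(str(name)), escape(str(value)))


def format_result(fields):
    """Build a single-value result string from (name, value) pairs."""
    return "<RE>" + "".join(format_field(n, v) for n, v in fields) + "</RE>"


def format_ntuple(ntuple_name, fields):
    """Build an n-tuple result string from a name and (name, value) pairs."""
    body = "".join(format_field(n, v) for n, v in fields)
    return "<RE><NT>{} {}</NT></RE>".format(escape(str(ntuple_name)), body)
```

An instrument would send the returned string over its connection to the IA; for instance, `format_result([("length", 230.1)])` produces `<RE><FI>length<VA>230.1</VA></FI></RE>`.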
Besides the GUI and instrument interfaces, we developed a set of ancillary command-line scripts for administrators and coordinators. In addition, we created a small library that lets C++ programs and Perl scripts work with the database without formulating SQL queries.
REDACLE was released in our laboratory four months after the first discussions of the project were held. The whole system contains about 10,000 lines of code in Perl, C++, PHP and Java. The resources needed to run the software are small compared to those required by the former system, which was hosted on a dual 800MHz Pentium III server. That system kept the CPU at close to 100% load and occupied almost all of the 512MB of RAM. We also had needed to upgrade all the client PCs, doubling their memory to support the Java GUIs, and we had planned a server upgrade to a dual 1GHz Pentium III with 1GB of RAM to improve the previous system's performance. When using REDACLE, we discovered, amazingly enough, that the CPU load was negligible and the average memory usage was 140–200MB.
It became clear that we needed tools to import from and export to the previous database, which was still used in other labs. We quickly built these tools in Perl to read and write XML files, and we were able to import all the old data into REDACLE in one day.
Currently, we have about 13,000 parts in the database. There are 97,000 stored characteristics, each of which may be composed of several values, for a total database size of 50MB. Of the 15 tables, the multiValue table, containing more than 1,000,000 records, is the largest at 41MB.
But the most spectacular result came from comparing the time operators spent on calorimeter assembly. Before the introduction of REDACLE, 25% of the operators' time was wasted interacting with the work-flow manager. With REDACLE, the interaction between operators and the database takes a negligible amount of time, improving the overall efficiency of detector assembly.
Moreover, operators soon familiarized themselves with REDACLE's flexibility and started requesting new tools and interfaces. What previously required weeks to develop, or might have been almost impossible to build, now can be implemented with REDACLE and LAMP in anywhere from a few hours to two or three days.
The extraordinary flexibility of the REDACLE database design makes it practical for many different business processes and industrial applications, clearly illustrating how open-source software can be superior to proprietary offerings. In the future, we plan to support even more complex work-flow models and to provide a library of frequently used queries and functions for the development of other REDACLE-based projects.