Implementing a Research Knowledge Base
The back end of the system is naturally a relational database management system (RDBMS). Most of the data we want to store, retrieve and search is textual, and an RDBMS is well suited to this purpose. Any SQL-enabled database server will work, and there are many to choose from. I chose the GPLed MySQL for its speed and reliability. If data integrity and transaction support are a must for you, consider the open-source PostgreSQL database server.
The middleware is a collection of classes and methods that wrap the data query operations. They are designed to shield front-end developers from the technical details of database connections and the SQL language.
I chose to use Java to develop the middleware. One big advantage of using Java is that it is a full-blown, object-oriented language, making it much easier to implement the complex logic and structures required by large projects. Using Java also allows us to take advantage of a large number of utility classes available as Java libraries or beans. The ones I used for this project include the JDBC driver, a database connection pool, session management and text processing.
The standard way to build database middleware in J2EE is to use entity EJBs (Enterprise JavaBeans). However, this approach requires running an EJB container, which can be expensive. In fact, few low-cost JSP hosting services provide EJB containers. For OpenReference's relatively simple database structure, I decided to use a simpler approach: static methods in helper classes provide database access. Each row in a table is represented by a Hashtable.
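The row-as-Hashtable convention can be sketched as follows. The `RowHelper` class and its method are hypothetical illustrations, not the article's actual code; the real middleware would read the column names from JDBC `ResultSetMetaData` rather than taking them as arguments.

```java
import java.util.Hashtable;

// Hypothetical helper illustrating the row-as-Hashtable convention:
// each database row becomes a Hashtable mapping column names to values.
public class RowHelper {

    // Build one "row" from parallel arrays of column names and values.
    // In real code the names would come from ResultSetMetaData.
    public static Hashtable<String, String> makeRow(String[] columns, String[] values) {
        Hashtable<String, String> row = new Hashtable<>();
        for (int i = 0; i < columns.length; i++) {
            row.put(columns[i], values[i]);
        }
        return row;
    }

    public static void main(String[] args) {
        Hashtable<String, String> row = makeRow(
            new String[] {"reference_id", "title"},
            new String[] {"42", "An Example Paper"});
        System.out.println(row.get("title")); // prints "An Example Paper"
    }
}
```

Because every row is just a Hashtable, the front end can display any query result generically, without compiling a distinct value class per table.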
The middleware uses JDBC to pass information between the Java application and the SQL database. There are JDBC drivers for all the major RDBMSes. For MySQL, I used the mm.mysql driver. I constructed one class for each database table. The class knows the fields in that table and knows which fields are searchable. Each class implements a set of basic data query functions (e.g., getAllRows, addRow, updateRows, etc.) and a search function that searches all the searchable fields and returns all the matched rows. Each class also has its own query functions to do specific or cross-table queries. For example, the ReferenceTable.java class has a function getReferencesByUserName. It takes a user name as input, finds the corresponding user ID in the User table and then returns the rows with a matching user ID from the Reference table. As an example, see Listing 1 [available at ftp.linuxjournal.com/pub/lj/listings/issue91/4769.tgz] for the complete API for the Category class.
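The two-step lookup behind getReferencesByUserName can be sketched with in-memory stand-ins for the User and Reference tables. The class name and column names below are illustrative assumptions; the real method issues two SQL queries through JDBC instead of scanning Lists.

```java
import java.util.ArrayList;
import java.util.Hashtable;
import java.util.List;

// Sketch of the cross-table lookup behind getReferencesByUserName,
// using Lists of Hashtables in place of the User and Reference tables.
public class ReferenceLookup {

    // Step 1: find the user ID for a user name in the "User" rows.
    // Step 2: collect all "Reference" rows carrying that user ID.
    public static List<Hashtable<String, String>> referencesByUserName(
            List<Hashtable<String, String>> userRows,
            List<Hashtable<String, String>> referenceRows,
            String userName) {
        String userId = null;
        for (Hashtable<String, String> user : userRows) {
            if (userName.equals(user.get("user_name"))) {
                userId = user.get("user_id");
                break;
            }
        }
        List<Hashtable<String, String>> matches = new ArrayList<>();
        if (userId == null) {
            return matches; // unknown user: no references
        }
        for (Hashtable<String, String> ref : referenceRows) {
            if (userId.equals(ref.get("user_id"))) {
                matches.add(ref);
            }
        }
        return matches;
    }
}
```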
I chose JavaServer Pages (JSPs) for the front end. JSPs have the power of the full Java language plus the benefit of separating the web presentation from the application functions. JSPs support all the HTML tag syntax for formatting display, and one can embed complete Java code that works with beans and other classes directly in the page through scriptlet tags. One can even design custom tags to encapsulate the back-end operations (e.g., database queries). It is easy to train an HTML programmer/presentation expert to work on the web pages using their favorite HTML editor, without caring about how the database queries are executed. Meanwhile, a back-end programmer can work on the data query part without caring about how the data will be displayed. More information about JSPs can be found in Reuven Lerner's At the Forge column in the May through July 2001 issues of Linux Journal.
Any J2EE-compatible Java server is capable of running JSPs. My favorites are the Tomcat engine from the Apache Software Foundation and the Resin engine from Caucho Technology. Either one can run as an extension module of the Apache web server to take advantage of Apache's many other useful features. Tomcat is released under the Apache license. Resin is closed-source software but is free for noncommercial use. Resin runs considerably faster than Tomcat and offers some useful features, such as a built-in JDBC driver and driver pool and multiple JVM instances for failover.
Originally I planned to use an XML file to represent the category structure, because XML documents are naturally organized as a tree (the DOM model) and are easy for humans to read and edit. There are also many good Java XML-DOM tools for manipulating trees.
However, we need a large and dynamic category structure for accurate classification and browsing, and it is important to be able to search the categories. One drawback of XML is that it is difficult to search together with the other content stored in the RDBMS. Also, keeping a large parsed DOM object in memory is inefficient and difficult to synchronize among several JVMs, while storing it on disk and parsing it when needed introduces too much processing overhead. So, I decided to use database tables for the categories.
The whole category tree is stored in a database table. Each record represents a category and has a unique category ID. It also carries its parent's category ID, so the records are linked together into a tree. References are linked to the categories by a separate table mapping category IDs to reference IDs.
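The parent-pointer scheme can be sketched in a few lines of Java. The `Row` class and the column layout it mirrors are illustrative assumptions, not the article's actual schema; in the middleware, finding children would be a SELECT on the parent-ID column rather than a scan of a List.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the parent-pointer scheme: each category row carries its own
// ID plus its parent's ID, which is enough to link the rows into a tree.
public class CategoryTree {

    // One row of the (hypothetical) category table.
    public static class Row {
        public final int id;
        public final int parentId; // 0 marks the root's parent
        public final String name;
        public Row(int id, int parentId, String name) {
            this.id = id;
            this.parentId = parentId;
            this.name = name;
        }
    }

    // Recover the children of a category by scanning parent IDs,
    // the in-memory analogue of SELECT ... WHERE parent_id = ?
    public static List<Row> childrenOf(List<Row> rows, int parentId) {
        List<Row> children = new ArrayList<>();
        for (Row row : rows) {
            if (row.parentId == parentId && row.id != parentId) {
                children.add(row);
            }
        }
        return children;
    }
}
```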
The Category class in the middleware contains all the methods that operate on the tree. In order to maintain the links and structural integrity of the category table, it is important that we manipulate the table only through the Category class's public methods. Some important methods include:
Insert new subcategories at or between any level(s) in the tree. A special note about adding a new subcategory under a leaf category: since references may be associated only with leaves, references that were associated with the old leaf become associated with the new subcategory.
Change category description/keywords/properties.
Delete a category from any level in the tree: the target category's children become children of its parent, and the target itself is then removed from the table.
Return a list of children (or parent) for a given category.
Search keywords from category descriptions.
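The delete operation above, which reattaches the target's children to its parent before removing the target, can be sketched against an in-memory parent map. The map is an illustrative stand-in for the id/parent-ID columns; in the middleware this would be an UPDATE on the children's parent IDs followed by a DELETE of the target row.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of deleting a category while preserving the tree: every child of
// the deleted node is reattached to the node's parent, then the node goes.
public class CategoryDelete {

    // parentOf maps category ID -> parent category ID (an in-memory
    // stand-in for the id/parent-ID columns of the category table).
    public static void delete(Map<Integer, Integer> parentOf, int target) {
        Integer grandParent = parentOf.get(target);
        if (grandParent == null) {
            return; // unknown category: nothing to do
        }
        // Reattach the target's children to the target's parent.
        for (Map.Entry<Integer, Integer> entry : parentOf.entrySet()) {
            if (entry.getValue() == target) {
                entry.setValue(grandParent);
            }
        }
        // Finally remove the target itself.
        parentOf.remove(target);
    }
}
```

Doing both steps inside one method (or, in SQL, one transaction) is what keeps the tree from ever containing orphaned children.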
An illustrative listing of the Category class source code is presented in Listing 1. I found those methods sufficient for my use. You might want more complicated operations, such as moving a subtree to another branch. The strength of open-source software is that anyone can add functions to the code without rewriting the basic part.
Practical Task Scheduling Deployment
July 20, 2016 12:00 pm CDT
One of the best things about the UNIX environment (aside from being stable and efficient) is the vast array of software tools available to help you do your job. Traditionally, a UNIX tool does only one thing, but does that one thing very well. For example, grep is very easy to use and can search vast amounts of data quickly. The find tool can find a particular file or files based on all kinds of criteria. It's pretty easy to string these tools together to build even more powerful tools, such as a tool that finds all of the .log files in the /home directory and searches each one for a particular entry. This erector-set mentality means UNIX system administrators always seem to have the right tool for the job.
Cron traditionally has been considered another such tool for job scheduling, but is it enough? This webinar considers that very question. The first part builds on a previous Geek Guide, Beyond Cron, and briefly describes how to know when it might be time to consider upgrading your job scheduling infrastructure. The second part presents an actual planning and implementation framework.
Join Linux Journal's Mike Diehl and Pat Cameron of Help Systems.
Free to Linux Journal readers. Register Now!
With all the industry talk about the benefits of Linux on Power and all the performance advantages offered by its open architecture, you may be considering a move in that direction. If you are thinking about analytics, big data and cloud computing, you would be right to evaluate Power. The idea of using commodity x86 hardware and replacing it every three years is an outdated cost model. It doesn't consider the total cost of ownership, nor the advantages of real processing power, high availability and multithreading like a demon.
This ebook takes a look at some of the practical applications of the Linux on Power platform and ways you might bring all the performance power of this open architecture to bear for your organization. There are no smoke and mirrors here—just hard, cold, empirical evidence provided by independent sources. I also consider some innovative ways Linux on Power will be used in the future. Get the Guide.