Implementing a Research Knowledge Base
The back end of the system is naturally a relational database management system (RDBMS). Most of the data we want to store/retrieve/search is textual, and RDBMS is perfect for this purpose. Any SQL-enabled database server will work, and there are many such servers to choose from. I chose the GPLed MySQL for its speed and reliability. If data integrity and transaction support is a must for you, you can choose the open-source PostgreSQL database server.
The middleware is a collection of classes/methods that wrap the data query operations. They are designed to shield front-end developers from the technical details of database connections and SQL language.
I chose to use Java to develop the middleware. One big advantage of using Java is that it is a full-blown, object-oriented language, making it much easier to implement complex logic/structure designs required by large projects. Using Java also allows us to take advantage of a large number of utility classes already existing as Java libraries or beans. The ones I used for this project include JDBC driver, database connection pool, session management and text processing.
The standard way to build database middleware in J2EE is to use entity EJBs (Enterprise JavaBeans). However, this approach requires running an EJB container, which can be expensive. In fact, few low-cost JSP hosting services provide EJB containers. For OpenReference's relatively simple database structure, I decided to use a simpler approach: static methods in helper classes to provide database access. Each row in a table is represented by a HashTable.
The middleware uses JDBC to pass information between the Java application and the SQL database. There are JDBC drivers for all the major RDBMSes. For MySQL, I used the mm.mysql driver. I constructed one class for each database table. The class knows the fields in that table and knows which fields are searchable. Each class implements a set of basic data query functions (e.g., getAllRows, AddRow, updateRows, etc.) and a search function that searches all the searchable fields and returns all the matched rows. Each class also has its own query functions to do specific or cross-table queries. For example, in the ReferenceTable.java class, there is a function getReferencesByUserName. This will take the user name as input and find the corresponding user ID in the User table, and then return the rows with matching user ID from the Reference table. As an example, see Listing 1 [available at ftp.linuxjournal.com/pub/lj/listings/issue91/4769.tgz] for the complete API for the Category class.
I chose JavaServer Pages (JSPs) for the front end. JSPs have the power of the full Java language plus the benefit of separating the web presentation from the application functions. JSPs support all the HTML tag syntax for formatting display, and one can add whole Java programs dealing with beans and other functions in HTML comments. One can even design custom tags to encapsulate the back-end operations (e.g., database queries). It is easy to train an HTML programmer/presentation expert to work on the web pages using their favorite HTML editor, without caring about how the database queries are executed. In the meantime, a back-end programmer can work on the data query part without caring about how the data will be displayed. More information about JSPs can be found in Reuven Lerner's At the Forge column in the May through July 2001 issues of Linux Journal.
Any J2EE-compatible Java server is capable of running JSPs. My favorites are the Tomcat engine from the Apache Foundation and the Resin engine from Caucho Technology. Either one of them can run as an extension module of the Apache web server to take advantage of many other useful features of Apache. Tomcat is released under GPL. Resin is closed-source software but is free for noncommercial use. Resin runs considerably faster than Tomcat and offers some useful features, such as a built-in JDBC driver and driver pool and multiple JVM for fallback.
Originally I planned to use an XML file to represent the category structure because XML documents are naturally organized in a tree structure (DOM model), and they are easy for human reading and editing. There are many good XML-DOM tools in Java to manipulate trees.
However, we need a large and dynamic category structure for accurate classification and browsing, as it is important to be able to search the categories. One drawback of using XML is its difficulty to be searched together with other content stored in RDBMS. Also, to store a big parsed DOM object in memory constantly is inefficient and difficult to synchronize among several JVMs. To store it on disk and parse it when needed introduces too much processing overhead. So, I decided to use database tables for the categories.
The whole category is stored in a database table. Each record represents a category and has its unique category ID. It also has the parent category ID so that the records are linked together into a tree. References are linked to the categories by a separate category ID vs. reference ID table.
The Category class in middleware contains all methods to operate the tree. In order to maintain the links and structure integrity of the category table, it is important that we manipulate the category table only through the Category object public methods. Some important methods include:
Insert new subcategories in or between any level(s) in the tree. A special note about adding a new subcategory under a leaf category: since all references must be associated with leaves only, references that were associated with the old leaf become associated with the new subcategory now.
Change category description/keywords/properties.
Delete a category from any level in the tree to make the target category's children be children of its parent and then delete the target itself from the table.
Return a list of children (or parent) for a given category.
Search keywords from category descriptions.
An illustrative listing of Category-class source code is presented in Listing 1. I found those methods sufficient for my use. You might want to do more complicated operations, such as moving a subtree to another branch, etc. The strength of open-source software is that anyone can add functions to the code without rewriting the basic part.
|Making Linux and Android Get Along (It's Not as Hard as It Sounds)||May 16, 2013|
|Drupal Is a Framework: Why Everyone Needs to Understand This||May 15, 2013|
|Home, My Backup Data Center||May 13, 2013|
|Non-Linux FOSS: Seashore||May 10, 2013|
|Trying to Tame the Tablet||May 08, 2013|
|Dart: a New Web Programming Experience||May 07, 2013|
- RSS Feeds
- New Products
- Making Linux and Android Get Along (It's Not as Hard as It Sounds)
- Drupal Is a Framework: Why Everyone Needs to Understand This
- A Topic for Discussion - Open Source Feature-Richness?
- Home, My Backup Data Center
- Validate an E-Mail Address with PHP, the Right Way
- New Products
- Trying to Tame the Tablet
- Developer Poll
- not living upto the mobile revolution
56 min 43 sec ago
- Deceptive Advertising and
1 hour 32 min ago
- Let\'s declare that you have
1 hour 33 min ago
- Alterations in Contest Due
1 hour 34 min ago
- At a numbers mindset, your
1 hour 35 min ago
- Do not get Just Almost any
1 hour 38 min ago
- A fantastic rule-of-thumb to
1 hour 40 min ago
- Keren mastah..
2 hours 38 min ago
- mini tablet compare
3 hours 56 min ago
- Looking Good
7 hours 30 min ago
Enter to Win an Adafruit Prototyping Pi Plate Kit for Raspberry Pi
It's Raspberry Pi month at Linux Journal. Each week in May, Adafruit will be giving away a Pi-related prize to a lucky, randomly drawn LJ reader. Winners will be announced weekly.
Fill out the fields below to enter to win this week's prize-- a Prototyping Pi Plate Kit for Raspberry Pi.
Congratulations to our winners so far:
- 5-8-13, Pi Starter Pack: Jack Davis
- 5-15-13, Pi Model B 512MB RAM: Patrick Dunn
- Next winner announced on 5-21-13!
Free Webinar: Linux Backup and Recovery
Most companies incorporate backup procedures for critical data, which can be restored quickly if a loss occurs. However, fewer companies are prepared for catastrophic system failures, in which they lose all data, the entire operating system, applications, settings, patches and more, reducing their system(s) to “bare metal.” After all, before data can be restored to a system, there must be a system to restore it to.
In this one hour webinar, learn how to enhance your existing backup strategies for better disaster recovery preparedness using Storix System Backup Administrator (SBAdmin), a highly flexible bare-metal recovery solution for UNIX and Linux systems.