Implementing a Research Knowledge Base

Keeping up with large volumes of research requires a systemboth flexible and intuitive.
User Authentication and Sessions

The user login and sessions are managed through the session API from J2EE. Usernames and passwords are authorized from a database table, then the servlet engine establishes a session for this user. The servlet engine tracks the sessions through session objects.

The session object needs to know about the user who owns it. That includes the user name, profile, preferences and current browsing status. Storing such information in the session object improves performance a lot. Otherwise, for example, the program has to look up the database for display preferences every time prior to displaying a page for a user.

Instead, I could store all the information inside the session object itself. But for better organization, I used several JavaBeans associated with session objects to store additional user information. In JSP specifications, a JavaBean can have a scope of a session or a page. Sessions with JavaBeans greatly simplify the development work.

Database Connections

Database connections are expensive to establish. To open a new connection every time we need to query can slow down the computer drastically and use up the system resources quickly.

There are several ways we can reduce the need for new connections. One way is to assign one connection for each session. That connection can be stored in a bean with session scope, and all the queries from that session go through it. However, if many sessions are active at the same time, available connections run out fast. This solution does not scale well.

Another choice is to assign one connection per page. However, I do not like this idea; the connection object has to be passed to the middleware by the JSP coder. This is not intuitive to a nontechnical JSP coder and defies the goal of separating presentation from application logic. It is possible to design a set of custom JSP tags to encapsulate the database connections so that JSP coders do not see them, but it requires extra designing work and that the JSP coder learn a nonstandard language. Using custom tags certainly increases productivity in the long run, but it also makes short and simple changes more difficult with a steeper learning curve for the system. I am still seeking the best solution. For now, I decided to hide the connection handling completely inside the middleware.

I made a new database connection from every database query function in the middleware. Each query function completes multiple-related queries to make efficient use of connections. Each JSP page calls only one or two such functions.

To reduce the overhead caused by opening new connections, I used “connection pool” utilities. These utilities are classes that maintain a pool of open connections in memory. When the user requests a new connection, it simply fetches one from its pool rather than making a new one. When the user closes the connection, it returns to the pool. There are several such utilities, e.g., PoolMan. Their usage is very straightforward, and you only need minimal changes to convert your code to take advantage of those utilities.

Text Processing

We have to allow users to mix some HTML tags in the text so that they can format the memo and comments properly. However, displaying user HTML has a big security risk: unauthorized user HTML tags can corrupt/change the display of page navigation and other people's comments. They can then be used to load unauthorized images/applets or even redirect the user to another site using JavaScript.

So, I decided to allow only four HTML tags: <p>, <b>, <i> and <a>. That ensures the user cannot change the format of any content other than her or his own. The unauthorized HTML tags are filtered out before submitting text into the database. The allowed HTML tags are configurable by the site administrator at compile time.

It is also important to note that some users might want to input XML sample code or mathematical formulas containing “<” in memos or comments. It would be inappropriate to treat all “<” as HTML tag tokens. So, I provide the plain text mode under which all the “<” symbols are converted to “&lt;” before being submitted into the database.

A powerful way to process text is using regular expressions. There have been several good Java regular expression engines available. I choose gnu.regexp because it is a free software implementation of almost all Perl5 regular expression features through a simple and intuitive API. The code for filtering HTML tags and composing the SQL query string using a regular expression engine is listed in Listing 2 [available at].



Michael Yuan is a PhD candidate in Astrophysics at University of Texas at Austin. He studies remote quasars (20+ billion light years away) to understand the history and evolution of our universe. When he is not observing quasars, he enjoys developing useful software using earthly languages such as Java and Perl.



Comment viewing options

Select your preferred way to display the comments and click "Save settings" to activate your changes.

Re: Linux in Education: Implementing a Research Knowledge Base

Anonymous's picture

I am a first time student taking a 10 week course in Linux. This is a very useful article to write my term paper, "Modern Linux." Thank you very much for writing it.

Lisa Dusendang