ATF Jubilee Edition

To celebrate his 50th column, Reuven casts his vision to the future of web development and suggests some current trends that will affect how the job gets done.
Pooled Database Connections

The third trend in the world of server-side programming is the issue of persistent database connections. Database servers were originally designed to handle perhaps one connection from each user per day, rather than one per minute or one per second. Consider this: if a CGI program connects to a relational database server once per second, it makes 86,400 connections per day, exercising the connection mechanism more than 86,000 times as often as was originally intended. In some cases, this does not matter very much, but there are many databases for which each connection is an expensive operation.

One solution is thus to open a database connection when the server first starts up and reuse that connection each time a program needs to contact the database. This is roughly what the Apache::DBI module does when working with Perl, Apache and mod_perl. Each time you disconnect from a database with $dbh->disconnect, Apache::DBI silently ignores your request and keeps the connection around for the future. When you call DBI->connect, Apache::DBI looks at the connection string and tries to reuse an existing connection before starting a new one. Since each Apache process services only one HTTP request at a time, each process thus needs only one database connection. The savings from skipping the connect/disconnect sequence on every request can be substantial. At the same time, it means that every child Apache process needs its own database connection, which can lead to dozens or hundreds of simultaneous connections on a heavily loaded server.
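The persistence idea can be sketched in a few lines. The following is only an illustration of the technique, not Apache::DBI's actual implementation: the class names, the FakeConnection stand-in and the connection string are all hypothetical, and no real database is contacted.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a real database handle; not part of DBI or JDBC.
class FakeConnection {
    final String dsn;
    boolean open = true;
    FakeConnection(String dsn) { this.dsn = dsn; }
}

// One cache per server process, mirroring the one-connection-per-child idea.
class PersistentConnections {
    private final Map<String, FakeConnection> cache = new HashMap<>();
    private int opened = 0;

    // Reuse an existing connection for this connection string, else open one.
    FakeConnection connect(String dsn) {
        FakeConnection conn = cache.get(dsn);
        if (conn == null || !conn.open) {
            conn = new FakeConnection(dsn);   // the expensive step
            cache.put(dsn, conn);
            opened++;
        }
        return conn;
    }

    // Requests to disconnect are deliberately ignored so the handle survives.
    void disconnect(FakeConnection conn) {
        // intentionally empty
    }

    int connectionsOpened() { return opened; }

    public static void main(String[] args) {
        PersistentConnections process = new PersistentConnections();
        // Simulate 1,000 HTTP requests handled by one server process.
        for (int request = 0; request < 1000; request++) {
            FakeConnection conn = process.connect("dbi:Pg:dbname=test");
            process.disconnect(conn);
        }
        System.out.println("Connections opened: " + process.connectionsOpened());
    }
}
```

Because the calling code still calls connect and disconnect as usual, it does not need to change to benefit from the cache. The cost is that each process holds its own open handle for as long as it lives.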

AOLServer cuts down on the number of database connections by using multiple threads rather than multiple processes. Because threads exist within the same process, they can share data. AOLServer takes advantage of this to create a small pool of database connections, choosing a connection at random and handing it to the thread handling an HTTP request as necessary. Database connections are not dedicated to a particular thread and can be shared as necessary, reducing the number of connections that the server must open with a database.
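A shared pool of this kind might look roughly like the sketch below. It is written in Java rather than AOLServer's own C and Tcl, it hands out the next idle handle rather than choosing one at random, and every name in it (ConnectionPool, PooledHandle, the pool size of three) is an assumption made for illustration.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a database handle.
class PooledHandle {
    final int id;
    PooledHandle(int id) { this.id = id; }
}

class ConnectionPool {
    private final BlockingQueue<PooledHandle> idle;

    ConnectionPool(int size) {
        idle = new ArrayBlockingQueue<>(size);
        for (int i = 0; i < size; i++) {
            idle.add(new PooledHandle(i));   // opened once, at startup
        }
    }

    // Blocks until some handle is free; no handle belongs to any one thread.
    PooledHandle acquire() throws InterruptedException {
        return idle.take();
    }

    // Return a handle so that another thread can use it.
    void release(PooledHandle handle) {
        idle.add(handle);
    }

    int idleCount() { return idle.size(); }

    public static void main(String[] args) throws InterruptedException {
        ConnectionPool pool = new ConnectionPool(3);
        AtomicInteger served = new AtomicInteger();
        // Twenty request threads share three handles.
        ExecutorService threads = Executors.newFixedThreadPool(20);
        for (int i = 0; i < 200; i++) {
            threads.submit(() -> {
                try {
                    PooledHandle h = pool.acquire();
                    try {
                        served.incrementAndGet();   // a query would run here
                    } finally {
                        pool.release(h);
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        threads.shutdown();
        threads.awaitTermination(10, TimeUnit.SECONDS);
        System.out.println(served.get() + " requests, " + pool.idleCount() + " idle handles");
    }
}
```

The point of the design is that the number of open connections is set by the pool size, not by the number of threads or requests. A thread that finds the pool empty simply waits until another thread returns a handle.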

Working with Java servlets and JSPs requires a different model altogether. The Jakarta-Tomcat servlet/JSP implementation normally exists outside of a web server, meaning that there is always one Tomcat process, regardless of how many Apache child processes are on the system. Within that Tomcat process, there may be any number of servlet threads executing concurrently. Normally, servlets and JSPs (and JavaBeans, which JSPs and servlets can use to provide persistence and/or a high level of abstraction) connect to a database using JDBC. But JDBC does not automatically provide connection pooling; while JDBC 2.0 does provide this capability, it is not completely automatic, and not many JDBC 2.0 drivers exist as of this writing.

Other languages take a different tack. For example, database drivers for PHP allow persistent database connections but require that the programmer ask for them. That is, you can connect to a PostgreSQL database with pg_connect, or you can create a persistent connection to PostgreSQL with pg_pconnect. The onus is placed on the author of a database driver to provide two different access functions, and on the PHP programmer to use the appropriate function for his or her needs.

Of these, I find AOLServer's technique of persistent, pooled connections to be the most elegant, since it works for all languages (although that is almost always going to be Tcl) and scales extremely well. mod_perl's Apache::DBI is a great solution for Perl programs, especially since it means that individual Perl programs and modules do not need to be changed in order to take advantage of the persistent connections. The fact that Apache::DBI only provides persistence, and not pooling, is a direct result of Apache's multiple processes; it is probably safe to assume that Apache 2.0, which will support threads as well as processes, will come closer to AOLServer's model when it is released.

JDBC's pooling is good, particularly after it seemed that everyone was writing their own class for connection pooling. However, it will only work for Java servlets and will not help on a server that requires a pool for multiple services, such as mod_perl and JSP. PHP's system is perhaps the crudest because it provides neither a standard database API, nor a means for database drivers to pool connections automatically, nor a way for programs to take advantage of those connections. However, the persistence does work and can certainly result in a significant speedup.

Where Are We Going?

While I generally dislike the term “application server” for its ambiguity, it is clear that this is the direction in which the Web is moving. No longer will you design applications by writing one or more programs that exist on their own; rather, you will write a program using a set of objects and modules provided by the application server and into which your application fits naturally. In many cases, you can create relatively sophisticated applications with a minimum of work, simply because someone else has done the majority of the work for you.

Of course, this means that we're increasingly seeing operating systems as the underlying layer for an application server, where the latter is the truly important element. Just as a client-side application author must decide whether to write for Windows, UNIX or Macintosh, web application developers must increasingly decide which application server they prefer to use. As with operating systems, it is very difficult to move from one application server to another. This means, unfortunately, that choosing an immature, slow or difficult-to-modify server may be painful in the future. Even application servers that conform to the same standards and use the same language, such as Enhydra and ATG Dynamo, provide different objects and functionality and make it difficult to move from one to the other.

To a free software devotee like myself, this means that open-source application servers are at least as important as open-source operating systems. Luckily, there are a number of open-source application servers available for download from the Internet. They differ radically in their operation and functionality, but I must admit that I have had only a little exposure to each of the following technologies. While I hope to learn more about them in the coming months, I mention them because it is clear that web developers need to learn more about all of them.

Perhaps the best-known application server platform is Zope, which comes with a number of parts and isn't well understood. Zope is an object database, a templating system and even a basic content management system. I have not yet had a chance to play with Zope in a serious way, but the little that I have read and heard about it seems very impressive, particularly if a module is already available for the particular functionality you need.

Another application server that has been getting a lot of publicity is the ArsDigita Community System (ACS), written and maintained largely by the ArsDigita consulting company and released under the GNU General Public License. One main problem with ACS has been its dependence on Oracle as a database; while Oracle is an excellent database product, it is expensive and its source is quite closed. A volunteer effort known as OpenACS has been working to solve this problem by porting the ACS software to use PostgreSQL as a database. The software is not quite complete but does include a great deal of functionality and will undoubtedly improve over time.

XML has been a hot topic in the Web community for several years now, but only in the last six to nine months have we begun to see its widespread adoption. XML describes content semantically, completely ignoring the way in which it should be displayed.

Enhydra is a Java-based application server that seems similar to Zope in many ways, except that it works with XML, Java servlets, JSP and Enterprise JavaBeans. Enhydra appears to be quite complex, but it also provides a large framework on which to create applications.

If you want to work with XML, then you might also want to look at the Cocoon and AxKit projects. Cocoon, which is sponsored by the Apache Software Foundation, is working on a Java-based server for XML data. AxKit provides XML-based content generation using Perl, making it possible to separate programs from content, and content from graphic design, using XML, XSL and XSLT along with Perl.

Finally, I should mention Oracle's latest entry into the world of application servers, its Internet Application Server (IAS). IAS is a module within Apache that works with a Java runtime system, Enterprise JavaBeans, JSP and JDBC, along with Oracle. As of this writing, the system is largely new and untested. Of course, Oracle does not provide access to its source code. At the same time, IAS runs under Linux and may well be a popular choice among Oracle users and administrators.