ATF Jubilee Edition

To celebrate his 50th column, Reuven casts his vision to the future of web development and suggests some current trends that will affect how the job gets done.

Pooled Database Connections

The third trend in the world of server-side programming is the issue of persistent database connections. Database servers were originally designed to handle roughly one connection from each user per day, rather than one per minute or one per second. Consider this: a day has 86,400 seconds, so a CGI program that connects to a relational database server once per second exercises the connection mechanism more than 86,000 times as often as originally intended. In some cases this matters little, but for many databases each connection is an expensive operation.

One solution is thus to open a database connection when the server first starts up and reuse that connection each time a program needs to contact the database. This is roughly what the Apache::DBI module does when working with Perl, Apache and mod_perl. Each time you disconnect from a database with $dbh->disconnect, Apache::DBI silently ignores your request and keeps the connection around for the future. When you call DBI->connect, Apache::DBI looks at the connection string and tries to reuse an existing connection before starting a new one. Since each Apache process services only one HTTP request at a time, each process thus needs only one database connection. The savings from skipping the repeated connect/disconnect sequence can be substantial. At the same time, it means that every child Apache process needs its own database connection, which can lead to dozens or hundreds of simultaneous connections on a heavily loaded server.

AOLServer cuts down on the number of database connections by using multiple threads rather than multiple processes. Because threads exist within the same process, they can share data. AOLServer takes advantage of this to create a small pool of database connections, choosing a connection at random and handing it to the thread handling an HTTP request as necessary. Database connections are not dedicated to a particular thread and can be shared as necessary, reducing the number of connections that the server must open with a database.
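A shared pool of this kind might look like the following sketch. It is an illustration in Python rather than AOLServer's actual implementation, again with sqlite3 standing in for a database server. It hands out idle connections from a queue instead of choosing one at random, and ConnectionPool, serve and run are hypothetical names.

```python
import queue
import sqlite3
import threading
from contextlib import contextmanager


class ConnectionPool:
    """A fixed number of connections shared by all threads in one process."""

    def __init__(self, dsn, size):
        self._idle = queue.Queue()
        for _ in range(size):
            # check_same_thread=False allows a connection created here to be
            # used by other threads; the pool ensures only one at a time.
            self._idle.put(sqlite3.connect(dsn, check_same_thread=False))

    @contextmanager
    def connection(self):
        conn = self._idle.get()  # blocks until a connection is free
        try:
            yield conn
        finally:
            self._idle.put(conn)  # hand it back for the next thread


def serve(pool, results, i):
    """Stand-in for a thread handling one HTTP request."""
    with pool.connection() as conn:
        results[i] = conn.execute("SELECT ? * 2", (i,)).fetchone()[0]


def run(num_threads=8, pool_size=2):
    """Serve num_threads requests using only pool_size connections."""
    pool = ConnectionPool(":memory:", pool_size)
    results = [None] * num_threads
    threads = [
        threading.Thread(target=serve, args=(pool, results, i))
        for i in range(num_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

With the defaults, eight request threads are served by two connections. A thread that finds the pool empty simply waits until another thread returns a connection.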

Working with Java servlets and JSPs requires a different model altogether. The Jakarta-Tomcat servlet/JSP implementation normally exists outside of a web server, meaning that there is always one Tomcat process, regardless of how many Apache child processes are on the system. Within that Tomcat process, there may be any number of servlet threads executing concurrently. Normally, servlets and JSPs (and Java beans, which JSPs and servlets can use to provide persistence and/or a high level of abstraction) connect to a database using JDBC. But JDBC does not automatically provide connection pooling; while JDBC 2.0 does provide this capability, it is not completely automatic, and not many JDBC 2.0 drivers exist as of this writing.

Other languages take a different tack. For example, database drivers for PHP allow persistent database connections but require that the programmer ask for them. That is, you can connect to a PostgreSQL database with pg_connect, or you can create a persistent connection to PostgreSQL with pg_pconnect. The onus is placed on the author of a database driver to provide two different access functions, and on the PHP programmer to use the appropriate function for his or her needs.

Of these, I find AOLServer's technique of persistent, pooled connections to be the most elegant, since it works for all languages (although that is almost always going to be Tcl) and scales extremely well. mod_perl's Apache::DBI is a great solution for Perl programs, especially since it means that individual Perl programs and modules do not need to be changed in order to take advantage of the persistent connections. The fact that Apache::DBI only provides persistence, and not pooling, is a direct result of Apache's multiple processes; it is probably safe to assume that Apache 2.0, which will support threads as well as processes, will come closer to AOLServer's model when it is released.

JDBC's pooling is good, particularly after it seemed that everyone was writing their own class for connection pooling. However, it will only work for Java servlets and will not help on a server that requires a pool for multiple services, such as mod_perl and JSP. PHP's system is perhaps the crudest because it provides neither a standard database API, nor a means for database drivers to pool connections automatically, nor a way for programs to take advantage of those connections. However, the persistence does work and can certainly result in a significant speedup.

Where Are We Going?

While I generally dislike the term “application server” for its ambiguity, it is clear that this is the direction in which the Web is moving. No longer will you design applications by writing one or more programs that exist on their own; rather, you will write a program using a set of objects and modules provided by the application server and into which your application fits naturally. In many cases, you can create relatively sophisticated applications with a minimum of work, simply because someone else has done the majority of the work for you.

Of course, this means that we're increasingly seeing operating systems as the underlying layer for an application server, where the latter is the truly important element. Just as a client-side application author must decide whether to write for Windows, UNIX or Macintosh, web application developers must increasingly decide which application server they prefer to use. As with operating systems, it is very difficult to move from one application server to another. This means, unfortunately, that choosing an immature, slow or difficult-to-modify server may be painful in the future. Even application servers that conform to the same standards and use the same language, such as Enhydra and ATG Dynamo, provide different objects and functionality and make it difficult to move from one to the other.

To a free software devotee like myself, this means that open-source application servers are at least as important as open-source operating systems. Luckily, there are a number of open-source application servers available for download from the Internet, and they differ radically in their operation and functionality. I must admit that I have had only a little exposure to each of the following technologies. While I hope to learn more about them in the coming months, I mention them because it is clear that web developers need to learn more about all of them.

Perhaps the best-known application server platform is Zope, which comes with a number of parts and isn't well understood. Zope is an object database, a templating system and even a basic content management system. I have not yet had a chance to play with Zope in a serious way, but the little that I have read and heard about it seems very impressive, particularly if a module is already available for the particular functionality you need.

Another application server that has been getting a lot of publicity is the ArsDigita Community System (ACS), written and maintained largely by the ArsDigita consulting company and released under the GNU General Public License. One main problem with ACS has been its dependence on Oracle as a database; while Oracle is an excellent database product, it is expensive and its source is closed. A volunteer effort known as OpenACS has been working to solve this problem by porting the ACS software to use PostgreSQL as a database. The software is not quite complete but does include a great deal of functionality and will undoubtedly improve over time.

XML has been a hot topic in the Web community for several years now, but only in the last six to nine months have we begun to see its widespread adoption. XML describes content semantically, completely ignoring the way in which it should be displayed.

Enhydra is a Java-based application server that seems similar to Zope in many ways, except that it works with XML, Java servlets, JSP and Enterprise Java Beans. Enhydra appears to be quite complex, but also provides a large framework on which to create applications.

If you want to work with XML, then you might also want to look at the Cocoon and AxKit projects. Cocoon, which is sponsored by the Apache Software Foundation, is working on a Java-based server for XML data. AxKit provides XML-based content generation using Perl, making it possible to separate programs from content, and content from graphic design, using XML, XSL and XSLT along with Perl.

Finally, I should mention Oracle's latest entry into the world of application servers, its Internet Application Server (IAS). IAS is a module within Apache that works with a Java run time system, Enterprise Java Beans, JSP and JDBC, along with Oracle. As of this writing, the system is largely new and untested. Of course, Oracle does not provide access to its source code. At the same time, IAS runs under Linux and may well be a popular choice among Oracle users and administrators.
