At the Forge - PostgreSQL 9.0

Looking for a relational database? The latest version of PostgreSQL makes a great database even greater.
Advanced Features

One of the features I have grown to love over the years is the ability to write my own server-side functions—analogous to “stored procedures” in other databases—in a variety of languages. Most of the time, I use the built-in Pl/PgSQL language, which marries SQL queries with a procedural language. But in cases when I need a bit more power, or want to use existing libraries, I can write functions in PL/Perl, PL/Python or a number of other languages. Regardless of what language I use, these functions integrate seamlessly into the database, working no differently from built-in functions and allowing me to process data inside the database, rather than in my application.

Over time, these server-side languages have become increasingly sophisticated, and although the Pl/PgSQL syntax is not the most modern or expressive, it is easy to learn and handles a large number of cases easily and quickly. I've seen a wide variety of uses for server-side functions, from encapsulating most application logic in such functions to handling specific data-manipulation routines that don't logically belong in the application layer.

One of my favorite uses for functions is in triggers—actions the database executes automatically when a certain action occurs. For example, I once worked on a project into which we wanted to insert a URL, but in which we also wanted to have a (separate) column containing the hostname for that URL. I wrote a function that used regular expressions to retrieve the hostname and then inserted the hostname automatically into the appropriate column. From the application's perspective, it was able to insert a URL but then retrieve either a URL or a hostname. Triggers can be used to perform all sorts of actions before or after an insert, delete or update query is performed on a table.

One of the many advances in the 9.0 release was the improvement of “window functions”, functions introduced in 8.4 that make it possible to create aggregations (such as sum and average) over a portion of other rows in the table, rather than over the entire table. Thus, you could calculate the difference between someone's salary and other salaries in that person's department, or rank the number of citations a department has in a bibliographic index, or find which was the longest-running action in each controller, within an MVC Web application. I've only started to work with windowing functions, but the combination of built-in functionality from 8.4, plus some additions to 9.0 that make it easy to peek at preceding and following rows, lead me to believe this is a particularly powerful feature.

PostgreSQL 9.0

The above features are nice improvements, but they're icing on the cake when it comes to reasons for a full-version upgrade to 9.0.

First, this version makes it fairly simple to upgrade. Earlier versions required dumping the database to disk, upgrading the server, and then restoring the database from its dumped form. This might not be a big problem for small databases that can afford to be off-line for a while, but it is a major issue for large sites that cannot afford such a long shutdown period.

The core PostgreSQL developers heard the criticism and have responded with pg_upgrade. Now, pg_upgrade still is considered to be somewhat experimental, and it hasn't been given the official seal of approval by the development team, so it is placed in the contrib directory, rather than in any official location. However, pg_upgrade, which allows you to upgrade your PostgreSQL server without a dump or restore, is certainly one of the major innovations and accomplishments of 9.0, and it has been greeted with great excitement by people who were too nervous or busy to upgrade previously.

Another major feature—and perhaps the most impressive accomplishment of this version—is what's known as “hot streaming replication”. This feature is actually the combination of two different features, which work together to make for an amazing new backup and high-availability system.

The basic idea is as follows. When you commit a transaction to PostgreSQL, it doesn't immediately update the tables on disk. Rather, it writes a record of the transaction to a separate “write-ahead log”, or WAL, describing the change that should be made to the database. Only after enough WALs have accumulated does PostgreSQL actually update the data.

Starting in version 8.4, you could copy the WALs from one PostgreSQL server to another, typically using a program like rsync. When the WALs appeared on the second server, they were read into that system. If the first server were ever to go down, the second server could be put into active mode, picking up from where the first server had left off.

Although this was better than nothing, it had at least two problems. Not only was transferring the WALs via “log shipping” far from an ideal mechanism, but the secondary server could not be used while it was receiving data from the primary server.

Both of these problems have gone away in PostgreSQL 9.0, providing a robust and easy-to-configure master-slave mechanism for high availability. WALs now can be transferred from the primary server to the secondary using the network by configuring the secondary as a special kind of client program. As each transaction occurs, the WAL can be sent over the network to one or more remote servers, allowing them to be synchronized almost immediately with the primary. The other improvement is that the secondary server can answer read-only queries even while it's receiving data from the primary.

Hot streaming replication, as the combination of these two features is known, doesn't cover all situations. For example, I recently was working with a company that has multiple PostgreSQL servers whose data it would like to put onto a secondary server. For such purposes, hot streaming replication won't work. Fortunately, there are other solutions, such as Slony, that might be a better fit for the company's needs. But, for many sites that want to ensure their servers won't go down, this solution is more than adequate.