Open-Source Databases, Part II: PostgreSQL
If you're developing an application that depends on storing and retrieving large amounts of data, you undoubtedly have discovered how useful a relational database can be. Databases have been around for a while, but for many years, the open-source offerings were inferior to their commercial counterparts. Today, we have several open-source options from which to choose. Last month, I gave an overview of MySQL, which is probably the most popular open-source, client-server relational database.
This month, we look at MySQL's best-known competitor, PostgreSQL. (Next month, I plan to compare the two programs.) PostgreSQL has a smaller community than MySQL, which sometimes leads people to write it off, think of it as unimportant or even believe that it's not as powerful as MySQL. But, PostgreSQL includes nearly all the functionality of MySQL, and it offers a large number of features that MySQL does not currently offer. As this article shows, it's worth considering PostgreSQL any time you need a database.
PostgreSQL began as a research project at the University of California, Berkeley, in 1985. Michael Stonebreaker, a professor of computer science, already had developed and released one database product known as Ingres. After commercializing Ingres, he returned to Berkeley and designed a new database (Postgres), meant in part to solve the problems he had seen with Ingres. Despite a growing community of users, Stonebreaker shut down the Postgres project in 1993.
However, Postgres was distributed under the BSD open-source license, which meant users were free to modify and redistribute it. Two Berkeley students, along with a handful of people from elsewhere in North America, decided to see how easily they could modify the code and distribute a working database.
One of the changes these developers made was in the way Postgres communicated to the outside world. Originally, Postgres had used the QUEL query language that Stonebreaker had designed for Ingres. The developers removed QUEL and added an SQL interpreter to be more compatible with other products. To indicate that this version of Postgres used SQL, the database was renamed PostgreSQL.
Today, many of those original developers continue to work with the project, fixing bugs and contributing patches. The effort is coordinated by a volunteer steering committee. Individual PostgreSQL developers may be hired (as employees or contractors) by various companies, but no one company or organization controls the development or direction of PostgreSQL as a whole.
The most recent release of PostgreSQL is 8.2.3, released in early February 2007. This was a bug-fix release for version 8.2, which was released in late 2006. Major releases typically come out once per year, with additional minor releases for security and other serious bugs.
The easiest way to install PostgreSQL is to use a packaging system, such as debs or RPMs. On my Ubuntu system, for example, I was able to install PostgreSQL easily and quickly with apt-get. Note that most packaging systems distinguish between the PostgreSQL client, server and developer libraries, so be sure to retrieve the packages that are most appropriate for your needs. Installing PostgreSQL via apt-get or RPMs should achieve everything you need to get started—from creating a postgres user to initializing a data directory.
If you must install PostgreSQL from source, I suggest reading the documentation that comes with it. Compiling PostgreSQL is not hard, but it does require more description than I have room for in this article. One piece of advice though, for anyone compiling it from scratch, is to unpack the archive, as well as compile and test it, as the postgres user. Trying to compile and test PostgreSQL as the root user is bound to fail, and other users also might not have sufficient privileges for it to work correctly.
Now, we're ready to start up the server. Prebuilt packages generally will include a shell script (to be placed in /etc/init.d/ or the equivalent) that starts the server for you. Even if you have downloaded and installed the source code for PostgreSQL, you will find an appropriate startup script in the contrib directory. I suggest using (or at least modifying) this script rather than writing one from scratch. On my system, I can start PostgreSQL with:
If all goes well, our server should now be running. (We can check this by typing ps aux | grep postgres at the command line.) The easiest way to access the server is to use the psql interactive client, which comes with PostgreSQL. To get a list of databases in the current cluster, use the following syntax:
psql -U <username> -l