At the Forge - PostgreSQL 9.0

Looking for a relational database? The latest version of PostgreSQL makes a great database even greater.

If you want to build a Web application, you're going to need a few pieces of software. You'll need a server operating system, such as Linux. You'll also need a server-side language and framework. Although I personally use Ruby on Rails nowadays, plenty of other good choices exist. For client-side interactions, you'll need to use a JavaScript framework, such as jQuery. And, to store information about (and for) your users, you'll need to select a persistent storage mechanism, otherwise known as a database.

Until just a few years ago, anyone using the term database almost always was referring to a relational database—that is, one based on linked two-dimensional tables, which could be queried using the relatively standard SQL language. Sure, there were a variety of other types of databases, but they weren't in widespread use or taken very seriously.

In the past few years, however, there has been tremendous growth in the use and mindshare of so-called NoSQL databases. These databases, from Redis to MongoDB to CouchDB to Hadoop, have little in common with one another beyond the promise of greater scalability and flexibility.

Now, I'm not one to say these non-relational databases are bad. On the contrary, I've used several of them in my own projects and found them to be a perfect fit for certain needs. But for day-to-day use, I continue to find relational databases an excellent fit. And when I need a relational database, I always prefer to use PostgreSQL. Indeed, when I'm working on a Web application, PostgreSQL is the default option. Only if I see that it won't work well (for whatever reason) do I switch some or all of the project to use a different database.

Why do I love PostgreSQL so much? First, because it's rock solid. The developers have an almost obsessive dedication to the safety of the data you store in the database, and they do their best to ensure that data won't ever be deleted or corrupted. This doesn't mean such situations never happen, but they're quite rare. PostgreSQL not only tries to ensure the safety of your data, but it also provides you with a variety of tools to help you constrain the values that may be stored in the database, ensuring that anything stored is guaranteed to be valid.
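For example, here's a minimal sketch of such constraints, using a hypothetical Invoices table invented purely for illustration; NOT NULL and CHECK ensure that only sensible values ever reach the disk:

CREATE TABLE Invoices (
    id SERIAL,
    amount NUMERIC(10,2) NOT NULL CHECK (amount > 0),
    status TEXT NOT NULL CHECK (status IN ('open', 'paid', 'void')),
    PRIMARY KEY(id)
);

-- Rejected: the CHECK constraint on amount fails
INSERT INTO Invoices (amount, status) VALUES (-5, 'open');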

Second, PostgreSQL offers a wealth of features that never cease to impress and amaze me. Whether it's the number of languages in which you can write server-side functions or the different ways you can define indexes, the clever use of MVCC (multi-version concurrency control) to avoid writers blocking readers or the ongoing statistical analysis that feeds into the query optimizer, PostgreSQL hasn't failed me in more than 15 years of day-to-day use.
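As a small taste of the server-side languages, here's a minimal sketch of a function written in PL/pgSQL; the function itself is invented purely for illustration:

-- PL/pgSQL is only one of the available languages; PL/Perl,
-- PL/Python and others can be installed as well
CREATE OR REPLACE FUNCTION add_one(n INTEGER) RETURNS INTEGER AS $$
BEGIN
    RETURN n + 1;
END;
$$ LANGUAGE plpgsql;

SELECT add_one(41);    -- returns 42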

Every time I use another relational database, I find myself wishing for one or more functions that PostgreSQL provides or thinking that PostgreSQL wouldn't even enter into certain problematic situations, because of its high-quality architecture. This doesn't mean PostgreSQL is perfect, but its mix of features has served me quite well, and I'm not the only one to feel this way.

This month, I want to look at PostgreSQL 9.0, which was released in late 2010. In particular, I want to consider what advantages it offers over other open-source databases (mainly MySQL). For people already using PostgreSQL, what new features does the 9.0 release bring to the table?

I won't describe installing PostgreSQL, simply because versions are available for all major Linux distributions. Downloading and compiling the source code is relatively straightforward—although doing so as the dedicated “postgres” user, rather than as root, will save you many headaches during the installation process—but I have found binaries to be just fine for my day-to-day needs.

Background

PostgreSQL is an open-source (BSD-licensed) relational database developed by a core group of developers working for a number of different companies. (Indeed, the rules of the core development group forbid more than a handful of core developers from working for the same company, to ensure that no one organization controls development directly.) It grew out of the Postgres Project developed by Michael Stonebraker at the University of California, Berkeley. Postgres itself was an attempt to improve upon the original Ingres database Stonebraker had developed and commercialized.

Postgres used its own query language, known as QUEL, and had a number of advanced ideas, including many taken from the world of object-oriented programming. Several developers joined forces to take the Postgres database, replace QUEL with SQL, stabilize the code and release it under an open-source license. The first release of PostgreSQL, under its new name, came in 1996. Informally, a large number of PostgreSQL users and developers continue to call it Postgres, although that technically refers to Stonebraker's project at Berkeley and not the current implementation.

One of the main goals of the PostgreSQL team has been to adhere to the SQL standard wherever possible. In addition, as I mentioned previously, PostgreSQL developers pride themselves on a rock-solid implementation, using the standard ACID (atomicity-consistency-isolation-durability) paradigm for database storage under all circumstances. They further try to strike a balance between offering a powerful set of features and configuration options and overwhelming people who don't want to be full-time database administrators.
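To make the transactional side concrete, here's a minimal sketch using a hypothetical Accounts table; atomicity means that either both updates take effect or neither does:

BEGIN;
UPDATE Accounts SET balance = balance - 100 WHERE id = 1;
UPDATE Accounts SET balance = balance + 100 WHERE id = 2;
COMMIT;    -- or ROLLBACK, which would undo both updates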

All PostgreSQL storage is transactional, using a system known as MVCC (multi-version concurrency control). MVCC, which also is used in Oracle and CouchDB, largely avoids conflicts between readers and writers, ensuring that neither rows nor tables need to be locked under most circumstances. MVCC does have the side effect of keeping “dead rows” around in the database, which occasionally need to be “vacuumed”, similar to garbage collection in many programming languages. For many years, vacuuming needed to be done manually, typically by running the vacuum command from a cron job. Nowadays, the autovacuum dæmon runs in the background, marking space as reusable once the number of updated or deleted rows in a table reaches a configurable threshold.
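If you want to vacuum a table by hand, or see where autovacuum's behavior is tuned, here's a minimal sketch (the People table is the one defined later in this column, and the values shown are purely illustrative, not recommendations):

-- Reclaim dead rows in a single table by hand
VACUUM People;

-- postgresql.conf settings that control when autovacuum kicks in:
-- autovacuum = on
-- autovacuum_vacuum_threshold = 50
-- autovacuum_vacuum_scale_factor = 0.2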

Vacuum also can be run in “analyze” mode, in which case it examines tables and indexes, updating the statistics that are used when planning queries. This is one of the reasons PostgreSQL is able to operate so efficiently, even when handed complex queries. By keeping an up-to-date description of the data stored in each table, it can make an intelligent decision about how to optimize a query. If those guesses turn out to be wrong for your particular needs, you can adjust a number of configuration parameters.
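For example, here's a minimal sketch of refreshing those statistics and then asking the planner for its query plan, again using the People table defined below; default_statistics_target is one of the parameters that influences how detailed the collected statistics are:

-- Refresh the planner's statistics for one table
VACUUM ANALYZE People;

-- Ask the planner how it intends to execute a query
EXPLAIN SELECT * FROM People WHERE last_name = 'Lerner';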

PostgreSQL offers strong support for foreign keys, making it possible to ensure that references between tables always point to rows that actually exist. That is, you can define a People table and then an Addresses table that refers to it:

CREATE TABLE People (
    id SERIAL,
    first_name TEXT,
    last_name TEXT,
    email_address TEXT,
    PRIMARY KEY(id)
);

CREATE TABLE Addresses (
    id SERIAL,
    person_id INTEGER REFERENCES People,
    address TEXT,
    PRIMARY KEY(id)
);

Now, let's try to insert a new record into the Addresses table, without inserting anything into People first:

INSERT INTO Addresses (person_id) VALUES (5);

This results in the following:

ERROR:  insert or update on table "addresses" violates foreign key constraint "addresses_person_id_fkey"
DETAIL:  Key (person_id)=(5) is not present in table "people".

Because I have defined person_id as a foreign key, PostgreSQL won't let me have an invalid value there, period. PostgreSQL also will refuse to let you remove a record from the People table if there are rows in the Addresses table referring to it. You can override such restrictions with the CASCADE keyword, but the database never will remove things on its own without your explicit approval.
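For example, here's a sketch of an alternative definition of the Addresses table that opts in to cascading deletes when the foreign key is declared:

CREATE TABLE Addresses (
    id SERIAL,
    person_id INTEGER REFERENCES People ON DELETE CASCADE,
    address TEXT,
    PRIMARY KEY(id)
);

-- With this definition, removing a person also removes
-- that person's addresses:
DELETE FROM People WHERE id = 5;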

You can, of course, also ensure that e-mail addresses are unique in your People table:

CREATE UNIQUE INDEX people_email_idx ON People(email_address);

But wait, what if someone enters an e-mail address in capital letters and then enters the same e-mail address in lowercase letters? You can ensure uniqueness by taking advantage of one of my favorite PostgreSQL features, a functional index:

CREATE UNIQUE INDEX people_email_idx ON People(lower(email_address));

Now PostgreSQL will ensure its uniqueness, regardless of the case. This feature also comes in handy if you're trying to index a column containing long text strings. You can index only the first 1,000 characters:

CREATE INDEX people_email_prefix_idx ON People(substring(email_address, 1, 1000));

Finally, another powerful feature is CTEs (common table expressions). If you often find yourself performing a query, putting the results in a temporary table and then querying that temp table, CTEs almost certainly are something you'll want to use. You basically describe the temporary table query, give it a name using the “with” statement, and then query that pseudo table. For example:

WITH Mini_users
AS (SELECT id * 2 AS id2, email FROM Users)
SELECT id2 FROM Mini_users;

In the example above, I define a pseudo table named Mini_users and then query it as if it already existed. I already have found CTEs to be quite useful in simplifying complex queries. PostgreSQL also makes it easy to define recursive CTEs, allowing you to work with hierarchical data in a single, simple query.
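Here's a minimal sketch of a recursive CTE, assuming a hypothetical Categories table in which each row points to its parent via a parent_id column:

WITH RECURSIVE Category_tree AS (
    SELECT id, name, parent_id
      FROM Categories
     WHERE parent_id IS NULL        -- start with the top-level rows
    UNION ALL
    SELECT c.id, c.name, c.parent_id
      FROM Categories c
      JOIN Category_tree ct ON c.parent_id = ct.id
)
SELECT * FROM Category_tree;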
