Using PostgreSQL

 in
If you haven't already, perhaps it's time to see if PostgreSQL is the database for you.
Referential Integrity

So far, PostgreSQL does not seem to be very different from MySQL, except in some of its basic syntax. But, it is here that the two databases begin to diverge. PostgreSQL includes what are known as “referential integrity” checks, meaning that I can define certain values in a row cand be defined as illegal.

One of the first things that a database programmer learns is that null indicates that a column contains no value—not even false, the empty string or zero. We can forbid a column to contain “null” by declaring the column to be “not null”. This is perhaps the simplest integrity check, with the database ensuring that the column in question can never contain an illegal value. In our People table above, the name column is defined as not null, telling the database that every person must have a name.

Many times, however, this is not enough. For example, the People table includes an e-mail column. Modern e-mail addresses must contain an @ sign in order to be valid. Using referential integrity, we can ensure that any e-mail address added to the database contain an @ sign. In order to do this, we use PostgreSQL's Perl-like ~ operator, which matches a string with a regular expression:

email   VARCHAR(50)   NOT NULL   CHECK (contact_email ~ '@')

Without an @, an entered (or updated) value will be considered illegal and generate an error code. Having the database flag such an error might seem frustrating, but it is certainly better than having a database with incorrect values. I often use such checks to ensure that columns are not null and to forbid the empty string or other illegal values. For example, it is possible to define the People table with referential integrity checks (see Listing 3).

Listing 3

If I try to insert a row which violates any of these checks, PostgreSQL will refuse to do so (see Listing 4).

Listing 4

So it seems that my value for address2 was invalid. Indeed, I tried to enter an empty string here, which is not allowed. Rather, I should have entered a null value (remember, null and the empty string are distinct values). Sure enough, replacing the empty string with null allows the query to succeed. But the database refused to allow us to corrupt it with invalid information and did not insert the new row.

Foreign Keys

Referential integrity also means that one table can reliably point to another with “foreign keys”. Most tables have a primary key, meaning a column (or set of columns) that uniquely identifies each row. For example, the databases of the United States Social Security Administration uniquely identify each American citizen with a Social Security Number. This number means that you can change your name, address, phone number, bank accounts and job, but the same SSN will still refer to you. In the same way, a primary key allows us to continue pointing to a particular row in a table without having to depend on any of the values in that row.

For example, let's assume that we want to create an Appointments table containing three columns (see Listing 5).

Listing 5

The table in Listing 5 allows us to indicate when we are meeting with each person, ensuring that the notes column is either null or non-empty, and that only one appointment can take place at a time. The person_id column is supposed to contain the person_id from the People table. However, what stops me from entering a person_id of five or 50? How can I be sure that the value is valid?

The answer is that we can set person_id to be a “foreign key” of People, meaning that People.person_id can only contain a value that is contained in People.person_id. We can add this constraint with the REFERENCES keyword (see Listing 6).

Listing 6

Transactions

One of the chief complaints that PostgreSQL users level against MySQL is the lack of transactions. If you have only used MySQL in your development work, you may wonder why you need transactions and how they fit into a database environment.

The basic idea behind transactions is that they group multiple queries into a single logical query. If any of the queries in the transaction should fail, the database is “rolled back” to where things were before the transaction started.

Thanks to transactions, you can be sure that a transfer of money from one account to another will not accidentally leave you with too much or too little money, even if the power fails in the middle of the transaction. Until the transaction is finally “committed”, the database pretends that none of it has happened.

The MySQL documentation (and authors and support people) has its own philosophy of transactions, including commit and rollback, which runs counter to the “ACID” test that is prevalent among current relational databases. While I disagree with some of their conclusions, there is no doubt that the lack of conventional transactions in MySQL has made it a flexible and fast database, one that is well suited to web sites that perform many selects and few inserts and updates.

PostgreSQL follows the standard model fairly closely, making it possible to perform transactions without locking tables (as in the MySQL paradigm). To begin a transaction, use the being work statement, and to end one use either the commit or rollback statement. Users of Perl's DBI or of Java's JDBC can instead use the commit and rollback methods associated with the database connection object. Both DBI and JDBC operate by default in AutoCommit mode, meaning that each query is implicitly placed within its own transaction. To put several queries inside of a single transaction, a program must turn off AutoCommit mode, perform the transaction, perform a commit or rollback, and then (usually) turn AutoCommit mode back on. For example, let's assume that we have a separate Salaries table that indicates every employee's salary (see Listing 7).

Listing 7

Notice how the above table ensures that each employee can get only one raise on a given day, by setting a unique restriction on the combination of an employee and day.

We could presumably keep this information in the People table. However, putting salary information in a separate table makes it easier to hide the information from prying eyes. It also means that we can pull up an employee's entire salary history with a SELECT statement, while keeping the tables normalized and storing information about each person only once.

Having two tables for a single employee raises some issues. Most significantly, we want to be sure that any employee added to the People table will also be added to the Salaries table (it would be rather embarrassing to have an employee without any salary). Adding a new employee should be one logical operation but will require two insert statements—one into People and the other into Salaries. What happens if the database dies in the middle of the second statement?

Listing 8 contains a simple command line program (which could easily be turned into a CGI program) that creates a new employee, first adding a new row into the People table, and then adding a corresponding row to the Salaries table. We retrieve the person_id of the newly inserted employee using PostgreSQL's currval function. We then use that value to INSERT a row into the Salaries table.

Listing 8

We ensure that the two operations occur within a single transaction by turning AutoCommit mode off (setting it to 0). Once this happens, we are responsible for performing a commit when we are done. Without a call to $dbh->commit, PostgreSQL will assume that we want to roll back both of the insert operations, pretending that they never happened. If the program dies in the middle—which will happen if any of the SQL queries fails, since we have activated RaiseError—none of the changes will occur.

To trap the error and display a message to the user, we can use the block version of Perl's eval function, demonstrated in Listing 9. This runs the code inside of the {}, exiting from it, and setting the special variable $@ if anything goes wrong. This technique of using eval to trap errors gives us a basic form of exception handling and enables us to print out only the errors we want. If we were to activate either the PrintError attribute or the “use diagnostics” pragma, Perl would print out more than just our simple message, confusing the user.

Listing 9

______________________

White Paper
Linux Management with Red Hat Satellite: Measuring Business Impact and ROI

Linux has become a key foundation for supporting today's rapidly growing IT environments. Linux is being used to deploy business applications and databases, trading on its reputation as a low-cost operating environment. For many IT organizations, Linux is a mainstay for deploying Web servers and has evolved from handling basic file, print, and utility workloads to running mission-critical applications and databases, physically, virtually, and in the cloud. As Linux grows in importance in terms of value to the business, managing Linux environments to high standards of service quality — availability, security, and performance — becomes an essential requirement for business success.

Learn More

Sponsored by Red Hat

White Paper
Private PaaS for the Agile Enterprise

If you already use virtualized infrastructure, you are well on your way to leveraging the power of the cloud. Virtualization offers the promise of limitless resources, but how do you manage that scalability when your DevOps team doesn’t scale? In today’s hypercompetitive markets, fast results can make a difference between leading the pack vs. obsolescence. Organizations need more benefits from cloud computing than just raw resources. They need agility, flexibility, convenience, ROI, and control.

Stackato private Platform-as-a-Service technology from ActiveState extends your private cloud infrastructure by creating a private PaaS to provide on-demand availability, flexibility, control, and ultimately, faster time-to-market for your enterprise.

Learn More

Sponsored by ActiveState