Using PostgreSQL

 in
If you haven't already, perhaps it's time to see if PostgreSQL is the database for you.
Referential Integrity

So far, PostgreSQL does not seem to be very different from MySQL, except in some of its basic syntax. But, it is here that the two databases begin to diverge. PostgreSQL includes what are known as “referential integrity” checks, meaning that I can define certain values in a row cand be defined as illegal.

One of the first things that a database programmer learns is that null indicates that a column contains no value—not even false, the empty string or zero. We can forbid a column to contain “null” by declaring the column to be “not null”. This is perhaps the simplest integrity check, with the database ensuring that the column in question can never contain an illegal value. In our People table above, the name column is defined as not null, telling the database that every person must have a name.

Many times, however, this is not enough. For example, the People table includes an e-mail column. Modern e-mail addresses must contain an @ sign in order to be valid. Using referential integrity, we can ensure that any e-mail address added to the database contain an @ sign. In order to do this, we use PostgreSQL's Perl-like ~ operator, which matches a string with a regular expression:

email   VARCHAR(50)   NOT NULL   CHECK (contact_email ~ '@')

Without an @, an entered (or updated) value will be considered illegal and generate an error code. Having the database flag such an error might seem frustrating, but it is certainly better than having a database with incorrect values. I often use such checks to ensure that columns are not null and to forbid the empty string or other illegal values. For example, it is possible to define the People table with referential integrity checks (see Listing 3).

Listing 3

If I try to insert a row which violates any of these checks, PostgreSQL will refuse to do so (see Listing 4).

Listing 4

So it seems that my value for address2 was invalid. Indeed, I tried to enter an empty string here, which is not allowed. Rather, I should have entered a null value (remember, null and the empty string are distinct values). Sure enough, replacing the empty string with null allows the query to succeed. But the database refused to allow us to corrupt it with invalid information and did not insert the new row.

Foreign Keys

Referential integrity also means that one table can reliably point to another with “foreign keys”. Most tables have a primary key, meaning a column (or set of columns) that uniquely identifies each row. For example, the databases of the United States Social Security Administration uniquely identify each American citizen with a Social Security Number. This number means that you can change your name, address, phone number, bank accounts and job, but the same SSN will still refer to you. In the same way, a primary key allows us to continue pointing to a particular row in a table without having to depend on any of the values in that row.

For example, let's assume that we want to create an Appointments table containing three columns (see Listing 5).

Listing 5

The table in Listing 5 allows us to indicate when we are meeting with each person, ensuring that the notes column is either null or non-empty, and that only one appointment can take place at a time. The person_id column is supposed to contain the person_id from the People table. However, what stops me from entering a person_id of five or 50? How can I be sure that the value is valid?

The answer is that we can set person_id to be a “foreign key” of People, meaning that People.person_id can only contain a value that is contained in People.person_id. We can add this constraint with the REFERENCES keyword (see Listing 6).

Listing 6

Transactions

One of the chief complaints that PostgreSQL users level against MySQL is the lack of transactions. If you have only used MySQL in your development work, you may wonder why you need transactions and how they fit into a database environment.

The basic idea behind transactions is that they group multiple queries into a single logical query. If any of the queries in the transaction should fail, the database is “rolled back” to where things were before the transaction started.

Thanks to transactions, you can be sure that a transfer of money from one account to another will not accidentally leave you with too much or too little money, even if the power fails in the middle of the transaction. Until the transaction is finally “committed”, the database pretends that none of it has happened.

The MySQL documentation (and authors and support people) has its own philosophy of transactions, including commit and rollback, which runs counter to the “ACID” test that is prevalent among current relational databases. While I disagree with some of their conclusions, there is no doubt that the lack of conventional transactions in MySQL has made it a flexible and fast database, one that is well suited to web sites that perform many selects and few inserts and updates.

PostgreSQL follows the standard model fairly closely, making it possible to perform transactions without locking tables (as in the MySQL paradigm). To begin a transaction, use the being work statement, and to end one use either the commit or rollback statement. Users of Perl's DBI or of Java's JDBC can instead use the commit and rollback methods associated with the database connection object. Both DBI and JDBC operate by default in AutoCommit mode, meaning that each query is implicitly placed within its own transaction. To put several queries inside of a single transaction, a program must turn off AutoCommit mode, perform the transaction, perform a commit or rollback, and then (usually) turn AutoCommit mode back on. For example, let's assume that we have a separate Salaries table that indicates every employee's salary (see Listing 7).

Listing 7

Notice how the above table ensures that each employee can get only one raise on a given day, by setting a unique restriction on the combination of an employee and day.

We could presumably keep this information in the People table. However, putting salary information in a separate table makes it easier to hide the information from prying eyes. It also means that we can pull up an employee's entire salary history with a SELECT statement, while keeping the tables normalized and storing information about each person only once.

Having two tables for a single employee raises some issues. Most significantly, we want to be sure that any employee added to the People table will also be added to the Salaries table (it would be rather embarrassing to have an employee without any salary). Adding a new employee should be one logical operation but will require two insert statements—one into People and the other into Salaries. What happens if the database dies in the middle of the second statement?

Listing 8 contains a simple command line program (which could easily be turned into a CGI program) that creates a new employee, first adding a new row into the People table, and then adding a corresponding row to the Salaries table. We retrieve the person_id of the newly inserted employee using PostgreSQL's currval function. We then use that value to INSERT a row into the Salaries table.

Listing 8

We ensure that the two operations occur within a single transaction by turning AutoCommit mode off (setting it to 0). Once this happens, we are responsible for performing a commit when we are done. Without a call to $dbh->commit, PostgreSQL will assume that we want to roll back both of the insert operations, pretending that they never happened. If the program dies in the middle—which will happen if any of the SQL queries fails, since we have activated RaiseError—none of the changes will occur.

To trap the error and display a message to the user, we can use the block version of Perl's eval function, demonstrated in Listing 9. This runs the code inside of the {}, exiting from it, and setting the special variable $@ if anything goes wrong. This technique of using eval to trap errors gives us a basic form of exception handling and enables us to print out only the errors we want. If we were to activate either the PrintError attribute or the “use diagnostics” pragma, Perl would print out more than just our simple message, confusing the user.

Listing 9

______________________

Webinar
One Click, Universal Protection: Implementing Centralized Security Policies on Linux Systems

As Linux continues to play an ever increasing role in corporate data centers and institutions, ensuring the integrity and protection of these systems must be a priority. With 60% of the world's websites and an increasing share of organization's mission-critical workloads running on Linux, failing to stop malware and other advanced threats on Linux can increasingly impact an organization's reputation and bottom line.

Learn More

Sponsored by Bit9

Webinar
Linux Backup and Recovery Webinar

Most companies incorporate backup procedures for critical data, which can be restored quickly if a loss occurs. However, fewer companies are prepared for catastrophic system failures, in which they lose all data, the entire operating system, applications, settings, patches and more, reducing their system(s) to “bare metal.” After all, before data can be restored to a system, there must be a system to restore it to.

In this one hour webinar, learn how to enhance your existing backup strategies for better disaster recovery preparedness using Storix System Backup Administrator (SBAdmin), a highly flexible bare-metal recovery solution for UNIX and Linux systems.

Learn More

Sponsored by Storix