Database Integrity and Web Applications

Want to improve the integrity of your data? Place constraints in the database, as well as in your application.

NoSQL, the catchall phrase for non-relational databases, is all the rage among Web developers. However, it's somewhat unfair and unhelpful to use the term NoSQL to describe them, given the variety of technologies involved. Even so, there are some fundamental differences between traditional relational databases and their NoSQL counterparts. For one, as the name implies, NoSQL databases don't use the standard SQL query language, and use either their own SQL-like language (for example, MongoDB) or an object-oriented API. Another difference is the lack of two-dimensional tables; whereas SQL databases operate solely with such tables, NoSQL databases eschew them in favor of name-value pairs or hash-like objects. And finally, NoSQL databases typically lack the features that led to the development of relational databases, namely transactions and data integrity.

There's no doubt that the flexibility NoSQL databases offer is attractive on a number of levels. Just as I enjoy working with dynamic languages in which I don't have to declare my variables (or their types) before I use them, it's nice to be able to store objects in my database without having to define the object structure in advance. If I want to add a new field to my Person object, I just do so, and the database magically catches up.

At the same time, there are many cases when I want the database to be tough with me and enforce the integrity of my data. That is, I want to be sure that even if I have made a mistake in my application, or if a user enters a value that shouldn't be allowed, the database won't allow that bad data to be stored. And yes, I believe that it's good to have such checks at the database level, rather than just at the application level—not only because it provides an additional guard against corrupt data, but also because the database often is accessed directly, outside the application itself. I often have to do what I refer to as "database surgery" on applications that are running for my clients, and it's always reassuring to know I cannot make a change manually that would corrupt the data.

Readers of my column know I'm a fan of both Ruby on Rails and of PostgreSQL, and I often use them together on projects. However, because Rails originally was developed under MySQL, which lacks many of the data-integrity aspects of PostgreSQL, the standard Active Record package fails to include such items as foreign keys in its base implementation. This means that although Rails will support PostgreSQL out of the box, it doesn't provide support for foreign keys, let alone data-integrity checks.

So, in this article, I look at some Ruby gems that make it possible to include foreign-key support in your Active Record models. I also describe ways you can take advantage of PostgreSQL's CHECK mechanism to ensure that your data is as safe as possible.

Where Do You Check Your Data?

Before I go into the actual code and implementation, I admit there are several philosophies of where and how to check your data. As I mentioned previously, I prefer to have the data checked at both the database and application levels, although sometimes I've been lazy and used the default in Rails that does so only at the application level.

Checking your data only at the application level is attractive in many ways. It makes the implementation easier, and it often will work just fine. It also, in the case of Rails, allows you to work in a single language (that is, Ruby) and put the integrity checks in your model files, where you're going to be using them.

But as I already said, if I have my checks only in the application, it's quite possible that when I try to access the database from outside my application, I'll end up messing things up, if only accidentally. Now, I'll admit this is the way many Web applications are designed, and it's not a fatal flaw. But it is something that I'll try to avoid in this article.

It would be a bigger mistake to try to do things the other way around—that is, to have the checks only in the database, rather than at the application level. Imagine, for example, if you were to have a "NOT NULL" constraint in the database, such that a particular column could not contain a NULL value. If you didn't protect against such things at the application layer, you might end up trying to put that data into your database. And although the database itself would not be corrupted, the application would generate an error, confusing and frustrating users.

So, upsetting and annoying as it might be, the best way to do things probably is to duplicate some work—adding the constraints on the database and then having them on the application as well. You could argue that you should have the constraints in place in a third location—in the HTML form the user often will use to submit information to the server. Fortunately, there is at least one technique for handling this sort of thing automatically. The client_side_validations gem for Rails copies the validations as best as possible, putting them into JavaScript within the user's views.

Foreign Keys

With this background, let's now assume that I want to implement a simple appointment calendar. In order to implement my calendar, I need to keep track of people (each of whom has a first name, a last name, an e-mail address and a telephone number) and appointments (which will indicate the date/time of the appointment, the person with whom I'm meeting and a note about the meeting itself).

At the database level, the tables will be fairly straightforward. This is how I would create the tables if I were doing so manually:


CREATE TABLE People (
    id SERIAL,
    first_name TEXT NOT NULL,
    last_name TEXT NOT NULL,
    email TEXT NOT NULL,
    phone TEXT NOT NULL,
    PRIMARY KEY(id)
);

CREATE TABLE Appointments (
    id SERIAL,
    person_id INTEGER REFERENCES People NOT NULL,
    notes TEXT NOT NULL,
    PRIMARY KEY(id)
);

The important thing here, for the purposes of this discussion, is the REFERENCES People clause in the definition of the Appointments table. With the REFERENCES clause in place, I can do the following:


INSERT INTO People (first_name, last_name, email, phone)
    VALUES ('Reuven', 'Lerner', 'reuven@lerner.co.il',
    ↪'847-230-9795');
INSERT INTO Appointments (person_id, notes) VALUES (1, 'Meet with
 ↪myself');

Now, here's what happens if I try to delete myself from the People table:


atf=# DELETE FROM People WHERE id = 1;
ERROR:  update or delete on table "people" violates foreign key
constraint "appointments_person_id_fkey" on table "appointments"
DETAIL:  Key (id)=(1) is still referenced from table "appointments".

Because Appointments.person_id is a foreign key to People.id, I cannot remove a row from People if Appointments refers to it. This sort of relational integrity is a great thing for my application. No matter how you slice it, it means I cannot remove people from the system if they are scheduled for an appointment.

Rails Integration

If these tables were part of a Web application that I was developing in Ruby on Rails, I would use database migrations to create them. The migrations would look something like this (if combined into a single file):


class CreatePeopleAndAppointments < ActiveRecord::Migration

  def change
    create_table :people do |t|
      t.string :first_name, :null => false
      t.string :last_name, :null => false
      t.string :email, :null => false
      t.string :phone, :null => false

      t.timestamps
    end

    create_table :appointments do |t|
      t.integer :person_id, :null => false
      t.text :notes, :null => false

      t.timestamps
    end
  end
end

Now, this will give me the table definitions, but it won't give me the foreign keys, meaning that I could corrupt the database by deleting a single "People" record. In order to do that, at least in the default Rails configuration, I need to use an "execute" command in the migration to send explicit SQL to the database.

Fortunately, there is an easier way. The Foreigner gem, written by Matthew Higgins, which works with MySQL as well as with PostgreSQL, adds syntax to let you create and remove foreign keys within your migrations. For example, with Foreigner active—putting it in the Gemfile and then running bundle install—I can add a new migration that does the following:


class AddForeignKey < ActiveRecord::Migration
  def up
    add_foreign_key :appointments, :people
  end

  def down
    remove_foreign_key :appointments, :people
  end
end

Sure enough, after running this migration, I have the foreign key that I was hoping for. Again, it's still important for me to have this check in my Rails model; otherwise, I easily could get myself into a situation that the database forbids but the model allows and, thus, generate a bad error for my users.

Note: if you add the foreign key after you already have populated the database with some data, you might run into trouble. That's because PostgreSQL won't let you add a foreign key if one or more rows fail to abide by the declared constraint. The NOT NULL constraint additionally ensures that you point to some person in the system. In other words, every appointment has to be with someone, and it has to be with someone in the People table.

______________________

Comments

Comment viewing options

Select your preferred way to display the comments and click "Save settings" to activate your changes.

rewritten for non-empty

Anonymous's picture

INSERT INTO Appointments (person_id, notes)
SELECT (people.id, 'Meet with myself')
FROM people
WHERE (first_name, last_name)
IN ('Reuven', 'Lerner');

Question about your SQL

Don Barthel's picture

In the second INSERT ("INSERT INTO Appointments") you assume that the person_id is "1", the key assigned by Postgres in the first INSERT ("INSERT INTO People").

Correct me if I'm wrong but you would only know that if you ran your queries against an empty database.

Webinar
One Click, Universal Protection: Implementing Centralized Security Policies on Linux Systems

As Linux continues to play an ever increasing role in corporate data centers and institutions, ensuring the integrity and protection of these systems must be a priority. With 60% of the world's websites and an increasing share of organization's mission-critical workloads running on Linux, failing to stop malware and other advanced threats on Linux can increasingly impact an organization's reputation and bottom line.

Learn More

Sponsored by Bit9

Webinar
Linux Backup and Recovery Webinar

Most companies incorporate backup procedures for critical data, which can be restored quickly if a loss occurs. However, fewer companies are prepared for catastrophic system failures, in which they lose all data, the entire operating system, applications, settings, patches and more, reducing their system(s) to “bare metal.” After all, before data can be restored to a system, there must be a system to restore it to.

In this one hour webinar, learn how to enhance your existing backup strategies for better disaster recovery preparedness using Storix System Backup Administrator (SBAdmin), a highly flexible bare-metal recovery solution for UNIX and Linux systems.

Learn More

Sponsored by Storix