Database Integrity and Web Applications
Want to improve the integrity of your data? Place constraints in the database, as well as in your application.
NoSQL, the catchall phrase for non-relational databases, is all the rage among Web developers. However, it's somewhat unfair and unhelpful to use the term NoSQL to describe them, given the variety of technologies involved. Even so, there are some fundamental differences between traditional relational databases and their NoSQL counterparts. For one, as the name implies, NoSQL databases don't use the standard SQL query language, and use either their own SQL-like language (for example, MongoDB) or an object-oriented API. Another difference is the lack of two-dimensional tables; whereas SQL databases operate solely with such tables, NoSQL databases eschew them in favor of name-value pairs or hash-like objects. And finally, NoSQL databases typically lack the features that led to the development of relational databases, namely transactions and data integrity.
There's no doubt that the flexibility NoSQL databases offer is attractive on a number of levels. Just as I enjoy working with dynamic languages in which I don't have to declare my variables (or their types) before I use them, it's nice to be able to store objects in my database without having to define the object structure in advance. If I want to add a new field to my Person object, I just do so, and the database magically catches up.
At the same time, there are many cases when I want the database to be tough with me and enforce the integrity of my data. That is, I want to be sure that even if I have made a mistake in my application, or if a user enters a value that shouldn't be allowed, the database won't allow that bad data to be stored. And yes, I believe that it's good to have such checks at the database level, rather than just at the application level—not only because it provides an additional guard against corrupt data, but also because the database often is accessed directly, outside the application itself. I often have to do what I refer to as "database surgery" on applications that are running for my clients, and it's always reassuring to know I cannot make a change manually that would corrupt the data.
Readers of my column know I'm a fan of both Ruby on Rails and of PostgreSQL, and I often use them together on projects. However, because Rails originally was developed under MySQL, which lacks many of the data-integrity aspects of PostgreSQL, the standard Active Record package fails to include such items as foreign keys in its base implementation. This means that although Rails will support PostgreSQL out of the box, it doesn't provide support for foreign keys, let alone data-integrity checks.
So, in this article, I look at some Ruby gems that make it possible to include foreign-key support in your Active Record models. I also describe ways you can take advantage of PostgreSQL's CHECK mechanism to ensure that your data is as safe as possible.
Where Do You Check Your Data?
Before I go into the actual code and implementation, I admit there are several philosophies of where and how to check your data. As I mentioned previously, I prefer to have the data checked at both the database and application levels, although sometimes I've been lazy and used the default in Rails that does so only at the application level.
Checking your data only at the application level is attractive in many ways. It makes the implementation easier, and it often will work just fine. It also, in the case of Rails, allows you to work in a single language (that is, Ruby) and put the integrity checks in your model files, where you're going to be using them.
But as I already said, if I have my checks only in the application, it's quite possible that when I try to access the database from outside my application, I'll end up messing things up, if only accidentally. Now, I'll admit this is the way many Web applications are designed, and it's not a fatal flaw. But it is something that I'll try to avoid in this article.
It would be a bigger mistake to try to do things the other way around—that is, to have the checks only in the database, rather than at the application level. Imagine, for example, if you were to have a "NOT NULL" constraint in the database, such that a particular column could not contain a NULL value. If you didn't protect against such things at the application layer, you might end up trying to put that data into your database. And although the database itself would not be corrupted, the application would generate an error, confusing and frustrating users.
With this background, let's now assume that I want to implement a simple appointment calendar. In order to implement my calendar, I need to keep track of people (each of whom has a first name, a last name, an e-mail address and a telephone number) and appointments (which will indicate the date/time of the appointment, the person with whom I'm meeting and a note about the meeting itself).
At the database level, the tables will be fairly straightforward. This is how I would create the tables if I were doing so manually:
CREATE TABLE People ( id SERIAL, first_name TEXT NOT NULL, last_name TEXT NOT NULL, email TEXT NOT NULL, phone TEXT NOT NULL, PRIMARY KEY(id) ); CREATE TABLE Appointments ( id SERIAL, person_id INTEGER REFERENCES People NOT NULL, notes TEXT NOT NULL, PRIMARY KEY(id) );
The important thing here, for the purposes of this discussion, is the
REFERENCES People clause in the definition of the Appointments
table. With the REFERENCES clause in place, I can do the following:
INSERT INTO People (first_name, last_name, email, phone) VALUES ('Reuven', 'Lerner', 'email@example.com', ↪'847-230-9795'); INSERT INTO Appointments (person_id, notes) VALUES (1, 'Meet with ↪myself');
Now, here's what happens if I try to delete myself from the People table:
atf=# DELETE FROM People WHERE id = 1; ERROR: update or delete on table "people" violates foreign key constraint "appointments_person_id_fkey" on table "appointments" DETAIL: Key (id)=(1) is still referenced from table "appointments".
Because Appointments.person_id is a foreign key to People.id, I cannot remove a row from People if Appointments refers to it. This sort of relational integrity is a great thing for my application. No matter how you slice it, it means I cannot remove people from the system if they are scheduled for an appointment.
If these tables were part of a Web application that I was developing in Ruby on Rails, I would use database migrations to create them. The migrations would look something like this (if combined into a single file):
class CreatePeopleAndAppointments < ActiveRecord::Migration def change create_table :people do |t| t.string :first_name, :null => false t.string :last_name, :null => false t.string :email, :null => false t.string :phone, :null => false t.timestamps end create_table :appointments do |t| t.integer :person_id, :null => false t.text :notes, :null => false t.timestamps end end end
Now, this will give me the table definitions, but it won't give me the foreign keys, meaning that I could corrupt the database by deleting a single "People" record. In order to do that, at least in the default Rails configuration, I need to use an "execute" command in the migration to send explicit SQL to the database.
Fortunately, there is an easier way. The Foreigner gem, written by
Matthew Higgins, which works with MySQL as well as with PostgreSQL,
adds syntax to let you create and remove foreign keys within your
migrations. For example, with Foreigner active—putting it in the
Gemfile and then running
bundle install—I can add a new migration
that does the following:
class AddForeignKey < ActiveRecord::Migration def up add_foreign_key :appointments, :people end def down remove_foreign_key :appointments, :people end end
Sure enough, after running this migration, I have the foreign key that I was hoping for. Again, it's still important for me to have this check in my Rails model; otherwise, I easily could get myself into a situation that the database forbids but the model allows and, thus, generate a bad error for my users.
Note: if you add the foreign key after you already have populated the database with some data, you might run into trouble. That's because PostgreSQL won't let you add a foreign key if one or more rows fail to abide by the declared constraint. The NOT NULL constraint additionally ensures that you point to some person in the system. In other words, every appointment has to be with someone, and it has to be with someone in the People table.
Practical books for the most technical people on the planet. Newly available books include:
- Agile Product Development by Ted Schmidt
- Improve Business Processes with an Enterprise Job Scheduler by Mike Diehl
- Finding Your Way: Mapping Your Network to Improve Manageability by Bill Childers
- DIY Commerce Site by Reven Lerner
Plus many more.
- Unikernels, Docker, and Why You Should Care
- Server Hardening
- diff -u: What's New in Kernel Development
- Controversy at the Linux Foundation
- 22 Years of Linux Journal on One DVD - Now Available
- Non-Linux FOSS: Snk
- Giving Silos Their Due
- Don't Burn Your Android Yet
- What's New in 3D Printing, Part III: the Software