At the Forge - Databases and Calendars
Last month, we continued our look at the iCalendar standard, which makes it possible for programs to exchange calendar and appointment information. As we saw, an iCalendar file contains one or more events and tasks. If we make the file available via an HTTP server such as Apache, we can distribute it to anyone with an iCalendar-compatible program, such as Mozilla's Sunbird. As we saw last month, we can go one step further than this, generating an iCalendar file dynamically, using a CGI program.
Although the programs I presented and discussed last month might be useful in a limited context, it should be clear to any Web developer that keeping the date and time information inside of a program would be foolish, to say the least.
One of the best ways to keep track of such data is in a relational database such as PostgreSQL. A relational database allows you to ensure that the data you have entered is valid and provides you with fast, flexible access to some or all of the data it contains. Moreover, by storing the calendar information inside of a database, you can create multiple versions of the same calendar file, using the same source.
This month, we look at a simple example of a Web-based program that takes calendar information from a relational database and uses it to generate an iCalendar data file, which then can be imported into iCalendar-compliant programs, such as Mozilla's Sunbird.
If we are going to store our calendar information in a relational database, we need to define at least one table. This is because everything in a relational database—often including configuration and status information—is stored in a two-dimensional table, in which the columns define individual fields, and each row contains one record. For example, here is how we might define a simple table of events in PostgreSQL:
CREATE TABLE Events ( event_id SERIAL NOT NULL, event_summary TEXT NOT NULL CHECK (event_summary <> ''), event_location TEXT NOT NULL CHECK (event_location <> ''), event_start TIMESTAMP NOT NULL, event_end TIMESTAMP NOT NULL, event_timestamp TIMESTAMP NOT NULL DEFAULT NOW(), PRIMARY KEY(event_id) );
The above table contains six columns. The first, event_id, is defined to be of type SERIAL. If we don't explicitly provide a value for event_id when adding a row to the table, PostgreSQL retrieves a new integer value automatically, up to a maximum of 231. PostgreSQL allows you to set a larger ceiling to allow the sequence to wrap around to 1, or both; see the documentation for more details.
The event_id column uniquely identifies rows in our table, and we tell the database this by marking it as a PRIMARY KEY. This not only tells other database programmers which column will be used for retrieving records, but it ensures that values are unique and that the column is indexed as well.
Another automatically populated column is event_timestamp. From the definition, it might appear as though we can (and will) set event_timestamp to an explicit value, with the current time providing a default as necessary. But whenever I define a column in this way, it implies that I never expect to set a value explicitly for this column. Rather, I am interested in letting PostgreSQL set the column's value with the current date and time.
Notice how the event_summary and event_location columns are both defined to be of type TEXT (that is, infinite-length text fields), while event_start, event_end and event_timestamp are all of type TIMESTAMP, the SQL-standard way of saying date and time.
All of the columns in this table are defined to be NOT NULL, meaning that they may not be assigned SQL's undefined value of NULL. NULL is distinct from true and false, which can make it a bit tricky for newcomers to understand. However, if you think of NULL as representing an unknown or undefined value, it might become clearer. As useful as NULLs can be in distinguishing between false and unknown values, it's usually a good idea to cut down on them as much as possible. Indeed, the advice that I have long heard, and repeated to others, is that you should define columns to be NOT NULL by default, opening them up to NULL values as the situation requires.
Finally, notice how our two text columns (event_summary and event_location) are defined both as NOT NULL and with an integrity check that ensures we enter something other than an empty string. Whether this combination of constraints is appropriate depends on your database needs. You might want to think about whether you want to allow NULL values at all and also if you want to allow empty strings for the summary and location columns.
Although this simple definition is meant to serve as an example, consider how much better it would be if we were to have a separate Locations table, with a location_id and location_name, and then replace the textual event_location column with a location_id. This would have the advantage of standardizing location names, which would lead to fewer inconsistencies. It also would allow us to search for all of the events taking place in a particular location.
Once we are done defining the table, we add some indexes to our table. Each index ensures that data will be retrieved from the table more quickly than usual, at the expense of additional time for each INSERT. Here are the definitions:
CREATE INDEX event_location_idx ON Events(event_location); CREATE INDEX event_start_idx ON Events(event_start); CREATE INDEX event_end_idx ON Events(event_end);