PostgreSQL—the Linux under the Databases

A close look at the PostgreSQL database, including programming interfaces and using it for WWW applications.
Short Introduction to SQL

The best way to get started with PostgreSQL is to read the user manual. It is written for Postgres95 v1.0 and dates back to September 1995, but the information provided there is still useful. Other valuable sources of information are the tutorials, the man pages and the various files in the /doc directory. I will be using the examples from the tutorial in this article whenever possible.

To begin, create a database, i.e., a named container for several tables and accompanying data such as indices and views. To use the database named tutorial, type this command:

createdb tutorial

Now, you are automatically the owner of the database and have full access to it. Other users have access only if you grant them the appropriate rights.

Next, connect that database to a client program. We will use psql, which comes with PostgreSQL. If you prefer a graphical interface, there is also a Motif client available called mpsql (with pre-compiled binaries for Linux). mpsql has great editing capabilities but doesn't provide special local commands for listing existing tables and databases. It also lacks a help system that is available in psql. To use psql, type:

psql tutorial

This command provides you with a shell-like environment in which you can issue SQL commands. Due to the Readline support, you have command history and file name completion with the same key bindings as the bash shell. You can also enter local commands which are processed by the client first if you prefix them with a backslash. Enter \? to get a list of local commands. All other commands are sent directly to the back end. Commands can be typed on separate lines. They will be stored in a local buffer until you enter a line terminated with a ; (semicolon), then the buffer is sent to the back end. Help on SQL commands is available by typing \h.

SQL commands can be passed on the command line for shell scripting. The \i command reads a file from disk and executes its contents as SQL commands. Be sure to always use absolute pathnames with psql, because the back end knows nothing about the current working directory of the client.

Next, create the following two tables:

create table cities (
  name  text,
  population    float8,
  altitude      int--this is a comment
create table capitals (
  state char2
) inherits (cities);

The text type is a string of characters of variable length. If you enter int, you get a four-byte integer value. PostgreSQL comes with 43 predefined data types including several types for time and date values, many types for geometrical objects such as point, circle and polygon and a boolean type. Arrays are also supported. All types are described in the pgbuiltin(l) manual page. If you need additional types, you can add your own. Note that identifier names have not been case sensitive since v6.1.

This example also illustrates a special feature of PostgreSQL: object inheritance. The second table inherits all the fields from the cities table and adds one more field. Later I'll show how to take advantage of that feature.

One often needed feature is missing from PostgreSQL, that is, the ability to define primary keys in the create clause. Primary keys are used to define a default sort order for the tuples and to ensure that the field with that key can't hold duplicate values. It is not supported due to the method used to store the records (or tuples). Every tuple in the database gets a unique object identifier (oid) value, which is unique not only in the table but also in the whole database. There is no way to guarantee a specific order in the table. As of version 6.0 of PostgreSQL it is possible to create a unique index, so that the same effect can be achieved with indices. To create a unique index for our example table, type:

create unique index on cities
using btree (name);

There are three methods available for index creation: btree, rtree and hash. The method can be specified after the keyword “using”. Only the btree method allows multiple key indices with up to seven keys. Note that not all data types are supported by all index types. In particular, rtree indices are available only for geometric types. If no index type is specified, btree is used as the default. Indices increase the access speed to tables significantly and should be used whenever possible.

The maximum size of a tuple is 8192 bytes. In reality, it is somewhat smaller, because PostgreSQL needs some place for storing internal data. The amount of this space varies from platform to platform. If you need larger fields, use the large objects interface, which provides unlimited fields of transparent data, like MEMO or BLOB fields in other databases; however, you need special functions to access them.

To enter data in the tables, use the insert command:

INSERT INTO cities VALUES ('San Francisco',
        7.24E+5, 63);
INSERT INTO cities VALUES ('Las Vegas', 2.583E+5,
INSERT INTO cities VALUES ('Mariposa', 1200,
INSERT INTO capitals VALUES ('Sacramento',
        3.694E+5, 30, 'CA');
INSERT INTO capitals VALUES ('Madison', 1.913E+5,
        845, 'WI');

To get the data out of the tables, use the select command. It's a very powerful command, so I will demonstrate only some of its characteristics.

-- this will return all records in the table
select * from cities;
select * from capitals;
-- to get also the records of the
-- inherited tables, use this syntax:
select * from cities*;
-- here are some variants to limit the returned
-- data:
select name, altitude
from cities
where altitude > 500;
To change some values in the table, use the update command, which has a similar syntax as the select command:
update cities
-- population grows by 10%
set population = population * 1.1
where name = 'Mariposa';
When you update your data regularly, you will notice that the tables grow continuously, even if you haven't added new tuples. This is not a bug—it's another special feature called time travel. PostgreSQL keeps a history of all data changes in the table. To access this data, you have to use a special qualifier:
select name, population
from cities['epoch', 'now']
where name = 'Mariposa';
This example will list all the values of the two fields, name and population of Mariposa, from the creation of the database up to the present. If you don't wish to retain the history data, you can delete it using the vacuum command. In the future version 7.0, the time-travel feature will vanish, but at this time you need to vacuum your databases regularly. The vacuum command also has the additional purpose of updating the internal data in order to make faster querys possible. Therefore, it is a good idea to define a cron job that runs vacuum every night.