Open-Source Databases, Part II: PostgreSQL
where <username> is a PostgreSQL user with sufficient access privileges on the server. The -U option lets us indicate the user name, and the -l option asks the server to list those databases that are available, such as:
List of databases Name | Owner | Encoding -----------+-----------+----------- testserver | reuven | SQL_ASCII postgres | postgres | SQL_ASCII template0 | postgres | SQL_ASCII template1 | postgres | SQL_ASCII (4 rows)
Like many database systems, PostgreSQL maintains its own list of users and groups separately from the underlying operating system. To create a new user, we use the createuser program, which comes with PostgreSQL:
createuser -U postgres reuven
This invokes createuser as postgres (which has permission to create other users) and then creates a new user named reuven. If we make this new user a superuser, reuven also will be allowed to create new databases and roles.
Now, we can create a new database in this cluster:
createdb -U reuven linux
Double-check that the database exists with a reuse of psql -l:
psql -U reuven -l
You might have noticed that we have not given a password any time we have invoked a command that required a user. The default setting for PostgreSQL makes the server available via a local socket (and thus unavailable over the network). Because only local users will be allowed access, we allow connections from any defined user name, even without a password.
We can change this behavior, as well as other security- and connection-related behaviors, in the pg_hba.conf file located at the top of the cluster directory. The file contains extensive documentation and explains how to set up the connection parameters.
Let's connect to our database and see what happens:
$ psql -U reuven linux
Sure enough, we get the psql prompt, inviting us to enter a query. A list of psql comments is available by typing \? at the prompt. We also can get help with SQL syntax and commands with \h, as in:
\h CREATE TABLE
Sure enough, let's start by creating a table:
CREATE TABLE People ( id SERIAL NOT NULL, first_name TEXT NOT NULL, last_name TEXT NOT NULL, email_address TEXT NOT NULL, added_at TIMESTAMP NOT NULL DEFAULT NOW(), PRIMARY KEY(id), UNIQUE(email_address) );
This table has five columns and three different data types. The most common type is TEXT, which is the typical way to store textual data. PostgreSQL has full support for Unicode, and TEXT columns may contain very long strings. (Until version 8.0, PostgreSQL could store a limited amount of data in each row, so this might come as news to some people.) There is no built-in support for indexing TEXT columns, but an add-on module, called tsearch2, is relatively easy to install and provides such functionality for anyone who needs it.
The added_at column is of type TIMESTAMP. PostgreSQL provides a very sophisticated set of time- and date-handling routines, thanks to the division between two basic data types, known as TIMESTAMP and INTERVAL. A timestamp indicates when something happened, whereas an interval tells you how long something took. We thus can subtract two timestamps (to get an interval) or add an interval to a timestamp (to get a new timestamp). Moreover, PostgreSQL lets us define intervals using English-like syntax, such as:
SELECT id FROM People WHERE added_at > NOW () ↪- interval '30 days';
The above query shows us all of the people who were added to our contact database in the last 30 days.
The added_at column defines a DEFAULT value as well. This means that if we fail to INSERT an explicit value for added_at, PostgreSQL will use the current time (at the time of insertion).
The id column, which we define to be our primary key, uses a SERIAL data type. The thing is, SERIAL isn't a data type at all. Rather, it's syntactic sugar that does several things:
It creates a new sequence object, whose values are integers that start at 1 and increase each time we ask for a value.
It defines the column type to be INTEGER.
It sets the DEFAULT clause for our column to be the result of requesting a new value from the sequence.
This might sound like a complicated way of saying, “SERIAL gives us an auto-incrementing column”. And that's true, in a sense, but you can have as many SERIAL columns as you like in a table, and each sequence can have all sorts of properties associated with it, including its starting point and increment.
Finally, by defining id to be a primary key and email_address to be unique, we implicitly ask PostgreSQL to create indexes on these two columns. When we execute the above query, PostgreSQL notifies us what it's doing behind the scenes:
NOTICE: CREATE TABLE will create implicit sequence "people_id_seq" for serial column "people.id" NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index "people_pkey" for table "people" NOTICE: CREATE TABLE / UNIQUE will create implicit index "people_email_address_key" for table "people"
If and when you drop the People table, these implicitly defined objects are dropped automatically.
To list all of the tables, sequences and views in our database, we can use the \d command:
linux=# \d List of relations Schema | Name | Type | Owner --------+---------------+----------+-------- public | people | table | reuven public | people_id_seq | sequence | reuven (2 rows)
We can add a letter to \d to get a list of only tables (t), indexes (i), sequences (s), functions (f) or views (v). For example, here is a list of the indexes that we have created:
linux=# \di List of relations Schema | Name | Type | Owner | Table -------+--------------------------+-------+--------+------- public | people_email_address_key | index | reuven | people public | people_pkey | index | reuven | people (2 rows)
We can also use \d to inspect a particular object more closely. For example, we can look at our People table with \d People:
linux=# \d People Table "public.people" Column | Type | Modifiers --------------+-----------------------------+---------- id | integer | not null default nextval('people_id_seq'::regclass) first_name | text | not null last_name | text | not null email_address | text | not null added_at | timestamp without time zone | not null default now() Indexes: "people_pkey" PRIMARY KEY, btree (id) "people_email_address_key" UNIQUE, btree (email_address)
There are several things to notice in the above output:
First, PostgreSQL sees the table as public.people, not just people. This is because every object must exist inside of a schema, or namespace, and the default schema is called public. We can use schemas to partition the namespace within a particular database or to handle partitions. This means we don't need to split data across two more databases just to deal with conflicting permissions and names.
The table name, as well as all column names, are displayed in lowercase letters. That's because PostgreSQL tries to adhere to the SQL standard as best as possible, and the standard says that identifiers should be case-insensitive. If you really want case-sensitive names (and you probably don't), use double quotes around the identifiers.
Our id column has been transformed, as expected, into an integer column with a default value taken from a sequence.
Editorial Advisory Panel
Thank you to our 2014 Editorial Advisors!
- Jeff Parent
- Brad Baillio
- Nick Baronian
- Steve Case
- Chadalavada Kalyana
- Caleb Cullen
- Keir Davis
- Michael Eager
- Nick Faltys
- Dennis Frey
- Philip Jacob
- Jay Kruizenga
- Steve Marquez
- Dave McAllister
- Craig Oda
- Mike Roberts
- Chris Stark
- Patrick Swartz
- David Lynch
- Alicia Gibb
- Thomas Quinlan
- Carson McDonald
- Kristen Shoemaker
- Charnell Luchich
- James Walker
- Victor Gregorio
- Hari Boukis
- Brian Conner
- David Lane