Open-Source Databases, Part II: PostgreSQL
There are some problems with our table definition. Although we have effectively stopped people from storing NULL values in our TEXT columns, we haven't done anything to stop them from entering empty strings. In addition, we might want to ensure that the email_address column looks at least something like an e-mail address.
We can do this by adding constraints to our columns—tiny functions that check the value being inserted or updated. If the new value doesn't fit the definition, PostgreSQL refuses to allow its insertion. Here's a new definition of our table, with some constraints defined:
CREATE TABLE People (
    id              SERIAL NOT NULL,
    first_name      TEXT NOT NULL CHECK (first_name <> ''),
    last_name       TEXT NOT NULL CHECK (last_name <> ''),
    email_address   TEXT NOT NULL CHECK (email_address ~* '.@.+\\.'),
    added_at        TIMESTAMP NOT NULL DEFAULT NOW(),
    PRIMARY KEY(id),
    UNIQUE(email_address)
);
If we inspect our table definition, it has changed somewhat, to include the constraints:
linux=# \d people
                                  Table "public.people"
    Column     |            Type             |                     Modifiers
---------------+-----------------------------+-----------------------------------------------------
 id            | integer                     | not null default nextval('people_id_seq'::regclass)
 first_name    | text                        | not null
 last_name     | text                        | not null
 email_address | text                        | not null
 added_at      | timestamp without time zone | not null default now()
Indexes:
    "people_pkey" PRIMARY KEY, btree (id)
    "people_email_address_key" UNIQUE, btree (email_address)
Check constraints:
    "people_email_address_check" CHECK (email_address ~* '.@.+\\.'::text)
    "people_first_name_check" CHECK (first_name <> ''::text)
    "people_last_name_check" CHECK (last_name <> ''::text)
Let's see what happens if we violate these constraints:
linux=# insert into people (first_name, last_name, email_address)
        values ('', 'Lerner', 'email@example.com');
ERROR:  new row for relation "people" violates check constraint
        "people_first_name_check"
linux=# insert into people (first_name, last_name, email_address)
        values ('Reuven2', 'Lerner2', 'reuven');
ERROR:  new row for relation "people" violates check constraint
        "people_email_address_check"
Sure enough, our constraints help to ensure that our database is in order.
The most common type of constraint is a foreign key, in which one table points to another. For example:
CREATE TABLE Appointments (
    person_id       INTEGER NOT NULL REFERENCES People,
    starting_time   TIMESTAMP NOT NULL,
    duration        INTERVAL NOT NULL,
    notes           TEXT NULL
);
If we try to create an appointment that refers to a non-existent person, we will be rejected:
INSERT INTO Appointments (person_id, starting_time, duration, notes)
    VALUES (5000, '2007-Feb-12 13:00', interval '1 hour', 'Lunch');
ERROR:  insert or update on table "appointments" violates foreign key
        constraint "appointments_person_id_fkey"
DETAIL:  Key (person_id)=(5000) is not present in table "people".
Foreign-key constraints help in the other direction as well. If you try to delete a row to which a foreign key points, PostgreSQL will refuse the request, indicating that you must first delete the rows that reference it. You can adjust this behavior by adding ON UPDATE or ON DELETE clauses to the foreign-key definition.
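For example, here is one way the Appointments table could be redefined so that deleting a person silently removes that person's appointments as well. This is a sketch, not the article's own definition; CASCADE is just one of several available actions (others include RESTRICT and SET NULL):

    -- With ON DELETE CASCADE, deleting a row from People
    -- automatically deletes the matching Appointments rows
    -- instead of raising a foreign-key error.
    CREATE TABLE Appointments (
        person_id       INTEGER NOT NULL
                            REFERENCES People ON DELETE CASCADE,
        starting_time   TIMESTAMP NOT NULL,
        duration        INTERVAL NOT NULL,
        notes           TEXT NULL
    );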
This list of features is just the tip of the iceberg. And that's part of the magic of PostgreSQL—out of the box, it's straightforward and easy to use, but you almost always can redefine and extend existing functionality with your own code and data. The built-in operators, along with the flexible ways in which they can be combined and further enhanced with your own functions and definitions, make for a powerful combination. I don't often use unions or intersections, but I do often use views.
For example, one of my favorite features is the ability to use subselects just about anywhere you would have a value. If you have someone's e-mail address, you can use that to INSERT a row into Appointments in a single command:
INSERT INTO Appointments (person_id, starting_time, duration, notes)
    VALUES ((SELECT id FROM People
             WHERE email_address = 'firstname.lastname@example.org'),
            '2007-Feb-12 13:00', interval '1 hour', 'Lunch');
PostgreSQL already comes with a wide range of built-in data types, including geometric shapes, IP addresses and even ISBNs. And if the existing data types aren't enough, we can construct our own.
If we want to create more than one table with similar characteristics, we can take advantage of PostgreSQL's object-oriented features. Thus, we could have a People table and a Managers table, in which the definition of Managers inherits the characteristics of People and adds its own extensions.
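A minimal sketch of such an inherited table, assuming the People table defined earlier (the department column is purely illustrative):

    -- Managers gets all of People's columns, plus its own.
    CREATE TABLE Managers (
        department TEXT NOT NULL
    ) INHERITS (People);

    -- A query against People now also returns rows from Managers;
    -- use "SELECT ... FROM ONLY People" to exclude child tables.

One caveat worth knowing: CHECK and NOT NULL constraints are inherited by the child table, but PRIMARY KEY and UNIQUE constraints are not, so Managers would need its own if uniqueness matters there.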
You also can create your own server-side functions, in a variety of different languages—from PostgreSQL's own PL/pgSQL to specialized versions of Perl, Python, Tcl, Java, Ruby and the R statistical language. These functions can return individual values or entire tables, and can be used in triggers. You also can use these functions to rewrite the rules for inserting, updating and deleting data from a table or even a view.
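As a small illustration, here is a hypothetical PL/pgSQL function built on the tables above; the function name and logic are my own example, not from the article:

    -- Count the appointments recorded for a given person.
    CREATE OR REPLACE FUNCTION count_appointments(p_id INTEGER)
    RETURNS BIGINT AS $$
    BEGIN
        RETURN (SELECT COUNT(*) FROM Appointments
                WHERE person_id = p_id);
    END;
    $$ LANGUAGE plpgsql;

Once defined, it can be called like any built-in function:

    SELECT count_appointments(1);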
But, perhaps the most important feature of all is the built-in support for transactions. Transactions are an essential part of database programming, in that they allow us to combine multiple queries into one all-or-nothing action. The classic example of a transaction is the movement of money from one bank account to another; if the power goes out, you want to be sure that the money was moved, or that it wasn't. It would be unacceptable for the money to disappear altogether or for it to appear in both accounts when the lights come back on.
Recent versions of PostgreSQL have enhanced its transactional capabilities. Not only can you commit or roll back a transaction, but you also can define savepoints inside a transaction. If something goes wrong, you can either roll back the entire transaction or merely go to the previous savepoint. Moreover, PostgreSQL now supports two-phase commits, making it possible to synchronize distributed processes that require communication and coordination.
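A brief sketch of a savepoint in action, assuming the People table and its CHECK constraints from earlier (names and values are illustrative):

    BEGIN;
    INSERT INTO People (first_name, last_name, email_address)
        VALUES ('Ann', 'Example', 'ann@example.com');
    SAVEPOINT before_second;
    INSERT INTO People (first_name, last_name, email_address)
        VALUES ('Bob', 'Example', 'bad-address');  -- fails the CHECK
    ROLLBACK TO before_second;  -- undo only the failed INSERT
    COMMIT;                     -- Ann's row is still committed

Without the savepoint, the failed INSERT would force the entire transaction to be rolled back; with it, only the work since the savepoint is discarded.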
If anything goes wrong, PostgreSQL also provides point-in-time recovery (PITR) through its write-ahead log (WAL). Even if the power is cut off at the most critical moment, committed transactions remain committed, incomplete ones are rolled back, and as little work as possible is lost.
You might have noticed that I haven't mentioned locking at all. That's because, for the most part, PostgreSQL users don't have to worry about locking. Instead of locking readers against writers, PostgreSQL uses a system known as MVCC (multiversion concurrency control), whose main drawback is the accumulation of old, no-longer-visible row versions. The traditional way to handle this in PostgreSQL is to VACUUM the database regularly, removing dead rows and reclaiming space. Recent versions now include an auto-vacuum agent, reducing or even eliminating the need to VACUUM on a regular basis.
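Run by hand, the maintenance command looks like this (ANALYZE is optional but commonly combined, since it refreshes the planner's statistics at the same time):

    VACUUM ANALYZE People;  -- reclaim dead rows, update statistics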
Finally, recent versions of PostgreSQL include support for tablespaces. This means you can spread tables across different directories and filesystems, rather than keep everything under the directory defined by your installation. This can boost performance or reliability significantly, particularly on large databases.
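A sketch of how that looks in practice; the path here is hypothetical, and the directory must already exist and be owned by the PostgreSQL server user:

    -- Define a tablespace on a separate disk, then place a table on it.
    CREATE TABLESPACE fastdisk LOCATION '/mnt/ssd/pgdata';

    CREATE TABLE Big_Log (
        entry TEXT
    ) TABLESPACE fastdisk;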