Open-Source Databases, Part II: PostgreSQL
There are some problems with our table definition. Although we have effectively stopped people from storing NULL values in our TEXT columns, we haven't done anything to stop them from entering empty strings. In addition, we might want to ensure that the email_address column looks at least something like an e-mail address.
We can do this by adding constraints to our columns—tiny functions that check the value being inserted or updated. If the new value doesn't fit the definition, PostgreSQL refuses to allow its insertion. Here's a new definition of our table, with some constraints defined:
CREATE TABLE People (
    id            SERIAL NOT NULL,
    first_name    TEXT NOT NULL CHECK (first_name <> ''),
    last_name     TEXT NOT NULL CHECK (last_name <> ''),
    email_address TEXT NOT NULL CHECK (email_address ~* '.@.+\\.'),
    added_at      TIMESTAMP NOT NULL DEFAULT NOW(),

    PRIMARY KEY(id),
    UNIQUE(email_address)
);
If we inspect our table definition, it has changed somewhat, to include the constraints:
linux=# \d people
                            Table "public.people"
    Column     |            Type             |                      Modifiers
---------------+-----------------------------+------------------------------------------------------
 id            | integer                     | not null default nextval('people_id_seq'::regclass)
 first_name    | text                        | not null
 last_name     | text                        | not null
 email_address | text                        | not null
 added_at      | timestamp without time zone | not null default now()
Indexes:
    "people_pkey" PRIMARY KEY, btree (id)
    "people_email_address_key" UNIQUE, btree (email_address)
Check constraints:
    "people_email_address_check" CHECK (email_address ~* '.@.+\\.'::text)
    "people_first_name_check" CHECK (first_name <> ''::text)
    "people_last_name_check" CHECK (last_name <> ''::text)
Let's see what happens if we violate these constraints:
linux=# insert into people (first_name, last_name, email_address) values ('', 'Lerner', 'firstname.lastname@example.org');
ERROR:  new row for relation "people" violates check constraint "people_first_name_check"

linux=# insert into people (first_name, last_name, email_address) values ('Reuven2', 'Lerner2', 'reuven');
ERROR:  new row for relation "people" violates check constraint "people_email_address_check"
Sure enough, our constraints help to ensure that our database is in order.
The most common type of constraint is a foreign key, in which one table points to another. For example:
CREATE TABLE Appointments (
    person_id     INTEGER NOT NULL REFERENCES People,
    starting_time TIMESTAMP NOT NULL,
    duration      INTERVAL NOT NULL,
    notes         TEXT NULL
);
If we try to create an appointment that refers to a non-existent person, we will be rejected:
INSERT INTO Appointments (person_id, starting_time, duration, notes)
    VALUES (5000, '2007-Feb-12 13:00', interval '1 hour', 'Lunch');

ERROR:  insert or update on table "appointments" violates foreign key constraint "appointments_person_id_fkey"
DETAIL:  Key (person_id)=(5000) is not present in table "people".
Foreign-key constraints help in the other direction as well. If you try to delete a row to which a foreign key points, PostgreSQL refuses the request, indicating that you must first remove the rows that reference it. You can adjust how these constraints behave with the ON UPDATE and ON DELETE modifiers on the foreign-key definition.
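For example, here is a sketch of the same Appointments table with cascading deletes; the ON DELETE CASCADE clause is the only change, and this variant is an illustration rather than part of the definition above:

CREATE TABLE Appointments (
    person_id     INTEGER NOT NULL REFERENCES People ON DELETE CASCADE,
    starting_time TIMESTAMP NOT NULL,
    duration      INTERVAL NOT NULL,
    notes         TEXT NULL
);

With this definition, deleting a row from People quietly removes that person's appointments as well, rather than raising an error.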
This list of features is just the tip of the iceberg. And that's part of the magic of PostgreSQL—out of the box, it's straightforward and easy to use, but you almost always can redefine and extend existing functionality with your own code and data. The built-in operators, along with the flexible ways in which they can be combined and further enhanced with your own functions and definitions, make for a powerful combination. I don't often use unions or intersections, but I do often use views.
For example, one of my favorite features is the ability to use subselects just about anywhere you would have a value. If you have someone's e-mail address, you can use that to INSERT a row into Appointments in a single command:
INSERT INTO Appointments (person_id, starting_time, duration, notes)
    VALUES ((SELECT id FROM People WHERE email_address = 'email@example.com'),
            '2007-Feb-12 13:00', interval '1 hour', 'Lunch');
PostgreSQL already comes with a number of specialized data types, including geometric shapes, IP addresses and even ISBNs. If the existing types aren't enough, we can construct our own.
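One lightweight way to do this is with CREATE DOMAIN, which wraps an existing type in a reusable constraint. Here is a sketch reusing our e-mail check; the domain name is an invention for illustration:

CREATE DOMAIN email_text AS TEXT
    CHECK (VALUE ~* '.@.+\\.');

Any column declared as email_text then enforces the same check, in every table that uses it.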
If we want to create more than one table with similar characteristics, we can take advantage of PostgreSQL's object-oriented features. Thus, we could have a People table and a Managers table, in which the definition of Manager inherits the characteristics of People and adds its own extensions.
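A minimal sketch of how that might look, assuming a hypothetical Managers table that adds a single department column to everything defined in People:

CREATE TABLE Managers (
    department TEXT NOT NULL
) INHERITS (People);

Managers gets all of People's columns plus department, and by default a query against People also returns the rows stored in Managers.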
You also can create your own server-side functions, in a variety of languages—from PostgreSQL's own PL/pgSQL to specialized versions of Perl, Python, Tcl, Java, Ruby and the R statistical language. These functions can return individual values or entire tables, and can be used in triggers. You also can use these functions to rewrite the rules for inserting, updating and deleting data from a table or even a view.
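As a taste of PL/pgSQL, here is a short sketch, assuming the plpgsql language is installed in the database; the function name and body are illustrative, not something from the article:

CREATE FUNCTION full_name(p_id INTEGER) RETURNS TEXT AS $$
DECLARE
    result TEXT;
BEGIN
    -- Look up the person and glue the two name columns together
    SELECT first_name || ' ' || last_name INTO result
      FROM People
     WHERE id = p_id;
    RETURN result;
END;
$$ LANGUAGE plpgsql;

Once defined, it can be called like any built-in function, for example SELECT full_name(1);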
But, perhaps the most important feature of all is the built-in support for transactions. Transactions are an essential part of database programming, in that they allow us to combine multiple queries into one all-or-nothing action. The classic example of a transaction is the movement of money from one bank account to another; if the power goes out, you want to be sure that the money was moved, or that it wasn't. It would be unacceptable for the money to disappear altogether or for it to appear in both accounts when the lights come back on.
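In SQL, that kind of transfer looks something like the following; the accounts table and the amounts are hypothetical, purely to show the shape of a transaction:

BEGIN;
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
UPDATE accounts SET balance = balance + 100 WHERE id = 2;
COMMIT;

If anything fails before the COMMIT, or if we issue ROLLBACK instead, neither UPDATE takes effect.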
Recent versions of PostgreSQL have enhanced its transactional capabilities. Not only can you commit or roll back a transaction, but you also can define savepoints inside a transaction. If something goes wrong, you can either roll back the entire transaction or merely go to the previous savepoint. Moreover, PostgreSQL now supports two-phase commits, making it possible to synchronize distributed processes that require communication and coordination.
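Here is a sketch of a savepoint in action, continuing the hypothetical transfer above:

BEGIN;
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
SAVEPOINT debit_done;
UPDATE accounts SET balance = balance + 100 WHERE id = 2;
-- Decide the second step was a mistake: undo only that part
ROLLBACK TO SAVEPOINT debit_done;
COMMIT;

Everything after the savepoint is undone, while the work before it remains part of the transaction and is committed at the end.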
If anything does go wrong, PostgreSQL provides point-in-time recovery (PITR) through its write-ahead log (WAL), ensuring that even if the power is cut off at the most critical moment, every transaction is cleanly committed or rolled back, and as many committed transactions as possible are preserved.
You might have noticed that I haven't mentioned locking at all. That's because, for the most part, PostgreSQL users don't have to worry about locking. Instead, concurrency is handled by a system known as MVCC (multiversion concurrency control), whose main drawback is the accumulation of unused, cast-off database rows. The traditional way to handle this in PostgreSQL is to VACUUM the database regularly, removing old rows and reclaiming space. Recent versions include an autovacuum process, reducing or even eliminating the need to run VACUUM on a regular basis.
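Run by hand, the command is as simple as this; adding ANALYZE also refreshes the planner's statistics:

VACUUM ANALYZE People;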
Finally, recent versions of PostgreSQL include support for tablespaces. This means you can spread tables across different directories and filesystems, rather than keep everything under the directory defined by your installation. This can boost performance or reliability significantly, particularly on large databases.
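A sketch of how that looks in practice; the directory and the names here are assumptions, and the directory must already exist and be owned by the PostgreSQL server account:

CREATE TABLESPACE fastdisk LOCATION '/mnt/ssd/pgdata';

CREATE TABLE access_log (
    logged_at TIMESTAMP NOT NULL DEFAULT NOW(),
    entry     TEXT NOT NULL
) TABLESPACE fastdisk;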
Practical Task Scheduling Deployment
July 20, 2016 12:00 pm CDT
One of the best things about the UNIX environment (aside from being stable and efficient) is the vast array of software tools available to help you do your job. Traditionally, a UNIX tool does only one thing, but does that one thing very well. For example, grep is very easy to use and can search vast amounts of data quickly. The find tool can locate a particular file or files based on all kinds of criteria. It's pretty easy to string these tools together to build even more powerful ones, such as a tool that finds all of the .log files in the /home directory and searches each one for a particular entry. This erector-set mentality means UNIX system administrators always seem to have the right tool for the job.
Cron traditionally has been considered another such tool for job scheduling, but is it enough? This webinar considers that very question. The first part builds on a previous Geek Guide, Beyond Cron, briefly describing how to know when it might be time to upgrade your job-scheduling infrastructure. The second part presents a practical planning and implementation framework.
Join Linux Journal's Mike Diehl and Pat Cameron of Help Systems.
Free to Linux Journal readers. Register Now!
- Stunnel Security for Oracle
- SourceClear Open
- Murat Yener and Onur Dundar's Expert Android Studio (Wrox)
- SUSE LLC's SUSE Manager
- My +1 Sword of Productivity
- Managing Linux Using Puppet
- Non-Linux FOSS: Caffeine!
- Google's SwiftShader Released
- Tech Tip: Really Simple HTTP Server with Python
- Parsing an RSS News Feed with a Bash Script
With all the industry talk about the benefits of Linux on Power and all the performance advantages offered by its open architecture, you may be considering a move in that direction. If you are thinking about analytics, big data and cloud computing, you would be right to evaluate Power. The idea of using commodity x86 hardware and replacing it every three years is an outdated cost model. It doesn’t consider the total cost of ownership, and it doesn’t consider the advantages of real processing power, high availability and multithreading like a demon.
This ebook takes a look at some of the practical applications of the Linux on Power platform and ways you might bring all the performance power of this open architecture to bear for your organization. There are no smoke and mirrors here—just hard, cold, empirical evidence provided by independent sources. I also consider some innovative ways Linux on Power will be used in the future. Get the Guide.