PostgreSQL, the NoSQL Database

One of the most interesting trends in the computer world during the past few years has been the rapid growth of NoSQL databases. The term may be accurate, in that NoSQL databases don't use SQL in order to store and retrieve data, but that's about where the commonalities end. NoSQL databases range from key-value stores to columnar databases to document databases to graph databases.

On the face of it, nothing sounds more natural or reasonable than a NoSQL database. The "impedance mismatch" between programming languages and databases, as it often is described, means that we generally must work in two different languages, and in two different paradigms. In our programs, we think and work with objects, which we carefully construct. And then we deconstruct those objects, turning them into two-dimensional tables in our database. The idea that I can manipulate objects in my database in the same way as I can in my program is attractive at many levels.

In some ways, this is the holy grail of databases: we want something that is rock-solid reliable, scalable to the large proportions that modern Web applications require and also convenient to us as programmers. One popular solution is an ORM (object-relational mapper), which allows us to write our programs using objects. The ORM then translates those objects and method calls into the appropriate SQL, which it passes along to the database. ORMs certainly make it more convenient to work with a relational database, at least when it comes to simple queries. And to no small degree, they also improve the readability of our code, in that we can stick with our objects, without having to use a combination of languages and paradigms.

But ORMs have their problems as well, in no small part because they can shield us from the inner workings of our database. NoSQL advocates say that their databases have solved these problems, allowing them to stay within a single language. Actually, this isn't entirely true. MongoDB has its own SQL-like query language, and CouchDB uses JavaScript. But there are adapters that do similar ORM-like translations for many NoSQL databases, allowing developers to stay within a single language and paradigm when developing.

The ultimate question, however, is whether the benefits of NoSQL databases outweigh their issues. I have largely come to the conclusion that, with the exception of key-value stores, the answer is "no"—that a relational database often is going to be a better solution. And by "better", I mean that relational databases are more reliable, and even more scalable, than many of their NoSQL cousins. Sure, you might need to work hard in order to get the scaling to work correctly, but there is no magic solution. In the past few months alone, I've gained several new clients who decided to move from NoSQL solutions to relational databases, and needed help with the architecture, development or optimization.

The thing is, even the most die-hard relational database fan will admit there are times when NoSQL data stores are convenient. With the growth of JSON in Web APIs, it would be nice to be able to store the result sets in a storage type that understands that format and allows me to search and retrieve from it. And even though key-value stores, such as Redis, are powerful and fast, there are sometimes cases when I'd like to have the key-value pairs connected to data in other relations (tables) in my database.

If this describes your dilemma, I have good news for you. As I write this, PostgreSQL, an amazing database and open-source project, is set to release version 9.4. This new version, like all other PostgreSQL versions, contains a number of optimizations, improvements and usability features. But two of the most intriguing features to me are HStore and JSONB, features that actually turn PostgreSQL into a NoSQL database.

Fine, perhaps I'm exaggerating a bit here. PostgreSQL was and always will be relational and transactional, and adding these new data types hasn't changed that. But having a key-value store within PostgreSQL opens many new possibilities for developers. JSONB, a binary version of JSON storage that supports indexing and a large number of operators, turns PostgreSQL into a document database, albeit one with a few other features in it besides.

In this article, I introduce these NoSQL features that are included in PostgreSQL 9.4, which likely will be released before this issue of Linux Journal gets to you. Although not every application needs these features, they can be useful—and with this latest release of PostgreSQL, the performance also is significantly improved.

HStore

One of the most interesting new developments in PostgreSQL is that of HStore, which provides a key-value store within the PostgreSQL environment. Contrary to what I originally thought, this doesn't mean that PostgreSQL treats a particular table as a key-value store. Rather, HStore is a data type, akin to INTEGER, TEXT and XML. Thus, any column—or set of columns—within a table may be defined to be of type HSTORE. For example:


CREATE TABLE People (
    id   SERIAL,
    info HSTORE,
    PRIMARY KEY(id)
);

Once I have done that, I can ask PostgreSQL to show me the definition of the table:


\d people
              Table "public.people"

-----------------------------------------------------------------
| Column | Type    | Modifiers                                  |
-----------------------------------------------------------------
| id     | integer | not null default                           |
|        |         |  ↪nextval('people_id_seq'::regclass)|
-----------------------------------------------------------------
| info   | hstore  |                                            |
-----------------------------------------------------------------
 Indexes:
        "people_pkey" PRIMARY KEY, btree (id)

As you can see, the type of my "info" column is hstore. What I have effectively created is a (database) table of hash tables. Each row in the "people" table will have its own hash table, with any keys and values. It's typical in such a situation for every row to have the same key names, or at least some minimum number of overlapping key names, but you can, of course, use any keys and values you like.

Both the keys and the values in an HStore column are text strings. You can assign a hash table to an HStore column with the following syntax:


INSERT INTO people(info) VALUES ('foo=>1, bar=>abc, baz=>stuff');

Notice that although this example inserts three key-value pairs into the HStore column, they are stored together, converted automatically into an HStore, splitting the pairs where there is a comma, and each pair where there is a => sign.

So far, you won't see any difference between an HStore and a TEXT column, other than (perhaps) the fact that you cannot use text functions and operators on that column. For example, you cannot use the || operator, which normally concatenates text strings, on the HStore:


UPDATE People SET info = info || 'abc';
ERROR:  XX000: Unexpected end of string
LINE 1: UPDATE People SET info = info || 'abc';
                                             ^

PostgreSQL tries to apply the || operator to the HStore on the left, but cannot find a key-value pair in the string on the right, producing an error message. However, you can add a pair, which will work:


UPDATE People SET info = info || 'abc=>def';

As with all hash tables, HStore is designed for you to use the keys to retrieve the values. That is, each key exists only once in each HStore value, although values may be repeated. The only way to retrieve a value is via the key. You do this with the following syntax:


SELECT info->'bar' FROM People;
----------------
| ?column? |   |    
----------------
| abc      |   |
----------------
(1 row)

Notice several things here. First, the name of the column remains without any quotes, just as you do when you're retrieving the full contents of the column. Second, you put the name of the key after the -> arrow, which is different from the => ("hashrocket") arrow used to delineate key-value pairs within the HStore. Finally, the returned value always will be of type TEXT. This means if you say:


SELECT info->'foo' || 'a' FROM People;
----------------
| ?column? |   |
----------------
| 1a       |   |
----------------
(1 row)

Notice that ||, which works on text values, has done its job here. However, this also means that if you try to multiply your value, you will get an error message:


SELECT info->'foo' * 5 FROM People;
info->'foo' * 5 from people;
                     ^
Time: 5.041 ms

If you want to retrieve info->'foo' as an integer, you must cast that value:


SELECT (info->'foo')::integer * 5 from people;
----------------
| ?column? |   |
----------------
|   5      |   |
----------------
(1 row)

Now, why is HStore so exciting? In particular, if you're a database person who values normalization, you might be wondering why someone even would want this sort of data store, rather than a nicely normalized table or set of tables.

The answer, of course, is that there are many different uses for a database, and some of them can be more appropriate for an HStore. I never would suggest storing serious data in such a thing, but perhaps you want to keep track of user session information, without keeping it inside of a binary object.

Now, HStore is not new to PostgreSQL. The big news in version 9.4 is that GiN and GIST indexes now support HStore columns, and that they do so with great efficiency and speed.

Where do I plan to use HStore? To be honest, I'm not sure yet. I feel like this is a data type that I likely will want to use at some point, but for now, it's simply an extra useful, efficient tool that I can put in my programming toolbox. The fact that it is now extremely efficient, and its operators can take advantage of improved indexes, means that HStore is not only convenient, but speedy, as well.

______________________

Reuven M. Lerner, Linux Journal Senior Columnist, a longtime Web developer, consultant and trainer, is completing his PhD in learning sciences at Northwestern University.