SQL vs. NoSQL
So what about plain-old SQL RDBMSes? Should they be retired from active service? Are they a relic from an earlier time? Not so fast.
First and foremost, ACID transactions most definitely are required in certain use cases. Databases used by banks and stock markets, for example, always must give correct data. Where money is concerned, guessing is not allowed. It is true that no one really cares if your latest tweet takes a couple minutes to show up in your Twitter feed, but the same cannot be said for a billing system or accounting database.
Another thing in favor of RDBMSes is their use of SQL. It's a common language, and if you need to move from one database to another, you usually can get away with making only minor changes to your application, and it will “just work”. True, it may not be possible in all cases, depending on how you used or abused the SQL queries in your application, but the foundation for moving easily between different SQL databases is there, and the tools and libraries you can use to interact with your data are plentiful and robust. A unified NoSQL standard query language or API will never exist because every NoSQL database is so different.
On the NoSQL side, the only thing in common is that there is nothing in common. Each NoSQL database has its own set of APIs, libraries and preferred languages for interacting with the data they contain. With an RDBMS, it is trivial to get data out in whatever format you need using whatever programming language you like best. Your choice of a NoSQL database might limit you to one or a handful of programming languages and access methods.
Another thing RDBMSes have going for them is the relational model. The R in RDBMS traces its history back to research by E. F. Codd published in the June 1970 issue of Communications of the ACM. Since then, it has been expanded upon, improved and clarified. The relational model for databases is so popular because it is an excellent way to organize information. It maps very well to an enormous variety of real-world data storage needs, and when properly normalized, it is fast and efficient.
In the relational model, data is stored in tables with rows and columns. An address table, for example, might have columns for street name and number, city, postal code, state or province, and country. A name table might have columns for given names, family name, prefixes (Dr, Rev, Ms and so on) and suffixes (Jr, Sr, Esq and so on). Each row in the individual tables would represent an individual address or name.
The relational part (see the What Does Relational Mean in a Relational Database? sidebar) comes into play as you define which addresses relate to which names using a key. A key is a field (the intersection of a row and column) or combination of fields in a single row that is guaranteed to identify uniquely that particular row in the table it is in. For the address table, you might have a column for keys from the name table. You can use this key to look up just those addresses in the address table that “belong” (by virtue of the key) to a certain name in the name table.
My example is pretty simplistic, but when combined with ACID transactions in an RDBMS, you achieve tremendous power, flexibility and reliability. There is a reason that businesses began using them decades ago and why open-source RDBMSes dominate the Web.
And, what about the Web? The primary argument many people use against RDBMSes is that they “don't scale”, which simply isn't true. It is true that some individual RDBMSes do not scale very well or are harder to scale, but that doesn't mean every RDBMS cannot. RDBMSes are in use at every large company. The largest RDBMS installations routinely handle enormous traffic and petabytes of data.
This scaling myth is perpetuated and given credence every time popular Web sites announce that such-and-such RDBMS doesn't meet their needs, and so they are moving to NoSQL database X. The opinion of some in the RDBMS world is that many of these moves are not so much because the database they were using is deficient in some fundamental way, but because it was being used in a way for which it wasn't designed. To make an analogy, it's like people using flat-head screwdrivers to tighten Phillips-head screws, because it worked well enough to get the job done, but now they've discovered it is better to tighten Phillips screws with an actual Phillips screwdriver, and isn't it wonderful, and we should throw away all flat-head screwdrivers, because their time is past, and Phillips is the future.
One recent SQL-to-NoSQL move involved Digg.com moving from MySQL to Cassandra. As part of the move, Digg folks blogged about how they were using MySQL and why it didn't meet their needs. Others were skeptical. Dennis Forbes, in a series of posts on his site (see Resources), questioned whether Digg needed to use a NoSQL solution like Cassandra at all. His claims centered on what he considered very poor database usage on the part of Digg combined with inadequate hardware. In his mind, if Digg had just designed its database properly or switched to using SSDs in its servers, it would have had no problems. His best quote is this: “The way that many are using NoSQL is like discovering the buggy whip at the beginning of the automotive era.” Ouch.
Relational databases sometimes can be tricky to design properly. You have to know and understand your data deeply. But when they are designed properly, the performance can be orders of magnitude better compared to poorly designed databases. You also should not overlook the hardware on which your database runs. Databases love as much memory and processing power as you can throw at them, and the traditional spinning-platter disk drive has long been a limiting factor. Does the high performance of SSDs herald a new age of RDBMS performance? Many experts say yes. SSDs may be a game-changer in the database world.
Relational SQL databases have been around for several decades. They have proven reliability and performance and a feature set that meets the requirements of 99% of the use cases out there. They even make excellent key-value databases, if that's the type of data you have. There are only very few companies that can't make a relational database work for them. You may not like to hear it, but with the law of averages, chances are your company is not one of them.
|September 2015 Issue of Linux Journal: HOW-TOs||Sep 01, 2015|
|September 2015 Video Preview||Sep 01, 2015|
|Using tshark to Watch and Inspect Network Traffic||Aug 31, 2015|
|Where's That Pesky Hidden Word?||Aug 28, 2015|
|A Project to Guarantee Better Security for Open-Source Projects||Aug 27, 2015|
|Concerning Containers' Connections: on Docker Networking||Aug 26, 2015|
- Optimization in GCC
- Using tshark to Watch and Inspect Network Traffic
- September 2015 Issue of Linux Journal: HOW-TOs
- Problems with Ubuntu's Software Center and How Canonical Plans to Fix Them
- Concerning Containers' Connections: on Docker Networking
- A Project to Guarantee Better Security for Open-Source Projects
- Where's That Pesky Hidden Word?
- Firefox Security Exploit Targets Linux Users and Web Developers
- My Network Go-Bag
- Doing Astronomy with Python