SQL vs. NoSQL
So what about plain-old SQL RDBMSes? Should they be retired from active service? Are they a relic from an earlier time? Not so fast.
First and foremost, ACID transactions most definitely are required in certain use cases. Databases used by banks and stock markets, for example, must always return correct data. Where money is concerned, guessing is not allowed. It is true that no one really cares if your latest tweet takes a couple of minutes to show up in your Twitter feed, but the same cannot be said for a billing system or an accounting database.
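To make the "all or nothing" guarantee concrete, here is a minimal sketch using Python's built-in sqlite3 module. The accounts table, the account names and the `make_bank`, `transfer` and `balances` helpers are hypothetical. The transfer debits one account and credits another inside a single transaction. If the credit fails, or if the debit would push a balance below zero, the whole transaction is rolled back and neither balance changes.

```python
import sqlite3

def make_bank():
    """Create an in-memory database with two hypothetical accounts."""
    conn = sqlite3.connect(":memory:")
    # The CHECK constraint makes the database itself refuse overdrafts.
    conn.execute(
        "CREATE TABLE accounts ("
        " id TEXT PRIMARY KEY,"
        " balance INTEGER NOT NULL CHECK (balance >= 0))"
    )
    with conn:
        conn.executemany("INSERT INTO accounts VALUES (?, ?)",
                         [("alice", 100), ("bob", 50)])
    return conn

def transfer(conn, src, dst, amount):
    """Move `amount` from `src` to `dst`: both updates happen, or neither does."""
    # Used as a context manager, the connection commits if the block
    # finishes normally and rolls back if an exception escapes it.
    with conn:
        debit = conn.execute(
            "UPDATE accounts SET balance = balance - ? WHERE id = ?",
            (amount, src))
        if debit.rowcount != 1:
            raise ValueError(f"no such account: {src}")
        credit = conn.execute(
            "UPDATE accounts SET balance = balance + ? WHERE id = ?",
            (amount, dst))
        if credit.rowcount != 1:
            # The debit above has already run; raising here undoes it.
            raise ValueError(f"no such account: {dst}")

def balances(conn):
    """Return {account_id: balance} for every account."""
    return dict(conn.execute("SELECT id, balance FROM accounts ORDER BY id"))

if __name__ == "__main__":
    conn = make_bank()
    transfer(conn, "alice", "bob", 30)
    print(balances(conn))  # {'alice': 70, 'bob': 80}
    for dst, amount in [("nobody", 30), ("bob", 500)]:
        try:
            transfer(conn, "alice", dst, amount)
        except (ValueError, sqlite3.IntegrityError) as exc:
            print("rolled back:", exc)
    print(balances(conn))  # {'alice': 70, 'bob': 80}
```

Both failed transfers leave the balances exactly as they were: the half-finished debit never becomes visible.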
Another thing in favor of RDBMSes is their use of SQL. It's a common, standardized language, so if you need to move from one database to another, you usually can get away with making only minor changes to your application, and it will “just work”. True, that may not be possible in every case, depending on how you used or abused SQL queries in your application, but the foundation for moving easily between SQL databases is there, and the tools and libraries for interacting with your data are plentiful and robust. A unified, standard NoSQL query language or API will never exist, because NoSQL databases differ so much from one another.
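As one example of that tooling, Python's DB-API (PEP 249) gives every conforming driver the same connection and cursor methods. The following sketch assumes a hypothetical `names` table and a hypothetical `find_given_names` helper, and it uses sqlite3 so it runs anywhere. The one thing that typically changes between drivers is the parameter placeholder style, which is exactly the kind of "minor change" described above.

```python
import sqlite3

def find_given_names(conn, family_name, placeholder="?"):
    """Return given names for one family name using only generic DB-API calls."""
    # Only the parameter placeholder differs between drivers (PEP 249
    # "paramstyle"): sqlite3 accepts "?", psycopg2 expects "%s".
    # The placeholder is a fixed driver setting, never user input.
    sql = ("SELECT given_name FROM names WHERE family_name = "
           + placeholder + " ORDER BY given_name")
    cur = conn.cursor()
    try:
        cur.execute(sql, (family_name,))
        return [row[0] for row in cur.fetchall()]
    finally:
        cur.close()

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE names (given_name TEXT, family_name TEXT)")
    conn.executemany("INSERT INTO names VALUES (?, ?)",
                     [("Bob", "Smith"), ("Ann", "Smith"), ("Cy", "Jones")])
    print(find_given_names(conn, "Smith"))  # ['Ann', 'Bob']
```

Pointed at PostgreSQL through psycopg2 instead, the same function would be called with `placeholder="%s"`; the SQL text and the cursor calls stay the same.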
On the NoSQL side, the only thing in common is that there is nothing in common. Each NoSQL database has its own set of APIs, libraries and preferred languages for interacting with the data it contains. With an RDBMS, it is trivial to get data out in whatever format you need using whatever programming language you like best. Your choice of a NoSQL database, by contrast, might limit you to one or a handful of programming languages and access methods.
Another thing RDBMSes have going for them is the relational model. The R in RDBMS traces its history back to research by E. F. Codd published in the June 1970 issue of Communications of the ACM. Since then, it has been expanded upon, improved and clarified. The relational model for databases is so popular because it is an excellent way to organize information. It maps very well to an enormous variety of real-world data storage needs, and when properly normalized, it is fast and efficient.
In the relational model, data is stored in tables with rows and columns. An address table, for example, might have columns for street name and number, city, postal code, state or province, and country. A name table might have columns for given names, family name, prefixes (Dr, Rev, Ms and so on) and suffixes (Jr, Sr, Esq and so on). Each row in these tables represents one address or one name.
The relational part (see the What Does Relational Mean in a Relational Database? sidebar) comes into play when you define which addresses relate to which names using a key. A key is a field (the intersection of a row and column), or a combination of fields in a single row, that is guaranteed to identify that particular row uniquely within its table. The address table, for example, might include a column that holds keys from the name table. You can use this key to look up just those addresses in the address table that “belong” (by virtue of the key) to a certain name in the name table.
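Here is the name/address example expressed as a minimal SQLite schema driven from Python, a sketch with hypothetical table, column and function names. `name_id` is the key of the `names` table, and the `addresses` table stores that same key to record which name each address belongs to. Note that SQLite enforces `REFERENCES` constraints only after `PRAGMA foreign_keys = ON` has been issued on the connection.

```python
import sqlite3

SCHEMA = """
CREATE TABLE names (
    name_id     INTEGER PRIMARY KEY,  -- the key: identifies exactly one row
    prefix      TEXT,                 -- Dr, Rev, Ms ...
    given_names TEXT NOT NULL,
    family_name TEXT NOT NULL,
    suffix      TEXT                  -- Jr, Sr, Esq ...
);
CREATE TABLE addresses (
    address_id  INTEGER PRIMARY KEY,
    name_id     INTEGER NOT NULL REFERENCES names (name_id),  -- key from names
    street      TEXT NOT NULL,
    city        TEXT NOT NULL,
    postal_code TEXT,
    region      TEXT,                 -- state or province
    country     TEXT NOT NULL
);
"""

def build_demo():
    """Create both tables in memory and fill them with hypothetical rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # enforce REFERENCES in SQLite
    conn.executescript(SCHEMA)
    with conn:
        conn.executemany(
            "INSERT INTO names (name_id, prefix, given_names, family_name, suffix)"
            " VALUES (?, ?, ?, ?, ?)",
            [(1, "Dr", "Jane", "Doe", None), (2, None, "John", "Roe", "Jr")])
        conn.executemany(
            "INSERT INTO addresses (name_id, street, city, postal_code, region, country)"
            " VALUES (?, ?, ?, ?, ?, ?)",
            [(1, "1 Main St", "Springfield", "62701", "IL", "US"),
             (1, "9 Lake Rd", "Duluth", "55802", "MN", "US"),
             (2, "5 High St", "Leeds", "LS1 1AA", None, "UK")])
    return conn

def addresses_for(conn, name_id):
    """Look up only the addresses that 'belong' to one name via its key."""
    return conn.execute(
        "SELECT street, city, country FROM addresses"
        " WHERE name_id = ? ORDER BY address_id", (name_id,)).fetchall()

def directory(conn):
    """Join the two tables on the shared key."""
    return conn.execute(
        "SELECT n.family_name, a.city FROM names AS n"
        " JOIN addresses AS a ON a.name_id = n.name_id"
        " ORDER BY n.family_name, a.city").fetchall()

if __name__ == "__main__":
    conn = build_demo()
    print(addresses_for(conn, 1))  # [('1 Main St', 'Springfield', 'US'), ('9 Lake Rd', 'Duluth', 'US')]
    print(directory(conn))  # [('Doe', 'Duluth'), ('Doe', 'Springfield'), ('Roe', 'Leeds')]
```

Because `name_id` is declared as a reference, the database rejects an address that points at a name that does not exist, rather than silently storing an orphaned row.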
My example is pretty simplistic, but when you combine it with ACID transactions in an RDBMS, you achieve tremendous power, flexibility and reliability. There is a reason businesses began using RDBMSes decades ago, and a reason open-source RDBMSes dominate the Web.
And, what about the Web? The primary argument many people make against RDBMSes is that they “don't scale”, which simply isn't true. Some individual RDBMSes do not scale very well or are harder to scale, but that doesn't mean no RDBMS can. RDBMSes are in use at virtually every large company, and the largest RDBMS installations routinely handle enormous traffic and petabytes of data.
This scaling myth is perpetuated and given credence every time popular Web sites announce that such-and-such RDBMS doesn't meet their needs, and so they are moving to NoSQL database X. The opinion of some in the RDBMS world is that many of these moves are not so much because the database they were using is deficient in some fundamental way, but because it was being used in a way for which it wasn't designed. To make an analogy, it's like people using flat-head screwdrivers to tighten Phillips-head screws, because it worked well enough to get the job done, but now they've discovered it is better to tighten Phillips screws with an actual Phillips screwdriver, and isn't it wonderful, and we should throw away all flat-head screwdrivers, because their time is past, and Phillips is the future.
One recent SQL-to-NoSQL move involved Digg.com moving from MySQL to Cassandra. As part of the move, Digg folks blogged about how they were using MySQL and why it didn't meet their needs. Others were skeptical. Dennis Forbes, in a series of posts on his site (see Resources), questioned whether Digg needed to use a NoSQL solution like Cassandra at all. His claims centered on what he considered very poor database usage on the part of Digg combined with inadequate hardware. In his mind, if Digg had just designed its database properly or switched to using SSDs in its servers, it would have had no problems. His best quote is this: “The way that many are using NoSQL is like discovering the buggy whip at the beginning of the automotive era.” Ouch.
Relational databases can sometimes be tricky to design properly. You have to know and understand your data deeply. But a properly designed database can perform orders of magnitude better than a poorly designed one. You also should not overlook the hardware on which your database runs. Databases love as much memory and processing power as you can throw at them, and the traditional spinning-platter disk drive has long been a limiting factor. Does the high performance of SSDs herald a new age of RDBMS performance? Many experts say yes. SSDs may be a game-changer in the database world.
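Proper design includes choices as basic as indexing the columns your queries filter on. The following sketch (hypothetical table, index and function names, using Python's sqlite3 module) asks SQLite's query planner how it would run the same lookup before and after an index is added. Without the index, the plan is a full-table SCAN; with it, the plan is a SEARCH using the index. It does not benchmark anything; it only shows the planner's change in strategy.

```python
import sqlite3

def query_plan(conn, sql, params=()):
    """Return the 'detail' text of each row SQLite's planner reports for `sql`."""
    # The human-readable detail string is the last column of
    # EXPLAIN QUERY PLAN output.
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]

def compare_plans():
    """Plan the same lookup without and then with an index on name_id."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE addresses ("
                 " address_id INTEGER PRIMARY KEY, name_id INTEGER, city TEXT)")
    with conn:
        conn.executemany("INSERT INTO addresses (name_id, city) VALUES (?, ?)",
                         [(i % 1000, f"City {i}") for i in range(10_000)])
    sql = "SELECT city FROM addresses WHERE name_id = ?"
    before = query_plan(conn, sql, (42,))  # no index: every row must be read
    conn.execute("CREATE INDEX idx_addresses_name_id ON addresses (name_id)")
    after = query_plan(conn, sql, (42,))   # index: jump to matching rows
    return before, after

if __name__ == "__main__":
    before, after = compare_plans()
    print("before:", before)
    print("after: ", after)
```

The exact wording of the plan text varies between SQLite versions, but the switch from scanning the table to searching through the index is the point.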
Relational SQL databases have been around for several decades. They have proven reliability and performance, and a feature set that meets the requirements of 99% of the use cases out there. They even make excellent key-value databases, if that's the type of data you have. Very few companies can't make a relational database work for them. You may not like to hear it, but by the law of averages, chances are your company is not one of them.
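Using a relational database as a key-value store can be as simple as a two-column table whose primary key is the lookup key. The `KVStore` class below is a hypothetical minimal sketch on top of Python's sqlite3 module, not a production design. `INSERT OR REPLACE` is SQLite's syntax for overwriting an existing key; other databases spell this differently (PostgreSQL, for example, uses `INSERT ... ON CONFLICT`).

```python
import sqlite3

class KVStore:
    """A minimal key-value store backed by a single SQLite table."""

    def __init__(self, path=":memory:"):
        self.conn = sqlite3.connect(path)
        # The primary key gives unique keys and an index for fast lookups.
        self.conn.execute("CREATE TABLE IF NOT EXISTS kv ("
                          " key TEXT PRIMARY KEY, value TEXT NOT NULL)")

    def put(self, key, value):
        with self.conn:  # commit on success, roll back on error
            # INSERT OR REPLACE overwrites any row with the same key.
            self.conn.execute("INSERT OR REPLACE INTO kv (key, value) VALUES (?, ?)",
                              (key, value))

    def get(self, key, default=None):
        row = self.conn.execute("SELECT value FROM kv WHERE key = ?",
                                (key,)).fetchone()
        return row[0] if row is not None else default

    def delete(self, key):
        """Remove `key`; return True if it existed."""
        with self.conn:
            cur = self.conn.execute("DELETE FROM kv WHERE key = ?", (key,))
        return cur.rowcount > 0

if __name__ == "__main__":
    kv = KVStore()
    kv.put("user:1", "alice")
    kv.put("user:1", "alicia")  # same key: value is overwritten
    print(kv.get("user:1"))     # alicia
    print(kv.get("user:2"))     # None
```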