SQL vs. NoSQL
So what about plain-old SQL RDBMSes? Should they be retired from active service? Are they a relic from an earlier time? Not so fast.
First and foremost, ACID transactions most definitely are required in certain use cases. Databases used by banks and stock markets, for example, always must give correct data. Where money is concerned, guessing is not allowed. It is true that no one really cares if your latest tweet takes a couple minutes to show up in your Twitter feed, but the same cannot be said for a billing system or accounting database.
Another thing in favor of RDBMSes is their use of SQL. It's a common language, and if you need to move from one database to another, you usually can get away with making only minor changes to your application, and it will “just work”. True, it may not be possible in all cases, depending on how you used or abused the SQL queries in your application, but the foundation for moving easily between different SQL databases is there, and the tools and libraries you can use to interact with your data are plentiful and robust. A unified NoSQL standard query language or API will never exist because every NoSQL database is so different.
On the NoSQL side, the only thing in common is that there is nothing in common. Each NoSQL database has its own set of APIs, libraries and preferred languages for interacting with the data they contain. With an RDBMS, it is trivial to get data out in whatever format you need using whatever programming language you like best. Your choice of a NoSQL database might limit you to one or a handful of programming languages and access methods.
Another thing RDBMSes have going for them is the relational model. The R in RDBMS traces its history back to research by E. F. Codd published in the June 1970 issue of Communications of the ACM. Since then, it has been expanded upon, improved and clarified. The relational model for databases is so popular because it is an excellent way to organize information. It maps very well to an enormous variety of real-world data storage needs, and when properly normalized, it is fast and efficient.
In the relational model, data is stored in tables with rows and columns. An address table, for example, might have columns for street name and number, city, postal code, state or province, and country. A name table might have columns for given names, family name, prefixes (Dr, Rev, Ms and so on) and suffixes (Jr, Sr, Esq and so on). Each row in the individual tables would represent an individual address or name.
The relational part (see the What Does Relational Mean in a Relational Database? sidebar) comes into play as you define which addresses relate to which names using a key. A key is a field (the intersection of a row and column) or combination of fields in a single row that is guaranteed to identify uniquely that particular row in the table it is in. For the address table, you might have a column for keys from the name table. You can use this key to look up just those addresses in the address table that “belong” (by virtue of the key) to a certain name in the name table.
My example is pretty simplistic, but when combined with ACID transactions in an RDBMS, you achieve tremendous power, flexibility and reliability. There is a reason that businesses began using them decades ago and why open-source RDBMSes dominate the Web.
And, what about the Web? The primary argument many people use against RDBMSes is that they “don't scale”, which simply isn't true. It is true that some individual RDBMSes do not scale very well or are harder to scale, but that doesn't mean every RDBMS cannot. RDBMSes are in use at every large company. The largest RDBMS installations routinely handle enormous traffic and petabytes of data.
This scaling myth is perpetuated and given credence every time popular Web sites announce that such-and-such RDBMS doesn't meet their needs, and so they are moving to NoSQL database X. The opinion of some in the RDBMS world is that many of these moves are not so much because the database they were using is deficient in some fundamental way, but because it was being used in a way for which it wasn't designed. To make an analogy, it's like people using flat-head screwdrivers to tighten Phillips-head screws, because it worked well enough to get the job done, but now they've discovered it is better to tighten Phillips screws with an actual Phillips screwdriver, and isn't it wonderful, and we should throw away all flat-head screwdrivers, because their time is past, and Phillips is the future.
One recent SQL-to-NoSQL move involved Digg.com moving from MySQL to Cassandra. As part of the move, Digg folks blogged about how they were using MySQL and why it didn't meet their needs. Others were skeptical. Dennis Forbes, in a series of posts on his site (see Resources), questioned whether Digg needed to use a NoSQL solution like Cassandra at all. His claims centered on what he considered very poor database usage on the part of Digg combined with inadequate hardware. In his mind, if Digg had just designed its database properly or switched to using SSDs in its servers, it would have had no problems. His best quote is this: “The way that many are using NoSQL is like discovering the buggy whip at the beginning of the automotive era.” Ouch.
Relational databases sometimes can be tricky to design properly. You have to know and understand your data deeply. But when they are designed properly, the performance can be orders of magnitude better compared to poorly designed databases. You also should not overlook the hardware on which your database runs. Databases love as much memory and processing power as you can throw at them, and the traditional spinning-platter disk drive has long been a limiting factor. Does the high performance of SSDs herald a new age of RDBMS performance? Many experts say yes. SSDs may be a game-changer in the database world.
Relational SQL databases have been around for several decades. They have proven reliability and performance and a feature set that meets the requirements of 99% of the use cases out there. They even make excellent key-value databases, if that's the type of data you have. There are only very few companies that can't make a relational database work for them. You may not like to hear it, but with the law of averages, chances are your company is not one of them.
Practical Task Scheduling Deployment
July 20, 2016 12:00 pm CDT
One of the best things about the UNIX environment (aside from being stable and efficient) is the vast array of software tools available to help you do your job. Traditionally, a UNIX tool does only one thing, but does that one thing very well. For example, grep is very easy to use and can search vast amounts of data quickly. The find tool can find a particular file or files based on all kinds of criteria. It's pretty easy to string these tools together to build even more powerful tools, such as a tool that finds all of the .log files in the /home directory and searches each one for a particular entry. This erector-set mentality allows UNIX system administrators to seem to always have the right tool for the job.
Cron traditionally has been considered another such a tool for job scheduling, but is it enough? This webinar considers that very question. The first part builds on a previous Geek Guide, Beyond Cron, and briefly describes how to know when it might be time to consider upgrading your job scheduling infrastructure. The second part presents an actual planning and implementation framework.
Join Linux Journal's Mike Diehl and Pat Cameron of Help Systems.
Free to Linux Journal readers.Register Now!
- Google's SwiftShader Released
- Interview with Patrick Volkerding
- My +1 Sword of Productivity
- SUSE LLC's SUSE Manager
- Tech Tip: Really Simple HTTP Server with Python
- Murat Yener and Onur Dundar's Expert Android Studio (Wrox)
- Non-Linux FOSS: Caffeine!
- Managing Linux Using Puppet
- SuperTuxKart 0.9.2 Released
- Returning Values from Bash Functions
With all the industry talk about the benefits of Linux on Power and all the performance advantages offered by its open architecture, you may be considering a move in that direction. If you are thinking about analytics, big data and cloud computing, you would be right to evaluate Power. The idea of using commodity x86 hardware and replacing it every three years is an outdated cost model. It doesn’t consider the total cost of ownership, and it doesn’t consider the advantage of real processing power, high-availability and multithreading like a demon.
This ebook takes a look at some of the practical applications of the Linux on Power platform and ways you might bring all the performance power of this open architecture to bear for your organization. There are no smoke and mirrors here—just hard, cold, empirical evidence provided by independent sources. I also consider some innovative ways Linux on Power will be used in the future.Get the Guide