At the Forge - NoSQL? I'd Prefer SomeSQL
When I started to develop Web applications in the early 1990s, I sometimes found myself needing to store persistent data. At the time, the main options available were files—plain-text files, binary files or DBM (now Berkeley DB) files.
Fortunately, the languages I was using for Web development at the time (mainly Perl and Tcl) made it fairly easy to use any of those options. However, Web developers like myself quickly discovered the problems inherent in these technologies, particularly on large sites. The lack of intelligent, fine-grained locking meant that a popular application would be forced to wait before it could read from, or write to, an external file. Moreover, storing the data in this way prevented us from retrieving it intelligently or selectively. You could, of course, read the data into your program and produce your own reports, but each report required not just its own query, but its own program.
So, you can imagine how my life changed dramatically when I was introduced to relational database systems. Simply put, a relational database is built out of many two-dimensional tables, with each table containing one or more columns (the attributes) and one or more rows (the records). You could relate these tables to one another (thus, making the database “relational”) with foreign keys, such that a column in one table would represent a record in another table.
When you store data in a relational database, you suddenly are able to stop thinking in terms of how your data is stored and more in terms of how it is structured. Is the database server keeping information in binary format, in ASCII or in something altogether different? Who knows? And furthermore, who cares? So long as I can store and retrieve my data easily and efficiently, I'll treat the database as a black box, concentrating on writing my application.
SQL, the query language that works with relational databases, encourages this kind of thinking. When you use SQL to query a database, you describe the data you want to retrieve, rather than telling the computer how to retrieve it. True, now there are procedural extensions to SQL, and you also can write programs in non-SQL languages that manipulate the data in further ways. But, even they allow you to loop over, manipulate and perform calculations on the retrieved data. The data itself is still in the database, and you still describe how and what you want to retrieve in the declarative language of SQL.
Relational databases became popular because they offered so many advantages to programmers. They offered stability and security. They offered transactions, allowing programmers to ensure that a set of operations either all happened atomically or that they were rolled back to their state before the query began to execute. It was fairly easy to map many types of data onto two-dimensional tables. Databases gained all sorts of capabilities, including the ability to check the integrity of the data they are storing. And, they even gained the ability to distribute the load across multiple computers, albeit with some degree of difficulty.
Open-source relational databases, such as MySQL and PostgreSQL, began to challenge their commercial competitors. When I began writing this column in 1996, MySQL had not yet been released under an open-source license, and although PostgreSQL was fully open source, it had a number of limitations. So, it was expected that if you were interested in using a relational database, you almost certainly would fork over money for one, paying Oracle or one of its smaller competitors many thousands of dollars. Today, the opposite is the case. Open-source databases are more than adequate for Web applications. Moreover, it is basically expected that a Web application will use a relational database as a back end. If you say you are a Web developer, it is basically expected that you have some experience with SQL, as well as the care and feeling of a relational database.
I've been using relational databases for years, and I'm quite comfortable with them. Indeed, as I've sometimes told my clients, I see the world through two-dimensional tables. Breaking problems down into tables and the relations among them is something I do nearly every day. Yet, even I recognize that a database is a tool, and that not every tool is appropriate for every task. Saying every Web application should use a relational database is like saying all programs should be written in C (or whatever your favorite programming language is at the moment). As the old adage goes, if your only tool is a hammer, every problem looks like a nail.
And, there are indeed some issues with relational databases that have come to the foreground during the past few years. Perhaps the biggest problem many developers face is the “impedance mismatch” between objects, with which we enjoy writing our programs, and tables, in which relational databases function. If you want to store your object in a database, you have several options, none of which are particularly good, and all of which lose some of the semantic value that is the whole point of objects. True, you can treat each object attribute as a column, each instance as a row and each class as a table, but many corner cases don't map well to this, even with an excellent object-relational mapper (ORM) library. One of the reasons I enjoy working with Ruby on Rails is the large degree to which ActiveRecord is able to paper over these differences, but even with all of ActiveRecord's smarts, some problems can't be solved easily with objects, where it's required to massage my data model or even to drop down into SQL.
This problem has become bigger in the past few years, with the growth of documents and objects that contain many different types of data, some of which are unpredictable or contain highly variable content. In response to these needs, database servers have begun to incorporate functions that previously were considered to be outside the database domain, such as full-text search (which is extremely useful) and XML handling (which I still find puzzling and somewhat inappropriate). Regardless of your attitude toward such features, it does seem a bit strange to set up a database server just so you can store and retrieve files. There must be better ways to store such files, without all the administrative and computational overhead associated with a database server.
Another problem has to do with the greater and greater performance requirements of Web applications. I tend to dismiss talk of “scalable” languages and frameworks, in part because I believe that any language or framework can scale, given sufficient hardware. Instead of worrying about the scalability of a particular language, I would concentrate on the scalability of an architecture—and that is certainly a weak point for relational database servers. Modern Web applications using a “share-nothing” approach easily can scale up to hundreds or even thousands of Web servers, each handling part of the load. But if each of those servers needs to communicate with a database server on the back end, the database might become overwhelmed at some point. There are solutions to this problem, even among the open-source databases, but they are far from dead simple to install and configure.
|Dynamic DNS—an Object Lesson in Problem Solving||May 21, 2013|
|Using Salt Stack and Vagrant for Drupal Development||May 20, 2013|
|Making Linux and Android Get Along (It's Not as Hard as It Sounds)||May 16, 2013|
|Drupal Is a Framework: Why Everyone Needs to Understand This||May 15, 2013|
|Home, My Backup Data Center||May 13, 2013|
|Non-Linux FOSS: Seashore||May 10, 2013|
- RSS Feeds
- Dynamic DNS—an Object Lesson in Problem Solving
- Making Linux and Android Get Along (It's Not as Hard as It Sounds)
- Using Salt Stack and Vagrant for Drupal Development
- New Products
- A Topic for Discussion - Open Source Feature-Richness?
- Validate an E-Mail Address with PHP, the Right Way
- Drupal Is a Framework: Why Everyone Needs to Understand This
- What's the tweeting protocol?
- Tech Tip: Really Simple HTTP Server with Python
- Kernel Problem
26 min 46 sec ago
- BASH script to log IPs on public web server
4 hours 53 min ago
8 hours 29 min ago
- Reply to comment | Linux Journal
9 hours 1 min ago
- All the articles you talked
11 hours 25 min ago
- All the articles you talked
11 hours 28 min ago
- All the articles you talked
11 hours 30 min ago
15 hours 54 min ago
- Keeping track of IP address
17 hours 45 min ago
- Roll your own dynamic dns
22 hours 59 min ago
Enter to Win an Adafruit Pi Cobbler Breakout Kit for Raspberry Pi
It's Raspberry Pi month at Linux Journal. Each week in May, Adafruit will be giving away a Pi-related prize to a lucky, randomly drawn LJ reader. Winners will be announced weekly.
Fill out the fields below to enter to win this week's prize-- a Pi Cobbler Breakout Kit for Raspberry Pi.
Congratulations to our winners so far:
- 5-8-13, Pi Starter Pack: Jack Davis
- 5-15-13, Pi Model B 512MB RAM: Patrick Dunn
- 5-21-13, Prototyping Pi Plate Kit: Philip Kirby
- Next winner announced on 5-27-13!
Free Webinar: Hadoop
How to Build an Optimal Hadoop Cluster to Store and Maintain Unlimited Amounts of Data Using Microservers
Realizing the promise of Apache® Hadoop® requires the effective deployment of compute, memory, storage and networking to achieve optimal results. With its flexibility and multitude of options, it is easy to over or under provision the server infrastructure, resulting in poor performance and high TCO. Join us for an in depth, technical discussion with industry experts from leading Hadoop and server companies who will provide insights into the key considerations for designing and deploying an optimal Hadoop cluster.
Some of key questions to be discussed are:
- What is the “typical” Hadoop cluster and what should be installed on the different machine types?
- Why should you consider the typical workload patterns when making your hardware decisions?
- Are all microservers created equal for Hadoop deployments?
- How do I plan for expansion if I require more compute, memory, storage or networking?