Databases for Free
Anywhere data needs to be stored and organized, a database is needed. Sid shows what's available.
by Sid Wentworth
In addition to needing an operating system, some embedded systems will need a structure for organizing data. In this article I discuss various ways of storing and retrieving data. Where appropriate, I include references to free packages that may be what you need.
UNIX, on which Linux is based, has a long history of supporting relatively fast, highly reliable filesystems. While Linux filesystems are not directly based on the UNIX ones, Linux does offer reliable storage options.
The default Linux filesystem, called ext2, has been in service for about eight years. While there have been some minor enhancements during that time, the basic structure remains the same and reliability has never been questioned.
The one disadvantage of the ext2 filesystem is that after a system crash, it is necessary to run a check program to repair any possible damage. This increases system startup (boot) time and, if there is severe damage, could require manual input. This problem is not unique to Linux file systems but is something you should be aware of.
An alternative type of filesystem is called a journaling filesystem. In this filesystem, a transaction log is maintained so that the filesystem itself can perform recovery rather than requiring an external program. Such a filesystem was added to UNIX system V about ten years ago. The Reiser filesystem, a type of journaling filesystem, recently became available for Linux. It is, however, relatively new, so we don't have a good history of reliability yet.
Now, on to specific approaches for data storage and retrieval. All these methods will run on top of whichever Linux filesystem you select.
The most basic way to store data is to save it in regular files. These files are commonly called flat files because there is no indexing information available. You must read the file sequentially to search for a particular datum.
UNIX systems have included what are called DBM routines. While not precisely a database, DBM provides a handy method for storage and retrieval of data using access keys.
Just to make things a little confusing, you will be able to find libraries called dbm, ndbm, gdbm and db. Rather than offer an exhaustive list of options, I address one specific alternative: the Berkeley DB. Originally from the BSD version of UNIX, Berkeley DB is available as a commercial product from Sleepycat Software.
Before you reach for your wallet, let me explain what commercial means here. Sleepycat makes Berkeley DB available at no charge, including source code, if you use the software internally only or if you make your marketed product's source code freely available. If you find this licensing scheme unpalatable, look into ndbm. It may be a better choice for you.
Back to Berkeley DB. What it is not is a relational database package. It is, in fact, a C-callable set of functions that allow you to store and retrieve key/value pairs. These key/value pairs can be pretty much anything you want (that is, any data type and any length up to 2<+>32<+> bytes), which gives you the flexibility to do what you want. Additional features include the ability to have multiple databases; keyed and sequential access to records; memory-mapped, read-only databases; and record locking.
You can download the whole package from the Sleepycat web site, http://www.sleepycat.com/. What you will find included are three packages, the data store, concurrent data store and transactional data store.
As we move through this list, we get more features, but that means more code space and more overhead. The basic data store only offers support for one writer and multiple readers. The concurrent data store adds support for multiple writers, while the transactional data store adds locking options and the code necessary to handle hot backups and disaster recovery. Again, the right choice depends on your application.
While the native language of Berkeley DB is C, interface libraries have been built for other programming languages including Perl, Python and Tcl.
If you still need more capabilities, open-source relational databases are available. Relational databases allow you to store data in uniform structures called tables. Each table has rows and columns. Each row represents one data set or record, and each column represents a particular field within the record. For example, a table of your friends might have their name, phone number and date of birth. The table might look like the one shown in Table 1.
The power of relational databases comes when you create multiple tables and are then able to associate them using a common column in each table. This operation is called a join. For example, let's say you had a second table that contained your friends' names and their e-mail addresses (see Table 2).
By performing a join operation on these two tables and selecting the Name, Phone and E-mail columns from the result, you could produce a list containing these three columns. While this is a very basic look at what a relational database is, I expect you get the idea of where this could lead.
All that is the good news. The bad news is that in order to realize the power of a relational database you have to learn a query language. The most popular of these languages is SQL, which stands for structured query language. Originally developed by IBM, ANSI and the International Standards Organization (ISO) published SQL standards in 1986 and 1987. Subsequent standards have been a joint effort of ANSI and ISO.
Besides entering SQL statements yourself to access data, applications program interfaces (APIs) are available for most other programming languages that allow you to access SQL databases.
One important capability of a relational database is called a transaction, which allows you to identify a set of database updates that are to be treated as one event. That is, you can be assured that either all of them will be performed or none of them performed. For example, in an accounting application, you need to guarantee that a debit is performed only if the related credit is also performed.
Unlike Berkeley DB, most relational databases contain separate client and server parts. What this means is that rather than linking the database system code with your program (like Berkeley DB), you only link with the client. The client code then uses a communications protocol such as TCP/IP to communicate with the server.
The advantage of this client/server approach is that you need not run the client and the server on the same computer system. Thus, this approach scales well. The disadvantage is the additional overhead of supporting the communications protocol. Again, the right choice will depend on your application.
|Designing Electronics with Linux||May 22, 2013|
|Dynamic DNS—an Object Lesson in Problem Solving||May 21, 2013|
|Using Salt Stack and Vagrant for Drupal Development||May 20, 2013|
|Making Linux and Android Get Along (It's Not as Hard as It Sounds)||May 16, 2013|
|Drupal Is a Framework: Why Everyone Needs to Understand This||May 15, 2013|
|Home, My Backup Data Center||May 13, 2013|
- RSS Feeds
- Dynamic DNS—an Object Lesson in Problem Solving
- Making Linux and Android Get Along (It's Not as Hard as It Sounds)
- Designing Electronics with Linux
- Using Salt Stack and Vagrant for Drupal Development
- New Products
- A Topic for Discussion - Open Source Feature-Richness?
- Drupal Is a Framework: Why Everyone Needs to Understand This
- Validate an E-Mail Address with PHP, the Right Way
- What's the tweeting protocol?
- Kernel Problem
9 hours 3 min ago
- BASH script to log IPs on public web server
13 hours 30 min ago
17 hours 6 min ago
- Reply to comment | Linux Journal
17 hours 38 min ago
- All the articles you talked
20 hours 2 min ago
- All the articles you talked
20 hours 5 min ago
- All the articles you talked
20 hours 6 min ago
1 day 31 min ago
- Keeping track of IP address
1 day 2 hours ago
- Roll your own dynamic dns
1 day 7 hours ago
Enter to Win an Adafruit Pi Cobbler Breakout Kit for Raspberry Pi
It's Raspberry Pi month at Linux Journal. Each week in May, Adafruit will be giving away a Pi-related prize to a lucky, randomly drawn LJ reader. Winners will be announced weekly.
Fill out the fields below to enter to win this week's prize-- a Pi Cobbler Breakout Kit for Raspberry Pi.
Congratulations to our winners so far:
- 5-8-13, Pi Starter Pack: Jack Davis
- 5-15-13, Pi Model B 512MB RAM: Patrick Dunn
- 5-21-13, Prototyping Pi Plate Kit: Philip Kirby
- Next winner announced on 5-27-13!
Free Webinar: Hadoop
How to Build an Optimal Hadoop Cluster to Store and Maintain Unlimited Amounts of Data Using Microservers
Realizing the promise of Apache® Hadoop® requires the effective deployment of compute, memory, storage and networking to achieve optimal results. With its flexibility and multitude of options, it is easy to over or under provision the server infrastructure, resulting in poor performance and high TCO. Join us for an in depth, technical discussion with industry experts from leading Hadoop and server companies who will provide insights into the key considerations for designing and deploying an optimal Hadoop cluster.
Some of key questions to be discussed are:
- What is the “typical” Hadoop cluster and what should be installed on the different machine types?
- Why should you consider the typical workload patterns when making your hardware decisions?
- Are all microservers created equal for Hadoop deployments?
- How do I plan for expansion if I require more compute, memory, storage or networking?