Databases for Free
Anywhere data needs to be stored and organized, a database is needed. Sid shows what's available.
by Sid Wentworth
In addition to needing an operating system, some embedded systems will need a structure for organizing data. In this article I discuss various ways of storing and retrieving data. Where appropriate, I include references to free packages that may be what you need.
UNIX, on which Linux is based, has a long history of supporting relatively fast, highly reliable filesystems. While Linux filesystems are not directly based on the UNIX ones, Linux does offer reliable storage options.
The default Linux filesystem, called ext2, has been in service for about eight years. While there have been some minor enhancements during that time, the basic structure remains the same and reliability has never been questioned.
The one disadvantage of the ext2 filesystem is that after a system crash, it is necessary to run a check program to repair any possible damage. This increases system startup (boot) time and, if there is severe damage, could require manual input. This problem is not unique to Linux file systems but is something you should be aware of.
An alternative type of filesystem is called a journaling filesystem. In this filesystem, a transaction log is maintained so that the filesystem itself can perform recovery rather than requiring an external program. Such a filesystem was added to UNIX system V about ten years ago. The Reiser filesystem, a type of journaling filesystem, recently became available for Linux. It is, however, relatively new, so we don't have a good history of reliability yet.
Now, on to specific approaches for data storage and retrieval. All these methods will run on top of whichever Linux filesystem you select.
The most basic way to store data is to save it in regular files. These files are commonly called flat files because there is no indexing information available. You must read the file sequentially to search for a particular datum.
UNIX systems have included what are called DBM routines. While not precisely a database, DBM provides a handy method for storage and retrieval of data using access keys.
Just to make things a little confusing, you will be able to find libraries called dbm, ndbm, gdbm and db. Rather than offer an exhaustive list of options, I address one specific alternative: the Berkeley DB. Originally from the BSD version of UNIX, Berkeley DB is available as a commercial product from Sleepycat Software.
Before you reach for your wallet, let me explain what commercial means here. Sleepycat makes Berkeley DB available at no charge, including source code, if you use the software internally only or if you make your marketed product's source code freely available. If you find this licensing scheme unpalatable, look into ndbm. It may be a better choice for you.
Back to Berkeley DB. What it is not is a relational database package. It is, in fact, a C-callable set of functions that allow you to store and retrieve key/value pairs. These key/value pairs can be pretty much anything you want (that is, any data type and any length up to 2<+>32<+> bytes), which gives you the flexibility to do what you want. Additional features include the ability to have multiple databases; keyed and sequential access to records; memory-mapped, read-only databases; and record locking.
You can download the whole package from the Sleepycat web site, http://www.sleepycat.com/. What you will find included are three packages, the data store, concurrent data store and transactional data store.
As we move through this list, we get more features, but that means more code space and more overhead. The basic data store only offers support for one writer and multiple readers. The concurrent data store adds support for multiple writers, while the transactional data store adds locking options and the code necessary to handle hot backups and disaster recovery. Again, the right choice depends on your application.
While the native language of Berkeley DB is C, interface libraries have been built for other programming languages including Perl, Python and Tcl.
If you still need more capabilities, open-source relational databases are available. Relational databases allow you to store data in uniform structures called tables. Each table has rows and columns. Each row represents one data set or record, and each column represents a particular field within the record. For example, a table of your friends might have their name, phone number and date of birth. The table might look like the one shown in Table 1.
The power of relational databases comes when you create multiple tables and are then able to associate them using a common column in each table. This operation is called a join. For example, let's say you had a second table that contained your friends' names and their e-mail addresses (see Table 2).
By performing a join operation on these two tables and selecting the Name, Phone and E-mail columns from the result, you could produce a list containing these three columns. While this is a very basic look at what a relational database is, I expect you get the idea of where this could lead.
All that is the good news. The bad news is that in order to realize the power of a relational database you have to learn a query language. The most popular of these languages is SQL, which stands for structured query language. Originally developed by IBM, ANSI and the International Standards Organization (ISO) published SQL standards in 1986 and 1987. Subsequent standards have been a joint effort of ANSI and ISO.
Besides entering SQL statements yourself to access data, applications program interfaces (APIs) are available for most other programming languages that allow you to access SQL databases.
One important capability of a relational database is called a transaction, which allows you to identify a set of database updates that are to be treated as one event. That is, you can be assured that either all of them will be performed or none of them performed. For example, in an accounting application, you need to guarantee that a debit is performed only if the related credit is also performed.
Unlike Berkeley DB, most relational databases contain separate client and server parts. What this means is that rather than linking the database system code with your program (like Berkeley DB), you only link with the client. The client code then uses a communications protocol such as TCP/IP to communicate with the server.
The advantage of this client/server approach is that you need not run the client and the server on the same computer system. Thus, this approach scales well. The disadvantage is the additional overhead of supporting the communications protocol. Again, the right choice will depend on your application.
Fast/Flexible Linux OS Recovery
On Demand Now
In this live one-hour webinar, learn how to enhance your existing backup strategies for complete disaster recovery preparedness using Storix System Backup Administrator (SBAdmin), a highly flexible full-system recovery solution for UNIX and Linux systems.
Join Linux Journal's Shawn Powers and David Huffman, President/CEO, Storix, Inc.
Free to Linux Journal readers.Register Now!
- Download "Linux Management with Red Hat Satellite: Measuring Business Impact and ROI"
- Back to Backups
- A New Version of Rust Hits the Streets
- Google's Abacus Project: It's All about Trust
- Secure Desktops with Qubes: Introduction
- Seeing Red and Getting Sleep
- Fancy Tricks for Changing Numeric Base
- Secure Desktops with Qubes: Installation
- Working with Command Arguments
- CentOS 6.8 Released
Until recently, IBM’s Power Platform was looked upon as being the system that hosted IBM’s flavor of UNIX and proprietary operating system called IBM i. These servers often are found in medium-size businesses running ERP, CRM and financials for on-premise customers. By enabling the Power platform to run the Linux OS, IBM now has positioned Power to be the platform of choice for those already running Linux that are facing scalability issues, especially customers looking at analytics, big data or cloud computing.
￼Running Linux on IBM’s Power hardware offers some obvious benefits, including improved processing speed and memory bandwidth, inherent security, and simpler deployment and management. But if you look beyond the impressive architecture, you’ll also find an open ecosystem that has given rise to a strong, innovative community, as well as an inventory of system and network management applications that really help leverage the benefits offered by running Linux on Power.Get the Guide