Databases for Free
Anywhere data needs to be stored and organized, a database is needed. Sid shows what's available.
by Sid Wentworth
In addition to needing an operating system, some embedded systems will need a structure for organizing data. In this article I discuss various ways of storing and retrieving data. Where appropriate, I include references to free packages that may be what you need.
UNIX, on which Linux is based, has a long history of supporting relatively fast, highly reliable filesystems. While Linux filesystems are not directly based on the UNIX ones, Linux does offer reliable storage options.
The default Linux filesystem, called ext2, has been in service for about eight years. While there have been some minor enhancements during that time, the basic structure remains the same and reliability has never been questioned.
The one disadvantage of the ext2 filesystem is that after a system crash, it is necessary to run a check program to repair any possible damage. This increases system startup (boot) time and, if there is severe damage, could require manual input. This problem is not unique to Linux filesystems but is something you should be aware of.
An alternative type of filesystem is called a journaling filesystem. In this filesystem, a transaction log is maintained so that the filesystem itself can perform recovery rather than requiring an external program. Such a filesystem was added to UNIX System V about ten years ago. The Reiser filesystem, a type of journaling filesystem, recently became available for Linux. It is, however, relatively new, so we don't have a good history of reliability yet.
Now, on to specific approaches for data storage and retrieval. All these methods will run on top of whichever Linux filesystem you select.
The most basic way to store data is to save it in regular files. These files are commonly called flat files because there is no indexing information available. You must read the file sequentially to search for a particular datum.
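To make the cost of flat files concrete, here is a minimal sketch of that sequential search in Python; the file layout (one colon-separated record per line) is a hypothetical example:

```python
# Sequential search through a flat file: every lookup is O(n)
# because there is no index to consult.

def find_record(path, key):
    """Return the first record in `path` whose first field matches `key`,
    or None if no such record exists."""
    with open(path) as f:
        for line in f:
            fields = line.rstrip("\n").split(":")
            if fields and fields[0] == key:
                return fields
    return None
```

Every query pays the full scan, which is exactly the limitation the keyed approaches below remove.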
UNIX systems have included what are called DBM routines. While not precisely a database, DBM provides a handy method for storage and retrieval of data using access keys.
Just to make things a little confusing, you will be able to find libraries called dbm, ndbm, gdbm and db. Rather than offer an exhaustive list of options, I address one specific alternative: the Berkeley DB. Originally from the BSD version of UNIX, Berkeley DB is available as a commercial product from Sleepycat Software.
Before you reach for your wallet, let me explain what commercial means here. Sleepycat makes Berkeley DB available at no charge, including source code, if you use the software internally only or if you make your marketed product's source code freely available. If you find this licensing scheme unpalatable, look into ndbm. It may be a better choice for you.
Back to Berkeley DB. What it is not is a relational database package. It is, in fact, a C-callable set of functions that allow you to store and retrieve key/value pairs. These key/value pairs can be pretty much anything you want (that is, any data type and any length up to 2^32 bytes), which gives you the flexibility to do what you want. Additional features include the ability to have multiple databases; keyed and sequential access to records; memory-mapped, read-only databases; and record locking.
You can download the whole package from the Sleepycat web site, http://www.sleepycat.com/. What you will find included are three packages: the data store, the concurrent data store and the transactional data store.
As we move through this list, we get more features, but that means more code space and more overhead. The basic data store only offers support for one writer and multiple readers. The concurrent data store adds support for multiple writers, while the transactional data store adds locking options and the code necessary to handle hot backups and disaster recovery. Again, the right choice depends on your application.
While the native language of Berkeley DB is C, interface libraries have been built for other programming languages including Perl, Python and Tcl.
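Berkeley DB's native interface is C, but the key/value idiom it provides is easy to demonstrate from Python, whose standard dbm module offers a similar keyed store. This is not Berkeley DB's own API, and the file name is a hypothetical example:

```python
import dbm

# Open (creating if necessary) a keyed store; "friends.db" is a
# hypothetical file name. Keys and values are byte strings.
with dbm.open("friends.db", "c") as db:
    db[b"alice"] = b"555-1234"   # store a key/value pair
    db[b"bob"] = b"555-9876"
    phone = db[b"alice"]         # keyed retrieval -- no sequential scan
```

Compare this with the flat-file scan: the store does the indexing for you, so lookups no longer touch every record.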
If you still need more capabilities, open-source relational databases are available. Relational databases allow you to store data in uniform structures called tables. Each table has rows and columns. Each row represents one data set or record, and each column represents a particular field within the record. For example, a table of your friends might have their name, phone number and date of birth. The table might look like the one shown in Table 1.
The power of relational databases comes when you create multiple tables and are then able to associate them using a common column in each table. This operation is called a join. For example, let's say you had a second table that contained your friends' names and their e-mail addresses (see Table 2).
By performing a join operation on these two tables and selecting the Name, Phone and E-mail columns from the result, you could produce a list containing these three columns. While this is a very basic look at what a relational database is, I expect you get the idea of where this could lead.
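As a concrete sketch of such a join, here is the two-table example run through SQLite (via Python's standard sqlite3 module) as a stand-in relational engine; the table and column names follow Tables 1 and 2, and the sample rows are invented:

```python
import sqlite3

# Two tables sharing a common Name column, as in Tables 1 and 2.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE friends (name TEXT, phone TEXT, dob TEXT)")
con.execute("CREATE TABLE emails  (name TEXT, email TEXT)")
con.execute("INSERT INTO friends VALUES ('Alice', '555-1234', '1970-01-01')")
con.execute("INSERT INTO emails  VALUES ('Alice', 'alice@example.com')")

# Join on the common column, then select three columns of the result.
rows = con.execute(
    "SELECT f.name, f.phone, e.email "
    "FROM friends f JOIN emails e ON f.name = e.name"
).fetchall()
```

The `ON f.name = e.name` clause is the join condition: it pairs up rows from the two tables by their shared Name value.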
All that is the good news. The bad news is that in order to realize the power of a relational database, you have to learn a query language. The most popular of these languages is SQL, which stands for Structured Query Language. SQL was originally developed by IBM; ANSI and the International Organization for Standardization (ISO) published SQL standards in 1986 and 1987. Subsequent standards have been a joint effort of ANSI and ISO.
In addition to entering SQL statements yourself, you can access SQL databases from most programming languages through application programming interfaces (APIs).
One important capability of a relational database is called a transaction, which allows you to identify a set of database updates that are to be treated as one event. That is, you can be assured that either all of them will be performed or none of them performed. For example, in an accounting application, you need to guarantee that a debit is performed only if the related credit is also performed.
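A minimal sketch of that debit/credit pattern, again using SQLite via Python's sqlite3 module; the table and account names are hypothetical:

```python
import sqlite3

# Either both updates commit together, or neither survives.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
con.execute("INSERT INTO accounts VALUES ('checking', 100), ('savings', 0)")
con.commit()

try:
    con.execute("UPDATE accounts SET balance = balance - 40 "
                "WHERE name = 'checking'")
    con.execute("UPDATE accounts SET balance = balance + 40 "
                "WHERE name = 'savings'")
    con.commit()      # both updates become visible at once
except sqlite3.Error:
    con.rollback()    # on any failure, neither update takes effect
```

If the second UPDATE failed, the rollback would undo the first as well, so the books always balance.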
Unlike Berkeley DB, most relational databases contain separate client and server parts. What this means is that rather than linking the database system code with your program (like Berkeley DB), you only link with the client. The client code then uses a communications protocol such as TCP/IP to communicate with the server.
The advantage of this client/server approach is that you need not run the client and the server on the same computer system. Thus, this approach scales well. The disadvantage is the additional overhead of supporting the communications protocol. Again, the right choice will depend on your application.