In-Memory Database Systems

IMDSes are especially useful in embedded development, where every process that can be eliminated shrinks the footprint and improves the bottom line.
Transaction Processing

In the event of a catastrophic failure, such as loss of power, a disk-based database recovers by committing complete transactions or rolling back partial transactions from log files when the system restarts. Disk-based databases are hard-wired to keep transaction logs and to flush transaction log files and cache to disk after transactions are committed.

Main memory databases also provide transactional integrity. To do this, the IMDS maintains a before image of the objects that are updated or deleted and a list of database pages added during a transaction. When the application commits the transaction, the memory for before images and page references returns to the memory pool (a fast and efficient process). If an in-memory database must abort a transaction (for example, if the inbound data stream is interrupted), the before images are restored to the database and the newly inserted pages are returned to the memory pool.
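
The mechanics can be pictured with a toy sketch in C. The record and transaction types below are invented for illustration and stand in for whatever structures a real IMDS manages internally; the point is simply that commit discards the before image, while abort copies it back.

#include <stdio.h>

/* Toy illustration of the before-image idea (not any product's API). */
typedef struct { unsigned key; double value; } record;

typedef struct {
    record *target;   /* record being modified in the in-memory database */
    record  before;   /* before image saved when the update begins       */
} transaction;

static void trans_update(transaction *t, record *rec, double new_value)
{
    t->target = rec;
    t->before = *rec;         /* save the before image */
    rec->value = new_value;   /* modify the live record in place */
}

static void trans_commit(transaction *t)
{
    t->target = NULL;         /* before image is simply dropped (memory back to the pool) */
}

static void trans_abort(transaction *t)
{
    *t->target = t->before;   /* restore the before image into the database */
    t->target = NULL;
}

int main(void)
{
    record r = { 42, 1.0 };
    transaction t;

    trans_update(&t, &r, 2.0);
    trans_abort(&t);                            /* interrupted: roll back */
    printf("after abort:  %.1f\n", r.value);    /* prints 1.0 */

    trans_update(&t, &r, 2.0);
    trans_commit(&t);                           /* committed: keep the new value */
    printf("after commit: %.1f\n", r.value);    /* prints 2.0 */
    return 0;
}

In this model, commit costs almost nothing, and abort is a single structure copy back into the database.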

In the event of catastrophic failure, the in-memory database image is lost. This is a major difference from disk-based databases. If the system is turned off, the IMDS is reprovisioned upon restart. Consequently, there is no reason to keep transaction log files, and another complex, memory-intensive task is eliminated from the IMDS.

This functionality may not suit every application, but in the embedded systems arena, examples abound of applications with data stores that can be easily replenished in real time. These include a program guide application in a set-top box that downloads from a satellite or cable head-end, a wireless access point provisioned by a server upstream or an IP routing table that is repopulated as protocols discover network topology. Developers of such systems gladly limit the scope of transaction processing in exchange for superior performance and a smaller footprint.

This does not preclude the use of saved local data. With an IMDS, the application can open a stream (a socket, pipe or a file pointer) and instruct the database runtime to read or write a database image from or to the stream. This feature could be used to create and maintain boot-stage data, i.e., an initial starting point for the database. The other end of the stream can be a pipe to another process or a filesystem pointer (any filesystem, whether it's magnetic, optical or Flash).
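
As a rough illustration, the toy code below streams a trivial "database image" out to a file and reads it back as boot-stage data. The record layout, the bootstage.img file name and the image_save/image_load helpers are invented for this sketch; a real IMDS would supply its own save and load calls that accept a stream handle.

#include <stdio.h>

/* Toy sketch of streaming a database image in and out. Here the
 * "database" is just an array of fixed-size records. */
typedef struct { unsigned key; double value; } record;

/* Write the whole image to an already-open stream. */
static int image_save(FILE *out, const record *db, size_t count)
{
    return fwrite(db, sizeof *db, count, out) == count ? 0 : -1;
}

/* Read an image back from a stream; returns the number of records loaded. */
static size_t image_load(FILE *in, record *db, size_t max)
{
    return fread(db, sizeof *db, max, in);
}

int main(void)
{
    record db[3] = { {1, 1.5}, {2, 2.5}, {3, 3.5} };
    record boot[3];

    FILE *f = fopen("bootstage.img", "wb");   /* could equally be a pipe */
    if (f == NULL || image_save(f, db, 3) != 0)
        return 1;
    fclose(f);

    f = fopen("bootstage.img", "rb");
    if (f == NULL)
        return 1;
    size_t n = image_load(f, boot, 3);
    fclose(f);

    printf("loaded %zu records, first key %u\n", n, boot[0].key);
    return 0;
}

Because the helpers only see a FILE pointer, the same code works whether the other end of the stream is a file on Flash, a pipe to another process or a socket wrapped with fdopen().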

Application Scenario: IP Routers

Where and how can IMDS technology make a difference? While in-memory databases have cropped up in various application settings, the following scenario, involving embedded software in the most common internet infrastructure device, the IP router, offers an idea of the problems this technology can address.

Modern IP routers incorporate routing table management (RTM) software that accomplishes the core task of determining the next hop for data packets on the Internet and other networks. Routing protocols continuously monitor available routes and the status of other routing devices, then update the device's routing table with current data.
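
The core lookup is easy to picture. The sketch below performs a longest-prefix match over a tiny, hard-coded routing table; the addresses are invented, and the linear scan stands in for the trie or other index a production RTM would use.

#include <stdio.h>
#include <stdint.h>

/* Toy longest-prefix-match lookup over a tiny static routing table. */
typedef struct {
    uint32_t prefix;    /* network prefix, host byte order */
    uint32_t mask;      /* netmask                         */
    uint32_t next_hop;  /* next-hop address                */
} route;

static const route table[] = {
    { 0x0A000000u, 0xFF000000u, 0x0A000001u },  /* 10.0.0.0/8    -> 10.0.0.1    */
    { 0x0A010000u, 0xFFFF0000u, 0x0A010001u },  /* 10.1.0.0/16   -> 10.1.0.1    */
    { 0x00000000u, 0x00000000u, 0xC0A80001u },  /* default route -> 192.168.0.1 */
};

/* Return the next hop for dst: the matching route with the longest mask wins. */
static uint32_t next_hop(uint32_t dst)
{
    uint32_t best_mask = 0, hop = 0;
    for (size_t i = 0; i < sizeof table / sizeof table[0]; i++)
        if ((dst & table[i].mask) == table[i].prefix && table[i].mask >= best_mask) {
            best_mask = table[i].mask;
            hop = table[i].next_hop;
        }
    return hop;
}

int main(void)
{
    /* 10.1.2.3 matches the /16 route, so the next hop is 10.1.0.1 (0x0A010001). */
    printf("next hop: 0x%08X\n", (unsigned)next_hop(0x0A010203u));
    return 0;
}

A production table holds many thousands of such entries and is updated continuously by the routing protocols.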

These routing tables have typically been developed as proprietary outgrowths of the RTM software, and this approach presents one of the principal challenges in developing next-generation routers. As device functionality increases, routing table management becomes a significant programming bottleneck. Lacking support for the complex data types and multiple access methods that are hallmarks of databases, these self-developed RTM structures provide a limited toolset.

In addition, like any data management solution that is hard-wired to the application it supports, routing tables encounter difficulties in extensibility and reliability. Changes made to the data management code reverberate through the entire RTM structure, causing unwanted surprises and adding to QA cycles. Scalability is also an issue: self-developed data management that works well for a given task often stumbles when the intensity of use is ratcheted up. The result is that while the Internet's growth requires rapid advances in routing technology, this device evolution is slowed by software architecture that has outlived its usefulness.

Under such conditions, using a database would seem to be a no-brainer. But deploying a traditional DBMS within an IP router is problematic. Real-time internet address lookups won't accommodate the latency required to go to disk and perform the caching, transaction logging and other processes that are part and parcel of disk-based DBMSes.

In addition, imposing a large database footprint within the router necessitates more RAM and a more powerful CPU. This adds to the overall device cost, and the market for routers is price-competitive. Even a slightly lower per-unit price increases the manufacturer's market share, and a lower per-unit cost drops right to the bottom line. Software that saves RAM, or requires a less expensive processor, can determine product success.

The emergence of in-memory databases allows the application of DBMS technology to many embedded systems. For developers of embedded systems, proven database technology provides benefits including optimized access methods and data layout, standard and simplified navigation methods, built-in concurrency and data integrity mechanisms, and improved flexibility and fault tolerance. Adoption of this new breed of DBMS simplifies embedded system development while addressing growing software complexity and ensuring high availability and reliability.

Steve Graves is president and cofounder of McObject, developer of the eXtremeDB in-memory database system. As president of Raima Corporation, he helped pioneer the use of DBMS technology in embedded systems, working closely with companies in building database-enabled intelligent devices. A database industry veteran, Graves has held executive-level engineering, consulting and sales/marketing positions at several public and private technology companies.

Comments

Caching from other databases

Posted by Kanchana

Nowadays, in-memory database systems are being used to handle the load for high-volume web sites. Amazon, for example, uses in-memory database technology to handle the massive database load from its web servers. IMDSes also are used for caching tables from traditional disk-based systems, such as Oracle, DB2 and MySQL, to improve performance while keeping the cache in sync with the actual database. Examples are TimesTen and CSQL: TimesTen provides caching for the Oracle database, and CSQL provides caching for Oracle, MySQL, Postgres and others.

Re: In-Memory Database Systems

Posted by Anonymous

What happens to an in-memory database system when a database operation produces a very large result (say, a SELECT that returns 5 million tuples)?

Are in-memory database systems suited only for small databases?

Re: In-Memory Database Systems

Posted by Anonymous

64-bit versions of in-memory databases are less constrained than 32-bit versions, obviously. Another, often overlooked, concern is the amount of physical (versus virtual) memory in the system. An IMDS could have a 1 gigabyte database in a system with 2 gigabytes of memory, but if the size of all concurrent processes exceeds physical memory, the O/S will swap something out.

A 5 million tuple result is not, in itself, a problem. If the database must build a temporary table to create the result set, that could be a problem if available memory is limited.

Like their disk-based cousins, IMDSes come in different forms. Your question pertains more to SQL databases. Some IMDSes (my company's eXtremeDB, for example) are what is called "navigational", meaning the programmer navigates through the database one record at a time using the database programming interface. There are advantages and disadvantages to both types of programming interface. One advantage of the navigational API is that it is never necessary to build temporary tables, because ad hoc queries are nonexistent with navigational APIs by their nature. (In the absence of ad hoc queries, where all access paths are known in advance, a well-designed database with an SQL interface would also not require a temporary table for any query.)

-Steve
