In-Memory Database Systems

IMDSes are especially useful for embedded development, where every saved process shrinks the footprint and the bottom line.

Growth in intelligent connected devices is soaring. Whether in the home, the pocket or built into industrial communications and transportation systems, such gear has evolved to include powerful CPUs and sophisticated embedded systems software. One type of software increasingly seen within such devices is the database management system (DBMS). While familiar on desktops and servers, databases are a recent arrival to embedded systems. Like any organism dropped into a new environment, databases must evolve. A new type of DBMS, the in-memory database system (IMDS), represents the latest step in DBMSes' adaptation to embedded systems.

Why are embedded systems developers turning to databases? Market competition requires that devices like set-top boxes, network switches and consumer electronics become “smarter”. To support expanding feature sets, applications generally must manage larger volumes of more complex data. As a result, many device developers find they are outgrowing self-developed data management solutions, which can be especially difficult to maintain and extend as application requirements increase.

In addition, the trend toward standard, commercial off-the-shelf (COTS) embedded operating systems—and away from a fragmented environment of many proprietary systems—promotes the availability of databases. The emergence of a widely used OS such as embedded Linux creates a user community, which in turn spurs development (both commercially and noncommercially) of databases and other tools to enhance the platform.

So device developers are turning to commercial databases, but existing embedded DBMS software has not provided the ideal fit. Embedded databases emerged well over a decade ago to support business systems, with features including complex caching logic and abnormal termination recovery. But on a device, within a set-top box or next-generation fax machine, for example, these abilities are often unnecessary and cause the application to exceed available memory and CPU resources.

In addition, traditional databases are built to store data on disk. Disk I/O, as a mechanical process, is tremendously expensive in terms of performance. This often makes traditional databases too slow for embedded systems that require real-time performance.

In-memory databases have emerged specifically to meet the performance needs and resource availability in embedded systems. As the name implies, IMDSes reside entirely in memory—they never go to disk.

So is an IMDS simply a traditional database that's been loaded into memory? That's a fair question because disk I/O elimination is the best-known aspect of this new technology. The capability to create a RAM disk, a filesystem in memory, is built into Linux. Wouldn't deploying a well-known database system, such as MySQL or even Oracle, on such a disk provide the same benefits?

In fact, IMDSes are considerably different beasts from their embedded DBMS cousins. Compared to traditional databases, IMDSes are less complex. Beyond the elimination of disk I/O, in-memory database systems have fewer moving parts or interacting processes. This leads to greater frugality in RAM and CPU use and faster overall responsiveness than can be achieved by deploying a traditional DBMS in memory. An understanding of what's been designed out of, or significantly modified in, IMDSes is important in deciding whether such a technology suits a given project. Three key differences are described below.

Caching

Due to the performance drain caused by physical disk access, virtually all traditional DBMS software incorporates caching to keep the most recently used portions of the database in memory. Caching logic includes cache synchronization, which makes sure that an image of a database page in cache is consistent with the physical database page on disk. Cache lookup also is included, which determines if data requested by the application is in cache; if not, the page is retrieved and added to the cache for future reference.

These processes play out even when a disk-based DBMS is deployed entirely in memory, such as on a RAM disk. By eliminating caching, IMDSes remove a significant source of complexity and performance overhead, and in the process slim down their own RAM and CPU requirements.
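To make the cache lookup and cache synchronization steps concrete, here is a minimal sketch of a write-back page cache. It is a toy illustration, not the design of any particular DBMS: the class name PageCache, the dict standing in for the disk, and the LRU eviction policy are all assumptions made for the example.

```python
from collections import OrderedDict


class PageCache:
    """Toy write-back page cache in front of a 'disk' (a plain dict here)."""

    def __init__(self, disk, capacity=2):
        self.disk = disk            # page_id -> bytes; stands in for physical media
        self.capacity = capacity    # maximum number of pages held in memory
        self.pages = OrderedDict()  # page_id -> bytearray, least recently used first
        self.dirty = set()          # pages modified in cache but not yet on disk
        self.disk_reads = 0
        self.disk_writes = 0

    def get(self, page_id):
        # Cache lookup: is the requested page already in memory?
        if page_id in self.pages:
            self.pages.move_to_end(page_id)   # mark as most recently used
            return self.pages[page_id]
        # Cache miss: fetch from disk and keep a copy for future requests.
        self.disk_reads += 1
        page = bytearray(self.disk[page_id])
        self.pages[page_id] = page
        if len(self.pages) > self.capacity:
            victim, old_page = self.pages.popitem(last=False)
            self._sync(victim, old_page)      # never lose a modified page
        return page

    def put(self, page_id, data):
        page = self.get(page_id)
        page[:] = data
        self.dirty.add(page_id)               # cache and disk now disagree

    def _sync(self, page_id, page):
        # Cache synchronization: make the disk page match the cached image.
        if page_id in self.dirty:
            self.disk[page_id] = bytes(page)
            self.disk_writes += 1
            self.dirty.discard(page_id)

    def flush(self):
        for page_id, page in self.pages.items():
            self._sync(page_id, page)
```

Every read and write passes through this bookkeeping whether the "disk" is a spinning drive or a RAM disk, which is the overhead an IMDS avoids by having no cache layer at all.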

Data-Transfer Overhead

Consider the handoffs required for an application to read a piece of data from a traditional disk-based database, modify it and write that piece of data back to the database. The process is illustrated in Figure 1.

  1. The application requests the data item from the database runtime through the database API.

  2. The database runtime instructs the filesystem to retrieve the data from the physical media.

  3. The filesystem makes a copy of the data for its cache and passes another copy to the database.

  4. The database keeps one copy in its cache and passes another copy to the application.

  5. The application modifies its copy and passes it back to the database through the database API.

  6. The database runtime copies the modified data item back to database cache.

  7. The copy in the database cache is eventually written to the filesystem, where it is updated in the filesystem cache.

  8. Finally, the data is written back to the physical media.

These steps cannot be turned off in a traditional database, even when processing takes place entirely within memory. And this simplified scenario doesn't account for the additional copies and transfers required for transaction logging!
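The eight steps above can be mimicked in a few lines to show where the copies pile up. This is only a sketch: the function name traditional_update, the dicts standing in for the physical media and the two caches, and the way copies are tallied are assumptions made for illustration, and a real DBMS and filesystem may count differently.

```python
def traditional_update(media, fs_cache, db_cache, key, modify):
    """Read, modify and write back one item, returning the number of copies made."""
    copies = 0

    # Steps 1-2: the request travels from the application to the database
    # runtime and on to the filesystem. Only messages move; no data yet.
    raw = media[key]

    # Step 3: the filesystem keeps one copy and hands another to the database.
    fs_cache[key] = bytearray(raw)
    copies += 1
    to_db = bytearray(raw)
    copies += 1

    # Step 4: the database keeps one copy and hands another to the application.
    db_cache[key] = bytearray(to_db)
    copies += 1
    to_app = bytearray(to_db)
    copies += 1

    # Step 5: the application modifies its private copy and passes it back.
    modify(to_app)

    # Step 6: the database runtime copies the modified item into its cache.
    db_cache[key] = bytearray(to_app)
    copies += 1

    # Step 7: the database cache is written through to the filesystem cache.
    fs_cache[key] = bytearray(db_cache[key])
    copies += 1

    # Step 8: the filesystem writes the data back to the physical media.
    media[key] = bytes(fs_cache[key])
    copies += 1

    return copies
```

Under this tally a single read-modify-write cycle copies the item seven times, and that is before any transaction logging is taken into account.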

Figure 1. Data flow in a traditional DBMS. Red lines represent data transfer. Gray lines represent message path.

In contrast, an in-memory database system entails little or no data transfer. The application may make copies of the data in local program variables, but it is not required to. Instead, the IMDS gives the application a pointer that refers directly to the data item in the database, enabling the application to work with the data in place. The data is still protected because the pointer is used only through the database API, which ensures that it is used properly. Eliminating multiple data transfers streamlines processing, cutting multiple data copies reduces memory consumption, and the simplicity of this design makes for greater reliability.
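The direct-access idea can be sketched as follows. Python has no raw pointers, so a memoryview over the stored record plays that role here. The class TinyIMDS and its methods are hypothetical names invented for this example and do not correspond to the API of eXtremeDB or any other product. A production IMDS for embedded use would more likely expose such an interface in C or C++.

```python
from contextlib import contextmanager


class TinyIMDS:
    """Toy in-memory store that hands out direct references to its records."""

    def __init__(self):
        self._records = {}   # key -> bytearray; the database itself, in RAM

    def insert(self, key, data):
        # The only copy made: the data enters the database.
        self._records[key] = bytearray(data)

    @contextmanager
    def update(self, key):
        # Hand the caller a view onto the record itself, not a copy of it.
        view = memoryview(self._records[key])
        try:
            yield view
        finally:
            # The reference is valid only inside the API call; afterwards it
            # is released, so the application cannot misuse a stale handle.
            view.release()

    def read(self, key):
        return bytes(self._records[key])


db = TinyIMDS()
db.insert("temp", b"\x00\x10")
with db.update("temp") as rec:
    rec[0] = 0x7F            # modifies the stored record in place
```

After the with block ends, the record in the store already holds the new value, with no write-back step, and any further use of rec raises ValueError because the view has been released.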

______________________

Comments

Caching from other databases

Kanchana

Nowadays, in-memory database systems are being used to handle load for high-volume web sites. Amazon uses IMDB technology to handle its massive database load from its web servers. IMDB systems are used for caching tables from traditional disk-based systems such as Oracle, DB2 and MySQL to improve performance while maintaining the freshness of the data (keeping the cache in sync with the actual database). Examples are TimesTen and CSQL. TimesTen provides caching for Oracle databases, and CSQL provides caching for Oracle, MySQL, Postgres, etc.

Re: In-Memory Database Systems

Anonymous

What happens to in-memory database systems when a database operation produces a large result (say, a select operation that returns 5 million tuples from the database)?

Are in-memory database systems suited only for small databases?

Re: In-Memory Database Systems

Anonymous

64-bit versions of in-memory databases are less constrained than 32-bit versions, obviously. Another, often overlooked, concern is the amount of physical (versus virtual) memory in the system. An IMDS could have a 1 gigabyte database in a system with 2 gigabytes of memory, but if the size of all concurrent processes exceeds physical memory, the O/S will swap something out.

A 5 million tuple result is not, in itself, a problem. If the database must build a temporary table to create the result set, that could be a problem if available memory is limited.

Like their disk-based cousins, IMDSes come in different forms. Your question pertains more to SQL databases. Some IMDSes (my company's eXtremeDB, for example) are what are called "navigational", meaning the programmer navigates through the database one record at a time using the database programming interface. There are advantages and disadvantages to both types of programming interface. One advantage of the navigational API is that it is never necessary to build temporary tables, because ad-hoc queries are non-existent with navigational APIs, by their nature. (In the absence of ad-hoc queries, where all access paths are known in advance, a well-designed database with an SQL interface would also not require a temporary table for any query.)

-Steve
