In-Memory Database Systems
Growth in intelligent connected devices is soaring. Whether in the home, the pocket or built into industrial communications and transportation systems, such gear has evolved to include powerful CPUs and sophisticated embedded systems software. One type of software increasingly seen within such devices is the database management system (DBMS). While familiar on desktops and servers, databases are a recent arrival in embedded systems. Like any organism dropped into a new environment, databases must evolve. A new type of DBMS, the in-memory database system (IMDS), represents the latest step in the adaptation of DBMSes to embedded systems.
Why are embedded systems developers turning to databases? Market competition requires that devices like set-top boxes, network switches and consumer electronics become “smarter”. To support expanding feature sets, applications generally must manage larger volumes of more complex data. As a result, many device developers find they are outgrowing self-developed data management solutions, which can be especially difficult to maintain and extend as application requirements increase.
In addition, the trend toward standard, commercial off-the-shelf (COTS) embedded operating systems, and away from a fragmented environment of many proprietary systems, encourages the development of databases for these platforms. The emergence of a widely used OS such as embedded Linux creates a user community, which in turn spurs both commercial and noncommercial development of databases and other tools that enhance the platform.
So device developers are turning to commercial databases, but existing embedded DBMS software has not been an ideal fit. Embedded databases emerged well over a decade ago to support business systems, with features such as complex caching logic and recovery from abnormal termination. But on a device such as a set-top box or next-generation fax machine, these capabilities are often unnecessary, and they can push the application beyond the available memory and CPU resources.
In addition, traditional databases are built to store data on disk. Disk I/O, as a mechanical process, is tremendously expensive in terms of performance. This often makes traditional databases too slow for embedded systems that require real-time performance.
In-memory databases have emerged specifically to meet the performance needs and resource constraints of embedded systems. As the name implies, IMDSes reside entirely in memory: they never go to disk.
So is an IMDS simply a traditional database that's been loaded into memory? That's a fair question because disk I/O elimination is the best-known aspect of this new technology. The capability to create a RAM disk, a filesystem in memory, is built into Linux. Wouldn't deploying a well-known database system, such as MySQL or even Oracle, on such a disk provide the same benefits?
In fact, IMDSes are considerably different beasts from their embedded DBMS cousins. Compared to traditional databases, IMDSes are less complex. Beyond the elimination of disk I/O, in-memory database systems have fewer moving parts or interacting processes. This leads to greater frugality in RAM and CPU use and faster overall responsiveness than can be achieved by deploying a traditional DBMS in memory. An understanding of what's been designed out of, or significantly modified in, IMDSes is important in deciding whether such a technology suits a given project. Three key differences are described below.
Due to the performance drain caused by physical disk access, virtually all traditional DBMS software incorporates caching to keep the most recently used portions of the database in memory. Caching logic includes cache synchronization, which makes sure that the image of a database page in the cache is consistent with the physical database page on disk. It also includes cache lookup, which determines whether the data requested by the application is in the cache; if it is not, the page is retrieved from disk and added to the cache for future reference.
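To make that overhead concrete, here is a minimal C sketch of the two mechanisms just described, cache lookup and cache synchronization, for a toy page cache. The names (`cache_get`, `cache_flush`), the page and cache sizes, and the round-robin eviction are assumptions chosen for illustration; real DBMS buffer managers use more elaborate replacement policies, locking and I/O scheduling. The static `disk` array stands in for the physical media, and it could just as well be a file on a RAM disk: the lookup and synchronization logic runs either way.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical sizes, for illustration only. */
#define PAGE_SIZE   64
#define DISK_PAGES  16
#define CACHE_SLOTS 4

/* Simulated physical database file (disk or RAM disk). */
static char disk[DISK_PAGES][PAGE_SIZE];
static int disk_reads = 0;

typedef struct {
    int  page_no;          /* disk page held in this slot; -1 if empty */
    bool dirty;            /* modified since it was read from disk? */
    char data[PAGE_SIZE];  /* cached image of the page */
} cache_slot;

static cache_slot cache[CACHE_SLOTS];
static int next_victim = 0;  /* simple round-robin replacement */

void cache_init(void) {
    for (int i = 0; i < CACHE_SLOTS; i++) {
        cache[i].page_no = -1;
        cache[i].dirty = false;
    }
    next_victim = 0;
    disk_reads = 0;
}

/* Cache synchronization: write a modified page image back to "disk". */
static void sync_slot(cache_slot *s) {
    if (s->page_no >= 0 && s->dirty) {
        memcpy(disk[s->page_no], s->data, PAGE_SIZE);
        s->dirty = false;
    }
}

/* Cache lookup: return the cached image of page_no, loading it on a miss. */
char *cache_get(int page_no, bool for_write) {
    assert(page_no >= 0 && page_no < DISK_PAGES);
    for (int i = 0; i < CACHE_SLOTS; i++) {
        if (cache[i].page_no == page_no) {  /* hit */
            cache[i].dirty |= for_write;
            return cache[i].data;
        }
    }
    /* Miss: sync and evict a victim slot, then read the page from "disk". */
    cache_slot *s = &cache[next_victim];
    next_victim = (next_victim + 1) % CACHE_SLOTS;
    sync_slot(s);
    memcpy(s->data, disk[page_no], PAGE_SIZE);
    disk_reads++;
    s->page_no = page_no;
    s->dirty = for_write;
    return s->data;
}

/* Push every modified page image back to "disk". */
void cache_flush(void) {
    for (int i = 0; i < CACHE_SLOTS; i++)
        sync_slot(&cache[i]);
}
```

Every access pays for the lookup loop, and every update leaves the on-disk page stale until a synchronization pass copies it back. An IMDS has no such layer to traverse.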
These processes play out even when a disk-based DBMS is deployed in memory, such as on a RAM disk. By eliminating caching, IMDSes remove a significant source of complexity and performance overhead, and in the process slim down their RAM and CPU requirements.
Consider the handoffs required for an application to read a piece of data from a traditional disk-based database, modify it and write that piece of data back to the database. The process is illustrated in Figure 1.
1. The application requests the data item from the database runtime through the database API.
2. The database runtime instructs the filesystem to retrieve the data from the physical media.
3. The filesystem makes a copy of the data for its cache and passes another copy to the database.
4. The database keeps one copy in its cache and passes another copy to the application.
5. The application modifies its copy and passes it back to the database through the database API.
6. The database runtime copies the modified data item back to the database cache.
7. The copy in the database cache is eventually written to the filesystem, where it is updated in the filesystem cache.
8. Finally, the data is written back to the physical media.
These steps cannot be turned off in a traditional database, even when processing takes place entirely within memory. And this simplified scenario doesn't account for the additional copies and transfers required for transaction logging!
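The handoffs in this read-modify-write sequence can be modeled as a chain of buffer copies. The following C sketch is a deliberately simplified model, not the code of any real DBMS or filesystem: each static buffer stands in for one layer, and the hypothetical helper `copy_item` counts every copy the steps require. Requests that merely pass instructions down the stack (the application calling the API, the runtime calling the filesystem) are not counted. Even with transaction logging left out, a single update of one item costs six copies in this model.

```c
#include <assert.h>
#include <string.h>

#define ITEM_SIZE 32  /* hypothetical size of one data item */

/* One buffer per layer in the read-modify-write sequence (simplified model). */
static char media[ITEM_SIZE];     /* physical media */
static char fs_cache[ITEM_SIZE];  /* filesystem cache */
static char db_cache[ITEM_SIZE];  /* database cache */
static char app_copy[ITEM_SIZE];  /* application's private copy */
static int copies = 0;

static void copy_item(char *dst, const char *src) {
    memcpy(dst, src, ITEM_SIZE);
    copies++;
}

/* Runs the whole sequence once and returns the number of copies made. */
int traditional_update(const char *new_value) {
    copies = 0;
    /* Filesystem reads the media into its cache and hands the database a copy. */
    copy_item(fs_cache, media);
    copy_item(db_cache, fs_cache);
    /* Database keeps its cached copy and hands the application another. */
    copy_item(app_copy, db_cache);
    /* Application modifies its own copy. */
    strncpy(app_copy, new_value, ITEM_SIZE - 1);
    app_copy[ITEM_SIZE - 1] = '\0';
    /* Modified item is copied back into the database cache. */
    copy_item(db_cache, app_copy);
    /* Database cache is (eventually) written to the filesystem cache. */
    copy_item(fs_cache, db_cache);
    /* Filesystem writes the item back to the physical media. */
    copy_item(media, fs_cache);
    return copies;
}
```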
In contrast, an in-memory database system entails little or no data transfer. The application may make copies of the data in local program variables, but it is not required to. Instead, the IMDS gives the application a pointer that refers directly to the data item in the database, enabling the application to work with the data directly. The data is still protected because the pointer is used only through the database API, which ensures that it is used properly. Eliminating multiple data transfers streamlines processing, cutting multiple data copies reduces memory consumption, and the simplicity of this design makes for greater reliability.
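By contrast, pointer-based access can be sketched in a few lines of C. The API below (`imdb_insert`, `imdb_find`, `imdb_set_value`, `imdb_begin_write`, `imdb_commit`) is hypothetical, invented for illustration rather than taken from any commercial IMDS, and the single global transaction flag is a simplifying assumption: "commit" here only closes the transaction, with no logging or rollback. `imdb_find` returns a pointer straight into the in-memory store, so reading a record copies nothing. Writes still pass through the API, which checks that the pointer is one it handed out and that a write transaction is open before updating the record in place.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_RECORDS 8   /* hypothetical capacity */
#define VALUE_SIZE  32

typedef struct {
    int  key;
    char value[VALUE_SIZE];
} record;

/* The database itself: records live in RAM with no cache layer in front. */
static record store[MAX_RECORDS];
static size_t n_records = 0;
static bool in_write_txn = false;

static void set_field(char *dst, const char *src) {
    strncpy(dst, src, VALUE_SIZE - 1);
    dst[VALUE_SIZE - 1] = '\0';
}

bool imdb_insert(int key, const char *value) {
    if (n_records == MAX_RECORDS) return false;
    store[n_records].key = key;
    set_field(store[n_records].value, value);
    n_records++;
    return true;
}

/* Lookup hands back a read-only pointer into the store itself: no copy. */
const record *imdb_find(int key) {
    for (size_t i = 0; i < n_records; i++)
        if (store[i].key == key) return &store[i];
    return NULL;
}

void imdb_begin_write(void) { in_write_txn = true; }
void imdb_commit(void)      { in_write_txn = false; }  /* no logging: sketch only */

/* Map a pointer from imdb_find back to its writable slot, or NULL if it is
   not one of ours. Equality comparison is well defined for any pointer. */
static record *owned_slot(const record *r) {
    for (size_t i = 0; i < n_records; i++)
        if (r == &store[i]) return &store[i];
    return NULL;
}

/* Writes go through the API, which validates the pointer and checks that a
   write transaction is open, then updates the record in place. */
bool imdb_set_value(const record *r, const char *value) {
    record *w = owned_slot(r);
    if (w == NULL || !in_write_txn) return false;
    set_field(w->value, value);
    return true;
}
```

After an update, the pointer the application already holds sees the new value immediately, because there is only one copy of the record.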