Object Databases: Not Just for CAD/CAM Anymore
Let's take a look at an example of how easy it is to make things persistent in Texas. Sticking with tradition, we write the familiar “hello world” program, but with a persistent twist: we record how many times the program has been executed. Listing 1 shows the code for this task.
First we initialize the Texas library, passing it the argc and argv arguments from main. The program then opens up a persistent store named “hello.pstore” in the current working directory.
The program then asks the persistent store whether a named root "COUNT" exists, using the is_root() function. If the named root does not exist, the program allocates a new integer, initializes it to zero, and names it "COUNT" using the add_root() function. Otherwise, it retrieves the existing integer from the database. In either case, the counter is incremented and the result is printed to standard output. Finally, all the dirty objects are committed, the database is closed, the library is uninitialized, and the program exits.
With each successive run of the program, the integer named "COUNT" will be retrieved, incremented and rewritten to the database. You will notice this is all quite seamless: there are no explicit calls to queries, inserts, loads, or saves.
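For contrast, here is a minimal sketch of the same run counter written without any persistent store, using an ordinary text file. It is not Listing 1 and uses no Texas calls; the function names (load_count, save_count, run_once, reset_count) and the file format are assumptions made only for illustration. The point is to show the explicit load and save steps that a persistent heap makes unnecessary.

```cpp
#include <cstdio>
#include <fstream>
#include <string>

// Explicit "load": read the counter back from a text file.
// A missing or unreadable file is treated as a count of zero.
int load_count(const std::string& path) {
    std::ifstream in(path);
    int count = 0;
    if (!(in >> count)) {
        count = 0;
    }
    return count;
}

// Explicit "save": overwrite the file with the new value.
bool save_count(const std::string& path, int count) {
    std::ofstream out(path, std::ios::trunc);
    out << count << '\n';
    return static_cast<bool>(out);
}

// One "run" of the hello-world program: load, increment, save.
int run_once(const std::string& path) {
    const int count = load_count(path) + 1;
    save_count(path, count);
    return count;
}

// Helper for experiments: forget the stored counter.
void reset_count(const std::string& path) {
    std::remove(path.c_str());
}
```

Even for a single integer, the program has to know about a file format and decide when to read and write it. With a persistent store, that bookkeeping is replaced by allocating the object in the store and committing.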
Next, we briefly explore how Texas swizzles pointers at page fault time and handles memory management. This is by no means a complete discussion of these topics. Readers interested in learning more about the Texas system should download the white papers and source code.
Texas uses conventional virtual memory access protection to ensure that the first access to any persistent page is intercepted by the Texas library. The faulted page is loaded from the database and scanned for persistent pointers, and every persistent pointer on that page is swizzled to an in-memory address. Any not-yet-loaded pages that those pointers refer to are reserved in virtual memory and access protected. This faulting and reserving process repeats as the program traverses the object hierarchy into unloaded pages, so pages of virtual memory are always reserved one step ahead of the pages that reference them. This implies the program can never see an unswizzled pointer; it sees only ordinary in-memory pointers, some of which refer to access-protected pages holding unloaded objects. The Linux implementation uses the mprotect() system call to set up the access protection on the pages. An in-depth discussion of this topic can be found in the Texas white paper presented at the Fifth International Workshop on Persistent Object Systems [SIN92].
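The general mechanism of intercepting the first access to a protected page can be sketched on Linux as follows. This is not code from Texas: it reserves a single anonymous page with no access rights, and a SIGSEGV handler stands in for the library by unprotecting the page and filling it with a made-up value (42) in place of real database contents. All names are hypothetical, and calling mprotect() from a signal handler is a common Linux practice rather than something POSIX guarantees.

```cpp
#include <signal.h>
#include <sys/mman.h>
#include <unistd.h>
#include <cstddef>
#include <cstring>

static char* g_page = nullptr;            // the one reserved page
static std::size_t g_page_size = 0;
static volatile sig_atomic_t g_faults = 0;

// Fault handler: plays the role of "load the page from the database".
static void on_fault(int, siginfo_t* info, void*) {
    char* addr = static_cast<char*>(info->si_addr);
    if (g_page == nullptr || addr < g_page || addr >= g_page + g_page_size) {
        // Not our page: restore the default action so the retried
        // access crashes the process as a genuine segfault would.
        signal(SIGSEGV, SIG_DFL);
        return;
    }
    // Make the page accessible, then "load" its contents.
    mprotect(g_page, g_page_size, PROT_READ | PROT_WRITE);
    const int stored_value = 42;          // stand-in for database data
    std::memcpy(g_page, &stored_value, sizeof stored_value);
    g_faults = g_faults + 1;
    // Returning from the handler retries the faulting instruction.
}

// Reserve one page of address space with all access disabled.
char* reserve_protected_page() {
    g_page_size = static_cast<std::size_t>(sysconf(_SC_PAGESIZE));
    void* p = mmap(nullptr, g_page_size, PROT_NONE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) {
        return nullptr;
    }
    g_page = static_cast<char*>(p);

    struct sigaction sa;
    std::memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = on_fault;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGSEGV, &sa, nullptr);
    return g_page;
}

int fault_count() { return g_faults; }

// The volatile read forces a real memory access on every call.
int read_first_int(char* page) {
    return *reinterpret_cast<volatile int*>(page);
}
```

The first call to read_first_int() on the reserved page triggers the handler; later reads go straight to memory. A real persistent store would additionally swizzle the pointers found on the loaded page and reserve further protected pages for the objects they reference.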
Texas allows you to access multiple databases, each with its own persistent heap. The standard transient heap and stack are also available for non-persistent memory allocation. Texas does not partition its address space into regions, so pages from different heaps may be interleaved in memory. Each page belongs to exactly one heap, however, and Texas maintains a separate free list for each heap. A new page is created when the free list is empty or when no available free chunk is large enough. New pages are partitioned into uniformly sized memory chunks large enough to hold the object being allocated. One chunk satisfies the request, and all of the other chunks are linked onto the free list. This uniform chunking of a page makes identification of the object headers on the page trivial: only the first header of a page needs to be examined to determine the size of all memory chunks on that page, and the positions of the other objects' headers follow from that size. Each object's header stores the schema information for the object so it can be identified and correctly swizzled.
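The uniform chunking scheme described above can be sketched as a small allocator. This is an illustration under stated assumptions, not the Texas allocator: the 4096-byte page size, the HeapSketch and ChunkHeader names, the first-fit search, and a header that records only the chunk size (where Texas stores schema information) are all invented for the example. It needs C++17 for std::aligned_alloc.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <vector>

constexpr std::size_t kPageSize = 4096;  // assumed page size

struct ChunkHeader {
    std::size_t chunk_size;  // bytes in this chunk, header included
    ChunkHeader* next_free;  // link used only while on the free list
};

class HeapSketch {
public:
    ~HeapSketch() {
        for (void* page : pages_) std::free(page);
    }

    void* allocate(std::size_t object_size) {
        assert(object_size > 0);
        const std::size_t need = round_up(sizeof(ChunkHeader) + object_size);
        assert(need <= kPageSize);
        // First fit: find a free chunk that is large enough.
        ChunkHeader** link = &free_list_;
        while (*link != nullptr && (*link)->chunk_size < need) {
            link = &(*link)->next_free;
        }
        if (*link == nullptr) {
            // Free list empty or nothing big enough: carve a new page.
            carve_page(need);
            link = &free_list_;  // new chunks sit at the front
        }
        ChunkHeader* chunk = *link;
        *link = chunk->next_free;
        chunk->next_free = nullptr;
        return chunk + 1;  // object storage follows the header
    }

    void release(void* object) {
        ChunkHeader* chunk = static_cast<ChunkHeader*>(object) - 1;
        chunk->next_free = free_list_;
        free_list_ = chunk;
    }

    // Only the first header on the page is consulted.
    static std::size_t chunk_size_of(const void* object) {
        const auto addr = reinterpret_cast<std::uintptr_t>(object);
        const auto page = addr & ~static_cast<std::uintptr_t>(kPageSize - 1);
        return reinterpret_cast<const ChunkHeader*>(page)->chunk_size;
    }

    std::size_t page_count() const { return pages_.size(); }

private:
    static std::size_t round_up(std::size_t n) {
        const std::size_t a = alignof(std::max_align_t);
        return (n + a - 1) / a * a;
    }

    // Partition a fresh page into equal chunks and free-list them all.
    void carve_page(std::size_t chunk_size) {
        char* page = static_cast<char*>(std::aligned_alloc(kPageSize, kPageSize));
        assert(page != nullptr);
        pages_.push_back(page);
        for (std::size_t off = 0; off + chunk_size <= kPageSize; off += chunk_size) {
            auto* chunk = reinterpret_cast<ChunkHeader*>(page + off);
            chunk->chunk_size = chunk_size;
            chunk->next_free = free_list_;
            free_list_ = chunk;
        }
    }

    ChunkHeader* free_list_ = nullptr;
    std::vector<void*> pages_;
};
```

Because every chunk on a page has the same size and pages are aligned to their own size, chunk_size_of() can mask an object's address down to the page start and read a single header. A request larger than any free chunk forces a new page with a different chunk size.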
While the Hello Persistent World program is not very exciting, it shows the minimal effort needed to make an object (an integer in this case) persistent. The next example demonstrates the power of object databases to capture the relationships between objects. This contrived example shows several many-to-many relationships. It also exposes some of the current deficiencies in the Texas library. The example is a system to track the many different research papers and books that clutter my office. See Figure 1 for a class diagram using non-unified Booch notation. The design file and the source code for both examples are available on my home page at www.qds.com/people/gmeinke.
The class diagram shows class PublishedWork, an abstract base class for all published material. It presents trivial methods for querying the object for its title, price, the number of pages, and a list of authors. The relationship between an Author and their PublishedWork is an example of a many-to-many relationship. The relationship between a Publisher and the Books they have published is one-to-many. Expressing these complex relationships in relational databases is awkward due to the foreign keys and the intermediate join table needed for the many-to-many relationships. By contrast, Texas handles these complex relationships with C++ containers and stores them directly in the object database. No compromises are necessary to the object design for foreign key data members.
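As a rough illustration of how such relationships look when held directly in C++ containers, here is a minimal sketch in plain, transient C++. It is not the author's source code and makes no Texas calls; the member names, the kind() method, and the add_authorship() helper are assumptions chosen to loosely follow the description of the class diagram.

```cpp
#include <string>
#include <utility>
#include <vector>

class PublishedWork;

class Author {
public:
    explicit Author(std::string name) : name_(std::move(name)) {}
    const std::string& name() const { return name_; }
    const std::vector<PublishedWork*>& works() const { return works_; }
private:
    friend void add_authorship(Author&, PublishedWork&);
    std::string name_;
    std::vector<PublishedWork*> works_;   // "many" side toward works
};

// Abstract base class for all published material.
class PublishedWork {
public:
    PublishedWork(std::string title, double price, int pages)
        : title_(std::move(title)), price_(price), pages_(pages) {}
    virtual ~PublishedWork() = default;
    virtual const char* kind() const = 0;
    const std::string& title() const { return title_; }
    double price() const { return price_; }
    int pages() const { return pages_; }
    const std::vector<Author*>& authors() const { return authors_; }
private:
    friend void add_authorship(Author&, PublishedWork&);
    std::string title_;
    double price_;
    int pages_;
    std::vector<Author*> authors_;        // "many" side toward authors
};

// Many-to-many: record the link on both sides, no join table needed.
inline void add_authorship(Author& author, PublishedWork& work) {
    author.works_.push_back(&work);
    work.authors_.push_back(&author);
}

class Publisher;

class Book : public PublishedWork {
public:
    using PublishedWork::PublishedWork;
    const char* kind() const override { return "book"; }
    const Publisher* publisher() const { return publisher_; }
private:
    friend class Publisher;
    const Publisher* publisher_ = nullptr;
};

// One-to-many: a publisher holds its books, each book one publisher.
class Publisher {
public:
    explicit Publisher(std::string name) : name_(std::move(name)) {}
    const std::string& name() const { return name_; }
    const std::vector<Book*>& books() const { return books_; }
    void publish(Book& book) {
        books_.push_back(&book);
        book.publisher_ = this;
    }
private:
    std::string name_;
    std::vector<Book*> books_;
};
```

Navigating from an author to a work's co-authors is then just pointer traversal, for example author.works()[0]->authors(). In an object database these same pointers and containers are what get stored, which is why no foreign key members appear in the classes.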