Object Databases: Not Just for CAD/CAM Anymore

As Esther Dyson put it, “Using tables to store objects is like driving your car home and then disassembling it to put it in the garage. It can be assembled again in the morning, but one eventually asks whether this is the most efficient way to park a car.”

To Swizzle or Not to Swizzle

Object databases come in two models. One requires you to inherit from a vendor-supplied persistent base class, a la the Object Database Management Group (ODMG) standard [CAT96]. The persistent base class provides the interface for making database requests of the objects. The other model is a pointer swizzling technique that allows you to use the pointers to persistent objects as if they never left memory. I believe the pointer swizzling technique is superior in programming model and flexibility, and I will cover this technique in further detail.

Pointer swizzling is the changing or mapping of the on-disk format pointer to the in-memory format pointer. Swizzling of pointers takes place transparently to the client program. When the program uses a pointer to an unloaded object, a segmentation violation occurs. The vendor library traps that violation and fetches the object from the database. It then sets the pointer to the newly loaded object and returns control to the client program. The client program is totally unaware that a database access occurred.
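The load-on-first-use idea can be sketched in ordinary C++. This is only a toy model, not how Texas or any vendor library is implemented: a real swizzling store traps a memory fault, while the sketch below replaces the trap with an explicit check. The names DiskImage, SwizzlingRef, and demo_swizzle_loads are hypothetical and exist only for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <unordered_map>

struct Node {
    int value;
};

// Hypothetical stand-in for the database file: object id -> stored value.
using DiskImage = std::unordered_map<std::uint64_t, int>;

// Toy reference that starts out holding only an on-disk id and is
// "swizzled" to a real in-memory pointer the first time it is used.
class SwizzlingRef {
public:
    SwizzlingRef(const DiskImage& disk, std::uint64_t id)
        : disk_(&disk), id_(id) {}

    // An explicit check plays the role of the trapped memory fault.
    Node* get() {
        if (!loaded_) {
            loaded_ = std::make_unique<Node>(Node{disk_->at(id_)});
            ++loads_;  // count simulated database fetches
        }
        return loaded_.get();
    }

    Node* operator->() { return get(); }
    int loads() const { return loads_; }

private:
    const DiskImage* disk_;
    std::uint64_t id_;
    std::unique_ptr<Node> loaded_;
    int loads_ = 0;
};

// Uses the reference twice and reports how many fetches took place.
int demo_swizzle_loads() {
    DiskImage disk{{42, 7}};
    SwizzlingRef ref(disk, 42);
    int sum = ref->value + ref->value;
    assert(sum == 14);
    return ref.loads();
}
```

The client code in demo_swizzle_loads() simply dereferences ref. The fetch happens once, behind the scenes, and later uses follow the cached pointer.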

The use of standard C++ memory management techniques allows the same application code to work on both transient and persistent objects. Objects are constructed using the C++ placement new operator. Allocation in persistent memory implicitly stores the object in the database. Removing objects from the database is as simple as calling the C++ delete operator.
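For readers who have not met placement new, here is a minimal sketch in standard C++. The Arena class is a hypothetical stand-in for a region of persistent memory. It is an ordinary in-process buffer and nothing in it is saved to disk, so the sketch shows only the allocation syntax, not persistence.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

struct Point {
    int x;
    int y;
    Point(int x_, int y_) : x(x_), y(y_) {}
};

// Hypothetical arena: a fixed buffer handed out in aligned chunks.
class Arena {
public:
    void* allocate(std::size_t size, std::size_t align) {
        std::size_t start = (used_ + align - 1) / align * align;
        if (start + size > sizeof(buffer_)) throw std::bad_alloc();
        used_ = start + size;
        return buffer_ + start;
    }
    std::size_t used() const { return used_; }

private:
    alignas(std::max_align_t) unsigned char buffer_[1024];
    std::size_t used_ = 0;
};

// Placement form of operator new: the extra argument selects the arena.
void* operator new(std::size_t size, Arena& arena) {
    return arena.allocate(size, alignof(std::max_align_t));
}

// Matching placement delete, used only if a constructor throws.
void operator delete(void*, Arena&) noexcept {}

// Constructs a Point inside the arena and checks where it landed.
bool demo_placement_new() {
    Arena arena;
    Point* p = new (arena) Point(3, 4);
    bool ok = arena.used() >= sizeof(Point) && p->x == 3 && p->y == 4;
    p->~Point();  // the arena owns the storage, so only run the destructor
    return ok;
}
```

In this toy the arena never frees individual objects, so the demo calls the destructor directly instead of using delete.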

The Texas Persistent Store

The Texas Persistent Store is a public domain pointer swizzling object database for C++. Texas was created and is maintained by the University of Texas at Austin. The current 0.4 beta release supports the Linux 1.2.9, Solaris 2.4, SunOS 4.1.3, and DEC Ultrix 4.2 platforms, all using the GNU g++ 2.5.8 or g++ 2.6.3 compiler. It also supports OS/2 2.1 using the IBM CSet compiler and the Sun 3.0.1 C++ compiler. White papers and the source are available via anonymous ftp from cs.utexas.edu, in the directory /pub/garbage, or from the OOPS Research Group's home page at www.cs.utexas.edu/users/oops.

My setup consists of Slackware 1.2.8 running on a 486/100 with 16MB of memory. Texas installed and ran on my Linux machine with minimal hassle. Due to a compiler template bug in g++ 2.6.3, you must patch the compiler or modify the makefiles to use the -fexternal-templates compiler switch. The documentation describes both the bug and the fixes, making the library installation fairly painless. Texas comes with a few test programs and examples to ensure the system is performing correctly.

Texas Features

To start coding using the Texas library, you have to understand only four easy features: the initialization macros, opening and closing the persistent stores, finding and creating named roots, and allocating objects into persistent memory. Here, I discuss each of these features briefly and then jump in and look at some code.

You initialize the Texas library by invoking the TEXAS_MAIN_INIT() macro. This macro sets up the signal handler and reads in the schema information and virtual function tables. The TEXAS_MAIN_UNINIT() macro removes the signal handlers and resets the system to its previous state.

Use the open_pstore() function to open a database. If the file does not exist, the database is created and opened. Opening a database starts a transaction. You can control transactions during the lifetime of the program by calling commit_transaction() or abort_transaction(). commit_transaction() saves all of the current persistent objects to disk and starts a new transaction, while abort_transaction() throws away all of the dirty pages and starts a new transaction. To close the database, use the close_pstore() function. This implicitly calls commit_transaction() and closes the database file. If you do not want to commit the current work, you can call the close_pstore_without_commit() function.
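The commit and abort behavior described above can be modeled with a small in-memory toy. This sketch does not call the Texas library, and ToyStore and its member functions are hypothetical names. It keeps a committed copy and a working copy of the data, so that abort() can discard everything changed since the last commit, roughly as throwing away dirty pages would.

```cpp
#include <cassert>
#include <map>
#include <string>

// Toy model of a store in which a transaction is always open.
class ToyStore {
public:
    void put(const std::string& key, int value) { working_[key] = value; }
    bool contains(const std::string& key) const {
        return working_.count(key) != 0;
    }
    int get(const std::string& key) const { return working_.at(key); }

    // Keep every change made so far; the next transaction starts implicitly.
    void commit() { committed_ = working_; }

    // Discard every change made since the last commit.
    void abort() { working_ = committed_; }

private:
    std::map<std::string, int> committed_;
    std::map<std::string, int> working_;
};

// Commits one value, then aborts a second round of changes.
int demo_transactions() {
    ToyStore store;
    store.put("a", 1);
    store.commit();
    store.put("a", 99);  // overwritten, but never committed
    store.put("b", 2);   // added, but never committed
    store.abort();
    return store.get("a") + (store.contains("b") ? 100 : 0);
}
```

After the abort, "a" is back to its committed value and "b" is gone, which is the effect you would expect from abandoning the current transaction.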

Named roots are your entry points for retrieving the persistent objects from the database. They provide the mechanism by which a program can directly navigate to objects or search containers for objects. You create a named root by using the add_root() function. A named root is retrieved with the get_root() call and the database is queried for the existence of a named root with the is_root() function.
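The role of a named root can be illustrated with a toy table that maps names to object pointers. RootTable and its members are hypothetical and only mimic the add, get, and query operations described above; they are not the Texas functions, and nothing here is persistent.

```cpp
#include <cassert>
#include <map>
#include <string>

struct ListNode {
    int value;
    ListNode* next;
};

// Toy root table: a name gives you the first object, and you navigate
// from there by following ordinary pointers.
class RootTable {
public:
    void add(const std::string& name, void* object) { roots_[name] = object; }
    bool has(const std::string& name) const { return roots_.count(name) != 0; }
    void* get(const std::string& name) const {
        auto it = roots_.find(name);
        return it == roots_.end() ? nullptr : it->second;
    }

private:
    std::map<std::string, void*> roots_;
};

// Starts at the named root and walks the list it points to.
int sum_list_from_root(const RootTable& roots, const std::string& name) {
    int sum = 0;
    for (auto* n = static_cast<ListNode*>(roots.get(name)); n != nullptr;
         n = n->next) {
        sum += n->value;
    }
    return sum;
}

// Registers a two-element list under a name and sums it via the root.
int demo_roots() {
    ListNode tail{2, nullptr};
    ListNode head{1, &tail};
    RootTable roots;
    if (!roots.has("numbers")) {
        roots.add("numbers", &head);
    }
    return sum_list_from_root(roots, "numbers");
}
```

Only the head of the list is registered. Everything else is reached by navigation, which is the point of a root.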

The Texas memory allocation macros, pnew() and pnew_array(), hide the C++ placement new operator. The allocation macros also hide the instantiation of the TexasWrapper template classes. The TexasWrapper class handles the creation and registration of schema information with the database. The schema information describes the layout of the class attributes as they are stored in the database.
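To show how a macro can hide both placement new and a registration step, here is a hypothetical sketch. TOY_PNEW, register_schema, and schema_registry are names I made up for illustration, and the "schema" recorded is only the type's name and size. The real pnew() macro and TexasWrapper classes are more involved, and I am not reproducing their interfaces here.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <new>
#include <string>

// Hypothetical schema registry: type name -> object size in bytes.
std::map<std::string, std::size_t>& schema_registry() {
    static std::map<std::string, std::size_t> registry;
    return registry;
}

// Tag type that routes allocation through the placement operator below.
struct SchemaTag {};

// Records the type's "schema" as a side effect of preparing to allocate.
template <typename T>
SchemaTag register_schema(const char* type_name) {
    schema_registry()[type_name] = sizeof(T);
    return SchemaTag{};
}

// Placement form of operator new; this toy just uses the ordinary heap.
void* operator new(std::size_t size, SchemaTag) { return ::operator new(size); }
void operator delete(void* p, SchemaTag) noexcept { ::operator delete(p); }

// The macro hides both the registration call and the placement syntax.
#define TOY_PNEW(T) new (register_schema<T>(#T)) T

struct Account {
    int id;
    long balance;
    Account(int id_, long balance_) : id(id_), balance(balance_) {}
};

// Allocates through the macro, then checks the object and the registry.
bool demo_toy_pnew() {
    Account* a = TOY_PNEW(Account)(17, 250);
    bool ok = a->id == 17 && a->balance == 250 &&
              schema_registry().count("Account") == 1 &&
              schema_registry()["Account"] == sizeof(Account);
    delete a;  // storage came from ::operator new, so plain delete matches
    return ok;
}
```

The caller writes TOY_PNEW(Account)(17, 250) and never sees the registration, which is the convenience the allocation macros are meant to provide.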