Simple Access to Berkeley DB Using STLdb4

STLdb4 makes C++ programming with the Berkeley DB simpler and more effective.

Storing Objects

One major difference between the Database class and an STL collection like std::map<> is that the key and value types are not template parameters of Database. The main reason is that the items in a Database object are usually not in RAM but are read from disk on demand. Also, so as not to limit the functionality offered by Berkeley DB, the Database class has to support storing arbitrary data, not only a homogeneous collection of objects.

The illusion of stored objects can be created using thin object wrappers with implicit constructors and type conversion operators. The Person class shown in Listing 5 stores some information about people. Its implicit constructor takes a DatabaseMutableValueRef, which is the class returned by the array operator in Database. A Person object is also implicitly convertible to a std::string so that it can be serialized to disk. As the main function shows, this thin wrapper makes it appear that the Database is storing Person objects.
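The wrapper pattern itself can be sketched without STLdb4. In the code below, ValueRef and ToyDatabase are hypothetical stand-ins for DatabaseMutableValueRef and Database, backed by a std::map rather than Berkeley DB, and the name-then-e-mail byte layout is an assumption made for illustration. It is a minimal sketch of the idea, not the library's implementation.

```cpp
#include <map>
#include <string>
#include <utility>

// Hypothetical stand-in for DatabaseMutableValueRef: a handle to one stored
// byte string. The real STLdb4 class is not reproduced here.
class ValueRef {
public:
    explicit ValueRef(std::string& slot) : slot_(slot) {}
    ValueRef& operator=(const std::string& bytes) { slot_ = bytes; return *this; }
    operator std::string() const { return slot_; }
private:
    std::string& slot_;
};

// Hypothetical stand-in for Database: untyped byte strings in, byte strings out.
class ToyDatabase {
public:
    ValueRef operator[](const std::string& key) { return ValueRef(store_[key]); }
private:
    std::map<std::string, std::string> store_;
};

class Person {
public:
    Person(std::string name, std::string email)
        : name_(std::move(name)), email_(std::move(email)) {}

    // Implicit constructor: rebuild a Person from the stored bytes.
    // Assumed layout: name, NUL, e-mail, NUL.
    Person(const ValueRef& ref) {
        const std::string bytes = ref;
        const std::string::size_type nul = bytes.find('\0');
        name_ = bytes.substr(0, nul);
        if (nul != std::string::npos) {
            email_ = bytes.substr(nul + 1);
            if (!email_.empty() && email_.back() == '\0') {
                email_.pop_back();  // drop the trailing terminator
            }
        }
    }

    // Implicit conversion: serialize to the same layout.
    operator std::string() const {
        std::string bytes = name_;
        bytes.push_back('\0');
        bytes += email_;
        bytes.push_back('\0');
        return bytes;
    }

    const std::string& name() const { return name_; }
    const std::string& email() const { return email_; }

private:
    std::string name_;
    std::string email_;
};
```

With these definitions, `db["alice"] = Person("Alice", "alice@example.com");` followed by `Person p = db["alice"];` reads as if the container held Person objects, although only byte strings are ever stored.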

Secondary Indexing

Sometimes the information that you are storing has multiple keys by which you would like to be able to find a given item quickly. For example, if you are storing contact information, you want to be able to look up people based on either their name or e-mail address.

You could achieve the above by storing each person's information manually, using the name as the key and maintaining a second database from e-mail address to name. To find a person by e-mail address, you would use the e-mail-keyed database to find the name and then the name database to find the actual information. Maintaining indexes manually like this is highly error-prone. The secondary indexes in Berkeley DB can do this housekeeping for you automatically.
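To make the bookkeeping burden concrete, here is a rough sketch of the manual approach using two std::map objects in place of two databases. The Contact and ManualContacts types are hypothetical and exist only for this illustration.

```cpp
#include <map>
#include <string>

// Hypothetical record type for this illustration.
struct Contact {
    std::string email;
    std::string phone;
};

// Two maps kept in step by hand: name -> record and e-mail -> name.
struct ManualContacts {
    std::map<std::string, Contact> byName;
    std::map<std::string, std::string> nameByEmail;

    void put(const std::string& name, const Contact& contact) {
        // If the person already exists, the old e-mail entry must be removed
        // first. Forgetting this step leaves a stale index entry behind.
        std::map<std::string, Contact>::iterator old = byName.find(name);
        if (old != byName.end()) {
            nameByEmail.erase(old->second.email);
        }
        byName[name] = contact;
        nameByEmail[contact.email] = name;
    }

    // Two-step look-up: e-mail -> name, then name -> record.
    const Contact* findByEmail(const std::string& email) const {
        std::map<std::string, std::string>::const_iterator n = nameByEmail.find(email);
        if (n == nameByEmail.end()) {
            return nullptr;
        }
        std::map<std::string, Contact>::const_iterator c = byName.find(n->second);
        return c == byName.end() ? nullptr : &c->second;
    }
};
```

Every write path (insert, update, delete) has to touch both maps in the right order, which is exactly the work a secondary index takes over.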

The above example can be implemented by having the primary key-value data stored with the person's name as the key and a secondary index on the e-mail address(es). This setup is shown in Figure 1. I refer to the database with the name-to-person data mapping as the main database and the e-mail look-up database as the secondary index.

Figure 1. A Secondary Index for Quick Look-Up by E-Mail Address

The main concern when using secondary indexing with STLdb4 is how to extract the secondary key from your data. There are some template functions in STLdb4 to help you with this. The getOffsetSecIdx() template takes an offset as its template argument and will return all the data from that offset to the end of an item as the secondary key. The getOffsetLengthSecIdx() is similar, but it allows you to specify both the offset and length of the secondary key data. Finally, the getOffsetNullTerminatedSecIdx() takes an offset and a string skip count to allow you to extract the nth null-terminated string after a given offset. For example, if you have five (32-bit) integer values followed by four null-terminated strings as your persistent format, you could use an offset of 20 and a skip of two to extract the third null-terminated string as your secondary index key.
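The offset-and-skip rule is easy to check with a small stand-alone function. The helper below, nthNullTerminated(), is a hypothetical name and is not part of STLdb4. It only illustrates the selection rule described above, assuming the item is available as a std::string of raw bytes.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not an STLdb4 function): starting at `offset`, skip
// `skip` null-terminated strings and return the next one without its
// terminator. Returns an empty string if the item is too short.
std::string nthNullTerminated(const std::string& item,
                              std::size_t offset,
                              std::size_t skip) {
    std::size_t pos = offset;
    for (std::size_t i = 0; i < skip; ++i) {
        const std::size_t nul = item.find('\0', pos);
        if (nul == std::string::npos) {
            return std::string();
        }
        pos = nul + 1;  // step past the terminator
    }
    if (pos >= item.size()) {
        return std::string();
    }
    const std::size_t end = item.find('\0', pos);
    return item.substr(pos, end == std::string::npos ? std::string::npos
                                                     : end - pos);
}
```

For a record of five 32-bit integers (20 bytes) followed by four null-terminated strings, calling this helper with an offset of 20 and a skip of two selects the third string.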

Assuming the use of the Person class from Listing 5, the code in Listing 6 creates and uses a secondary index on the e-mail address for your Person objects. Because the disk format starts with our string data, when creating the extraction function with getOffsetNullTerminatedSecIdx(), I use an offset of zero and skip one null-terminated string (the name) to extract the e-mail address null-terminated string.

I then perform a partial look-up using the secondary index. The equal_range_partial() method finds both the lower and upper bound for partial key material. In this case, I find any e-mail addresses that begin with "al". The output from the program is shown in Listing 7. Note that the first element of the iterator is the key from the secondary index, and the second element is the data from the main database. The key from the main database for this look-up is available through getPrimaryKey() on the iterator object.
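The idea behind a partial-key range can be shown on an ordinary sorted container. The sketch below uses a std::multimap from e-mail address to name as a stand-in for the secondary index. prefixRange() is a hypothetical helper that mimics the effect of a partial look-up; it is not how STLdb4 or Berkeley DB implements it.

```cpp
#include <map>
#include <string>
#include <utility>

// Stand-in for a secondary index: e-mail address -> primary key (name).
typedef std::multimap<std::string, std::string> Index;

// Hypothetical helper: return the range [first, last) of entries whose key
// begins with `prefix`. Because keys are sorted, all matches are contiguous
// and start at lower_bound(prefix).
std::pair<Index::const_iterator, Index::const_iterator>
prefixRange(const Index& index, const std::string& prefix) {
    Index::const_iterator first = index.lower_bound(prefix);
    Index::const_iterator last = first;
    // Walk forward while the key still starts with the prefix.
    while (last != index.end() &&
           last->first.compare(0, prefix.size(), prefix) == 0) {
        ++last;
    }
    return std::make_pair(first, last);
}
```

Iterating from `first` to `last` visits the matching entries in key order, with the secondary key in `->first` and, in this stand-in, the primary key in `->second`.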

STLdb4 looks good but seems a pain to compile

Rich20B's picture

Having read the article STLdb4 looks very good, though trying to get it compiled and installed is a pain.

If anybody else is having problems then I've written my experiences of compiling on Debian Etch here:


Anonymous's picture

After reading this article, I have been using STLdb4, and am very pleased with it! It could use a few touchups, but since you can always get the raw BerkeleyDB pointer there's nothing I haven't been able to accomplish yet. *SO* much nicer to work with than the existing C++ implementation. Good job!