Simple Access to Berkeley DB Using STLdb4

STLdb4 makes C++ programming with the Berkeley DB simpler and more effective.
Storing Objects

One major difference between the Database class and an STL collection like std::map<> is that the key and value types are not parameterized in Database. The main reason for this is that the items in a Database object are usually not in RAM but are read from disk on demand. Also, in order not to limit the functionality offered by Berkeley DB, the Database class has to support storing arbitrary data, not just a homogeneous collection of objects.

The illusion of stored objects can be created using thin object wrappers that rely on implicit constructors and type conversions. Shown in Listing 5, the Person class stores some information about people. Its implicit constructor takes a DatabaseMutableValueRef, which is the class returned by the array operator in Database. A Person object is also implicitly convertible to an std::string so that it can be serialized to disk. As the main function shows, this thin wrapper makes it appear that the Database is storing Person objects.
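The wrapper idea can be sketched without STLdb4 itself. The following is a minimal illustration, not the code from Listing 5: it assumes a std::map of raw byte strings (the hypothetical RawStore alias) as a stand-in for Database, and a Person made of a name and an e-mail address stored as two null-terminated strings. The Person here converts from std::string rather than from DatabaseMutableValueRef, which is also an assumption made for the sake of a self-contained example.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// Hypothetical stand-in for an STLdb4 Database: raw bytes keyed by string.
// This is NOT the STLdb4 API, only an in-memory substitute.
using RawStore = std::map<std::string, std::string>;

// Thin wrapper whose stored form is "name\0email\0".
class Person {
public:
    Person(std::string name, std::string email)
        : name_(std::move(name)), email_(std::move(email)) {}

    // Implicit constructor: rebuild a Person from its stored bytes.
    Person(const std::string& raw) {
        const std::string::size_type name_end = raw.find('\0');
        assert(name_end != std::string::npos);  // the format needs a terminated name
        name_ = raw.substr(0, name_end);
        const std::string::size_type email_begin = name_end + 1;
        const std::string::size_type email_end = raw.find('\0', email_begin);
        email_ = raw.substr(email_begin, email_end - email_begin);
    }

    // Implicit conversion: serialize the Person to its stored bytes.
    operator std::string() const {
        std::string raw = name_;
        raw.push_back('\0');
        raw += email_;
        raw.push_back('\0');
        return raw;
    }

    const std::string& name() const { return name_; }
    const std::string& email() const { return email_; }

private:
    std::string name_;
    std::string email_;
};
```

With these two conversions in place, a statement such as store["Alice"] = Person("Alice", "alice@example.com") writes the serialized bytes, and Person p = store["Alice"] reads them back, so the calling code never touches the byte format directly.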

Secondary Indexing

Sometimes the information that you are storing has multiple keys by which you would like to be able to find a given item quickly. For example, if you are storing contact information, you want to be able to look up people based on either their name or e-mail address.

You could achieve this manually by storing each person's information with the name as the key and maintaining a second database that maps e-mail address to name. To find a person by e-mail address, you would use the e-mail-keyed database to find the name and then the name database to find the actual information. Maintaining indexes manually like this is highly error-prone, and the secondary indexes in Berkeley DB can do this housework for you automatically.
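To make the bookkeeping burden concrete, here is a rough sketch of the manual approach. It uses two std::map objects as stand-ins for the two databases; the ManualContacts class, the Contact struct and their members are hypothetical names invented for this illustration and are not part of STLdb4 or Berkeley DB.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical record type for the illustration.
struct Contact {
    std::string email;
    std::string phone;
};

// Hand-maintained index: every write has to touch both maps.
class ManualContacts {
public:
    // Insert or update a person, keeping the e-mail index in step.
    void put(const std::string& name, const Contact& contact) {
        const auto existing = by_name_.find(name);
        if (existing != by_name_.end()) {
            // Forgetting this erase leaves a stale e-mail entry behind.
            by_email_.erase(existing->second.email);
        }
        by_name_[name] = contact;
        by_email_[contact.email] = name;
        assert(by_email_.at(contact.email) == name);
    }

    // Two-step look-up: e-mail -> name, then name -> stored data.
    // Returns nullptr when the e-mail address is unknown.
    const Contact* find_by_email(const std::string& email) const {
        const auto name = by_email_.find(email);
        if (name == by_email_.end()) return nullptr;
        const auto record = by_name_.find(name->second);
        return record == by_name_.end() ? nullptr : &record->second;
    }

private:
    std::map<std::string, Contact> by_name_;       // main data: name -> contact
    std::map<std::string, std::string> by_email_;  // manual index: e-mail -> name
};
```

Even this small version has a step that is easy to omit (removing the old e-mail entry on update), and it does not yet handle deletions or two people sharing an address.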

The contact look-up example can be implemented by having the primary key-value data stored with the person's name as the key and a secondary index on the e-mail address(es). This setup is shown in Figure 1. I refer to the database with the name-to-person data mapping as the main database and the e-mail look-up database as the secondary index.

Figure 1. A Secondary Index for Quick Look-Up by E-Mail Address

The main concern when using secondary indexing with STLdb4 is how to extract the secondary key from your data. There are some template functions in STLdb4 to help you with this. The getOffsetSecIdx() template takes an offset as its template argument and will return all the data from that offset to the end of an item as the secondary key. The getOffsetLengthSecIdx() is similar, but it allows you to specify both the offset and length of the secondary key data. Finally, the getOffsetNullTerminatedSecIdx() takes an offset and a string skip count to allow you to extract the nth null-terminated string after a given offset. For example, if you have five (32-bit) integer values followed by four null-terminated strings as your persistent format, you could use an offset of 20 and a skip of two to extract the third null-terminated string as your secondary index key.
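The three extraction strategies can be pictured as plain functions over a stored item. The sketch below is only an illustration of the byte arithmetic: the function names from_offset, from_offset_length and nth_null_terminated are hypothetical, they take the offsets as ordinary arguments rather than template arguments, and they are not the STLdb4 templates.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Everything from `offset` to the end of the item.
std::string from_offset(const std::string& item, std::size_t offset) {
    assert(offset <= item.size());
    return item.substr(offset);
}

// Exactly `length` bytes starting at `offset` (fewer if the item is short).
std::string from_offset_length(const std::string& item, std::size_t offset,
                               std::size_t length) {
    assert(offset <= item.size());
    return item.substr(offset, length);
}

// Skip `skip` null-terminated strings starting at `offset`, then return the
// next one without its terminator. Returns an empty string when the item
// holds fewer strings than requested.
std::string nth_null_terminated(const std::string& item, std::size_t offset,
                                std::size_t skip) {
    assert(offset <= item.size());
    std::size_t pos = offset;
    for (std::size_t i = 0; i < skip; ++i) {
        const std::size_t nul = item.find('\0', pos);
        if (nul == std::string::npos) return std::string();
        pos = nul + 1;  // step over the terminator
    }
    const std::size_t end = item.find('\0', pos);
    return item.substr(pos, end == std::string::npos ? std::string::npos
                                                     : end - pos);
}
```

For the format described above (20 bytes of integers followed by four strings), calling nth_null_terminated with an offset of 20 and a skip of two yields the third string. The offset matters because the integer bytes may themselves contain zeros, which would otherwise be mistaken for string terminators.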

Assuming the use of the Person class from Listing 5, the code in Listing 6 creates and uses a secondary index on the e-mail address for your Person objects. Because the disk format starts with our string data, when creating the extraction function with getOffsetNullTerminatedSecIdx(), I use an offset of zero and skip one null-terminated string (the name) to extract the e-mail address null-terminated string.

I then perform a partial look-up using the secondary index. The equal_range_partial() method finds both the lower and upper bound for partial key material. In this case, I find any e-mail addresses that begin with "al". The output from the program is shown in Listing 7. Note that the first element of the iterator is the key from the secondary index, and the second element is the data from the main database. The key from the main database for this look-up is available through getPrimaryKey() on the iterator object.
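The idea behind a partial-key range can be sketched with the standard library. In the example below, a std::multimap from e-mail address to name stands in for the secondary index, and the hypothetical prefix_range() function plays the role of equal_range_partial(). It is an assumption about the general technique on an ordered container, not a description of how STLdb4 implements it.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// Stand-in for a secondary index: e-mail address -> primary key (name).
using EmailIndex = std::multimap<std::string, std::string>;
using EmailRange =
    std::pair<EmailIndex::const_iterator, EmailIndex::const_iterator>;

// Return the half-open range of entries whose key starts with `prefix`.
EmailRange prefix_range(const EmailIndex& index, const std::string& prefix) {
    assert(!prefix.empty());  // an empty prefix would match every key
    // Keys are sorted, so all matches sit together starting at lower_bound.
    EmailIndex::const_iterator first = index.lower_bound(prefix);
    EmailIndex::const_iterator last = first;
    while (last != index.end() &&
           last->first.compare(0, prefix.size(), prefix) == 0) {
        ++last;  // still inside the run of keys that share the prefix
    }
    return EmailRange(first, last);
}
```

Iterating from the first to the second iterator of the returned pair visits every address beginning with the prefix, in sorted order. An address that merely contains the prefix somewhere else in the string is not matched.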

Comments

STLdb4 looks good but seems a pain to compile

Rich20B

Having read the article STLdb4 looks very good, though trying to get it compiled and installed is a pain.

If anybody else is having problems then I've written my experiences of compiling on Debian Etch here: http://www.richardbishop.net/wpress/?p=23

Nice!

Anonymous

After reading this article, I have been using STLdb4, and am very pleased with it! It could use a few touchups, but since you can always get the raw BerkeleyDB pointer there's nothing I haven't been able to accomplish yet. *SO* much nicer to work with than the existing C++ implementation. Good job!
