Simple Access to Berkeley DB Using STLdb4
The Berkeley DB library provides a solid implementation of both the B-Tree and Hash file structures. The implementation includes support for transactions, concurrent access of database files from multiple processes, and secondary indexing as well as logging and recovery.
In this article, I use the term database to refer to a B-Tree or Hash maintained by Berkeley DB. These databases allow rapid key-to-value look-ups.
The standard distribution of Berkeley DB comes with both a C and a C++ API. Unfortunately, the standard Berkeley DB C++ API is a very thin wrapper that neglects modern C++ design features, such as smart pointers, standard C++ I/O streams, iterators, default arguments and operator overloading. As a concrete example of what the lack of reference-counted smart pointers means, the Berkeley DB API for Db::get(), shown in Listing 1, takes two Dbt pointers, and the ownership of the memory behind them is not immediately obvious.
Listing 1. Standard Berkeley DB C++ API Db::get()
#include <db_cxx.h>
int Db::get(DbTxn *txnid, Dbt *key, Dbt *data,
            u_int32_t flags);
The STLdb4 Project was created to make using Berkeley DB from C++ easier. The STLdb4 API aims to make simple database interaction trivial while still keeping more advanced usage simple. A Berkeley DB object behaves similarly to an STL collection, allowing look-ups and the setting of elements through an overloaded array operator. A full example program is shown in Listing 2. After execution, the file named by argv[1] will be a Berkeley DB B-Tree file containing the foo-bar data pair.
The main class is Database, and the reference-counted smart pointer for this class is called fh_database. This naming convention is used throughout STLdb4: the smart pointer for a class Foo is called fh_foo. A database can be opened directly in the constructor, as in Listing 2, or by using the empty constructor and calling the open() or create() method later. The main difference between open() and create() is that create() requires a database type (for example, B-Tree or Hash) and creates a new database at the given path if none exists already.
In the example in Listing 2, I don't have to close the database explicitly, because the smart pointer to the Database object will handle that for me.
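The idea that a reference-counted smart pointer can close a resource for you can be sketched with standard C++ alone. The following is a minimal illustration, assuming std::shared_ptr as a stand-in for fh_database and a hypothetical ToyDatabase class in place of Database. Neither is part of STLdb4, and STLdb4's own smart pointer may well be implemented differently.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a database handle; NOT the STLdb4 Database class.
// The path is only a label here, no file is touched.
class ToyDatabase {
public:
    ToyDatabase(std::string path, std::vector<std::string>& log)
        : path_(std::move(path)), log_(log) {
        log_.push_back("open " + path_);
    }
    // The destructor releases the resource, so callers never call close().
    ~ToyDatabase() { log_.push_back("close " + path_); }

private:
    std::string path_;
    std::vector<std::string>& log_;
};

// Returns the events recorded while two smart pointers share one handle.
std::vector<std::string> demo_shared_handle() {
    std::vector<std::string> log;
    {
        auto db = std::make_shared<ToyDatabase>("/tmp/toy.db", log);
        {
            std::shared_ptr<ToyDatabase> alias = db;  // second owner
            assert(db.use_count() == 2);
        }  // alias is gone, but db still keeps the handle open
        log.push_back("inner scope left");
    }  // last owner is gone here, so the destructor runs
    return log;
}
```

Calling demo_shared_handle() records the open event, then the note written after the inner scope, and only then the close event. The handle is released when the last owner disappears, not when the first copy goes out of scope.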
Listing 2. STLdb4 Setting and Getting Values
#include <iostream>
#include <STLdb4/stldb4.hh>
using namespace STLdb4;
using namespace std;
int main( int argc, char** argv )
{
    fh_database db = new Database(DB_BTREE, argv[1]);
    db["foo"] = "bar";
    cerr << "foo is set to:" << db["foo"] << endl;
    return 0;
}
Standard STL collection methods, such as empty(), size(), insert(), erase(), count(), begin(), end(), find(), upper_bound() and lower_bound(), all exist in the Database class. There are also partial versions of the last three methods (find(), upper_bound() and lower_bound()). The partial versions look up entries using only part of a key in B-Tree files. Many of the above methods return a bidirectional iterator object.
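To see why a partial-key look-up fits an ordered structure such as a B-Tree, here is a small sketch that uses std::map from the standard library as a stand-in. The function names keys_with_prefix() and demo_prefix_scan() are made up for this illustration and are not part of STLdb4; the names and signatures of STLdb4's partial methods are not shown in this article.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Collect every key that starts with `prefix` from an ordered container.
// std::map stands in for a B-Tree here; this is not the STLdb4 API.
std::vector<std::string> keys_with_prefix(
    const std::map<std::string, std::string>& table,
    const std::string& prefix) {
    std::vector<std::string> matches;
    // lower_bound() jumps to the first key that is not less than the prefix.
    for (auto it = table.lower_bound(prefix); it != table.end(); ++it) {
        // Keys are sorted, so the first non-matching key ends the scan.
        if (it->first.compare(0, prefix.size(), prefix) != 0) break;
        matches.push_back(it->first);
    }
    return matches;
}

// Small self-check on a hand-made table.
void demo_prefix_scan() {
    const std::map<std::string, std::string> table = {
        {"group:admin", "1"},
        {"user:alice", "2"},
        {"user:bob", "3"},
        {"volume:root", "4"},
    };
    const std::vector<std::string> hits = keys_with_prefix(table, "user:");
    assert(hits.size() == 2);
    assert(hits[0] == "user:alice");
    assert(hits[1] == "user:bob");
}
```

Because the keys are kept in sorted order, the scan touches only the matching entries plus one more, rather than walking the whole table. A Hash file has no such ordering, which is consistent with the partial methods being described for B-Tree files only.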
When storing large values in the database, using the standard I/O streams can be more efficient than using the get() method or overloaded array operator. This is because the standard I/O streams use partial read and write operations on the underlying Berkeley DB file. A standard I/O stream is obtained using the getIStream() and getIOStream() methods of the Database class.
The example in Listing 3 shows the standard C++ I/O stream interface for STLdb4. STLdb4 handles the housekeeping of performing partial I/O to the Berkeley DB file, so accessing large chunks of data through this API keeps memory consumption low. One of the getIOStream() calls used in the listing takes a ferris_ios value as its first parameter. Because the libferrisstreams library that STLdb4 uses offers generic I/O stream support, ferris_ios is a backward-compatible extension of the std::ios bitfield. The extension lets you request features such as memory-mapped backing and sequential stream access, which are used where supported. The output from running this example is shown in Listing 4.
Listing 3. Standard C++ I/O Streams for Berkeley DB Files
#include <iostream>
#include <STLdb4/stldb4.hh>
using namespace STLdb4;
using namespace Ferris;
using namespace std;
int main( int, char** )
{
    fh_database db = new Database( DB_BTREE,
                                   "/tmp/play.db" );
    string data = "1234567890";
    db[ "fred" ] = data;
    cerr << "Initial value:" << db["fred"] << endl;
    {
        fh_iostream ss = db->getIOStream( "fred" );
        ss << "54321";
    }
    cerr << "Second value:" << db["fred"] << endl;
    {
        fh_iostream ss = db->getIOStream( "fred" );
        ss.seekp( 3 );
        ss << "AAAA";
    }
    cerr << "post seekp value:" << db["fred"] << endl;
    // truncate the iostream and write
    {
        Database::iterator di = db->find( "fred" );
        fh_iostream oss = di.getIOStream(ios::trunc, 0);
        oss << "sm";
    }
    cerr << "Trunc and write:" << db["fred"] << endl;
    // append some more data to end of iostream
    {
        fh_iostream oss = db->find( "fred" )
            .getIOStream( ios::ate, 0 );
        oss << "AndMore";
    }
    cerr << "at end write value:"
         << db["fred"] << endl;
    return 0;
}
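The in-place overwrite and append steps in Listing 3 follow ordinary C++ stream positioning rules, which can be tried out without Berkeley DB at all. The sketch below uses std::stringstream as a stand-in for the stream onto a stored value. The helper names overwrite_at() and append_to() are invented for this illustration, and it shows only how a standard string stream behaves; whether STLdb4's streams match it byte for byte is an assumption, not something confirmed here.

```cpp
#include <cassert>
#include <cstddef>
#include <ios>
#include <sstream>
#include <string>

// Overwrite part of `value` in place, starting at byte `offset`.
// std::stringstream stands in for a stream onto a stored value.
std::string overwrite_at(const std::string& value, std::size_t offset,
                         const std::string& patch) {
    assert(offset <= value.size());
    std::stringstream ss(value);  // read/write, put position starts at 0
    ss.seekp(static_cast<std::streamoff>(offset));
    ss << patch;                  // replaces bytes, the rest is untouched
    return ss.str();
}

// Append to `value` by opening the stream with the put position at the end.
std::string append_to(const std::string& value, const std::string& extra) {
    std::stringstream ss(value,
                         std::ios::in | std::ios::out | std::ios::ate);
    ss << extra;
    return ss.str();
}
```

With these helpers, overwrite_at("1234567890", 0, "54321") gives "5432167890", overwrite_at("5432167890", 3, "AAAA") gives "543AAAA890", and append_to("sm", "AndMore") gives "smAndMore". Writing through a stream replaces only the bytes under the put position, which is why a partial write to a large value can avoid rewriting the whole value.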
Comments
STLdb4 looks good but seems a pain to compile
Having read the article STLdb4 looks very good, though trying to get it compiled and installed is a pain.
If anybody else is having problems then I've written my experiences of compiling on Debian Etch here: http://www.richardbishop.net/wpress/?p=23
Nice!
After reading this article, I have been using STLdb4, and am very pleased with it! It could use a few touchups, but since you can always get the raw BerkeleyDB pointer there's nothing I haven't been able to accomplish yet. *SO* much nicer to work with than the existing C++ implementation. Good job!