Embedding the db4o Object-Oriented Database

How to get started using this small-footprint object-oriented database in your embedded system programs.

Next, I create a trie, an indexing data structure specialized for searching text words. It is built as a series of nodes arranged in levels—each level holds a set of characters and associated pointers such that the characters on the topmost (or, root) level correspond to letters in a word's first character position; characters in the second level correspond to letters in the second character position, and so on. References associated with each character serve to “string” characters like beads on a thread, so that following a thread from the root down into the tree spells out the word being searched for.

If this is difficult to visualize, the illustration in Figure 1 should help.

Figure 1. A trie. In a trie index, individual characters within a word are stored at different node levels. This particular trie holds three words: as, ask and bet. The data pointers are actually references to the DictEntry objects associated with the corresponding words.

Inserting a new word into a trie is relatively simple. Starting with the first matching character, you examine the root node to see whether that character exists. If not, add it, and from that point on, the algorithm inserts new nodes (each initialized with a subsequent letter) as it works through the target word. If the character does exist, the algorithm follows the associated pointer to the next level, and the examination process repeats. Ultimately, you've accounted for each character in the word, and the node you're on is the node on which you attach the data reference.

Searching a trie is equally simple. Start at the root, and look for the first character. If the character is found, follow the associated reference to the next node; else, return a “not found” error. Otherwise, move to the next character, repeat, and if you get through the whole word, the data node associated with the terminal character points to the DictEntry object.

The code for the trie is shown in Listing 3.

As the code for inserting and searching both binary trees and tries illustrates, we can work with database objects almost as though they were purely in memory objects. Specifically, we can attach an object to an index simply by storing its object reference in the data reference element.

In addition, because the database makes no distinction between index objects and data objects, we need not create a separate index and data files. This keeps everything in one place, which is actually more of an advantage than one might first suppose.

Code for reading a text file with words and definitions, creating DictEntry objects and storing them in the database, and also building binary tree and trie indexes and attaching the DictEntry objects to them looks like this:


string theword;
string pronunciation;
int numdefs;
int partofspeech;
string definition;
DictEntry _dictEntry;

// Open a streamreader for the text file
FileInfo sourceFile = new FileInfo(textFilePath);
reader = sourceFile.OpenText();

// Open/create the database file
ObjectContainer db = Db4o.openFile(databaseFilePath);

// Create an empty Binary tree, and an empty trie
BinaryTree mybintree = new BinaryTree();
Trie mytrie = new Trie();

// Sit in an endless loop, reading text,
//  building objects, and putting those objects
//  in the database
while(true)
{
  // Read a word.
  // If we read a "#", then we're done.
  theword = ReadWord();
  if(theword.Equals("#")) break;

  // Read the pronunciation and put
  //  it in the object
  pronunciation = ReadPronunciation();
  _dictEntry = new DictEntry(theword, pronunciation);

  // Read the number of definitions
  numdefs = ReadNumOfDefs();

  // Loop through definitions. For each,
  //  read the part of speech and the
  //  definition, add it to the definition
  //  array.
  for(int i=0; i<numdefs; i++)
  {
    partofspeech = ReadPartOfSpeech();
    definition = ReadDef();
    Defn def = new Defn(partofspeech, definition);
    _dictEntry.add(def);
  }
  // We've read all of the definitions.
  // Put the DictEntry object into the
  //  database
  db.set(_dictEntry);

  // Now insert _dictEntry into the binary tree
  //  and the trie
  mybintree.insert(_dictEntry.TheWord, _dictEntry);
  mytrie.insert(_dictEntry.TheWord, _dictEntry);
}

// All done.
// Store the binary tree and the trie
db.set(mybintree);
db.set(mytrie);

// Commit everything
db.commit();

This, of course, presumes a number of helper methods for reading the source file, but the flow of logic is nonetheless apparent. Notice again that we were able to store each index—in entirety—simply by storing the root with a single call to db.set().

Fetching something from the database is only somewhat trickier. As much as we'd like to treat persistent objects identically to transient objects, we cannot. Objects on disk must be read into memory, and this requires an explicit fetch. The initial fetch is, of course, is a call to db.get() to locate the root of the index. So, code that allows us to search for a word using either the binary tree or the trie index would look like this:


public static void Main(string[] args)
{
  Object[] found;
  DictEntry _entry;

  // Verify proper number of arguments
  if(args.Length !=3)
  {
    Console.WriteLine("usage: SearchDictDatabase <database> B|T <word>");
    Console.WriteLine("<database> = path to db4o database");
    Console.WriteLine("B = use binary tree; T = use trie");
    Console.WriteLine("<word> = word to search for");
    return;
  }

  // Verify 2nd argument
  if("BT".IndexOf(args[1])==-1)
  {
    Console.WriteLine("2nd argument must be B or T");
    return;
  }

  // Open the database file
  ObjectContainer db = Db4o.openFile(args[0]);
  if(db!=null) Console.WriteLine("Open OK");

  // Switch on the 2nd argument (B or T)
  if("BT".IndexOf(args[1])==0)
  { // Search binary tree
    // Create an empty binary tree object for the
    //  search template
    BinaryTree btt = new BinaryTree();
    ObjectSet result = db.get(btt);
    BinaryTree bt = (BinaryTree) result.next();

    // Now search for the results
    found = bt.search(args[2],db);
  }
  else
  { // Search trie
    // Create an empty trie object fore the search
    //  template
    Trie triet = new Trie();
    ObjectSet result = db.get(triet);
    Trie mytrie = (Trie) result.next();

    // Now search for the results
    found = mytrie.search(args[2],db);
  }

  // Close the database
  db.close();

  // Was it in the database?
  if(found == null)
  {
    Console.WriteLine("Not found");
    return;
  }

  // Fetch the DictEntry
  _entry = (DictEntry)found[0];

... <Do stuff with _entry here> ...

And now we can explain the purpose of the calls to db.activate() in the search methods of both Listings 1 and 3.

When you call the db.set() method, as we explained, the db4o engine spiders through the object tree, persisting all reachable objects. (This is known as persistence by reachability.) In the reverse direction—that is, calling db.get() to fetch an object—db4o does not pull the entire object tree out of the database. If it did that, then fetching the root of, for example, the binary index, would cause db4o to pull the entire index, plus all the dictionary entries, plus all the definitions into memory at once—not very efficient if we want only one word.

Instead, db4o uses a concept called activation depth. Suppose I've fetched object A into memory from a db4o database using a db.get() call. If I then call db.activate(A,6), that tells db4o also to fetch into memory all objects referenced by A, up to a depth of 6. So, the db.activate() calls that are sprinkled throughout the search routines of the binary tree and the trie classes ensure that the search operation always pulls in enough of the index so that the search can proceed. (And, at the end of a successful search, the dictionary objects are fetched.)

______________________

Webinar
One Click, Universal Protection: Implementing Centralized Security Policies on Linux Systems

As Linux continues to play an ever increasing role in corporate data centers and institutions, ensuring the integrity and protection of these systems must be a priority. With 60% of the world's websites and an increasing share of organization's mission-critical workloads running on Linux, failing to stop malware and other advanced threats on Linux can increasingly impact an organization's reputation and bottom line.

Learn More

Sponsored by Bit9

Webinar
Linux Backup and Recovery Webinar

Most companies incorporate backup procedures for critical data, which can be restored quickly if a loss occurs. However, fewer companies are prepared for catastrophic system failures, in which they lose all data, the entire operating system, applications, settings, patches and more, reducing their system(s) to “bare metal.” After all, before data can be restored to a system, there must be a system to restore it to.

In this one hour webinar, learn how to enhance your existing backup strategies for better disaster recovery preparedness using Storix System Backup Administrator (SBAdmin), a highly flexible bare-metal recovery solution for UNIX and Linux systems.

Learn More

Sponsored by Storix