Embedding the db4o Object-Oriented Database
Listing 2. TreeNode Class
/*
* TreeNode
*/
using System;
namespace PersistentTrees
{
/// <summary>
/// Description of TreeNode.
/// </summary>
public class TreeNode
{
public TreeNode()
{
}
private TreeNode left; // Left child
private TreeNode right; // Right child
private string key; // Key for this node
private Object[] data; // Data associated with key
// Create a new TreeNode, loaded with
// key and data.
public TreeNode(string _key, Object _data)
{
left = null;
right = null;
key = _key;
data = new Object[1];
data[0] = _data;
}
// addData
// Adds new data item to an existing node.
// The array is extended.
public void addData(Object _data)
{
Object[] newdata = new Object[data.Length+1];
Array.Copy(data,0,newdata,0,data.Length);
newdata[data.Length]=_data;
data = newdata;
}
// Property access
public TreeNode Left
{
get { return left; }
set { left = value; }
}
public TreeNode Right
{
get { return right; }
set { right = value; }
}
public string Key
{
get { return key; }
set { key = value; }
}
public Object[] getData()
{
return data;
}
}
}
Next, I create a trie, an indexing data structure specialized for searching text words. It is built as a series of nodes arranged in levels—each level holds a set of characters and associated pointers such that the characters on the topmost (or, root) level correspond to letters in a word's first character position; characters in the second level correspond to letters in the second character position, and so on. References associated with each character serve to “string” characters like beads on a thread, so that following a thread from the root down into the tree spells out the word being searched for.
If this is difficult to visualize, the illustration in Figure 1 should help.

Figure 1. A trie. In a trie index, individual characters within a word are stored at different node levels. This particular trie holds three words: as, ask and bet. The data pointers are actually references to the DictEntry objects associated with the corresponding words.
Inserting a new word into a trie is relatively simple. Starting with the first matching character, you examine the root node to see whether that character exists. If not, add it, and from that point on, the algorithm inserts new nodes (each initialized with a subsequent letter) as it works through the target word. If the character does exist, the algorithm follows the associated pointer to the next level, and the examination process repeats. Ultimately, you've accounted for each character in the word, and the node you're on is the node on which you attach the data reference.
Searching a trie is equally simple. Start at the root, and look for the first character. If the character is found, follow the associated reference to the next node; else, return a “not found” error. Otherwise, move to the next character, repeat, and if you get through the whole word, the data node associated with the terminal character points to the DictEntry object.
The code for the trie is shown in Listing 3.
Listing 3. Trie
/*
* Trie
*/
using System;
using com.db4o;
namespace PersistentTrees
{
/// <summary>
/// Description of Trie.
/// </summary>
/// trie class
public class Trie
{
private TriePnode root; // Root of Trie
// Constructor
public Trie()
{
root = null;
}
// insert
// Insert a key/data pair into the tree.
// Allows duplicates
public void insert(string key, // Key to insert
Object data) // Data assoc. with key
{
TriePnode t = root;
TriePnode parent = null;
int index=0;
int slen = key.Length;
for(int i=0; i< slen; i++)
{
char c = key[i];
// If a node doesn't exist -- create it
if(t == null) t = new TriePnode();
// If this is the first node of the tree,
// it is the
// root. Otherwise, it is stored in the
// pnodes array
// of the parent
if(i==0)
root = t;
else
parent.setPnodePointer(index, t);
// If the character is not on the node,
// add it
if((index=t.isCharOnNode(c))==-1)
index = t.addKeyChar(c);
if(i == slen-1) break;
parent = t;
t = t.getPnodePointer(index);
}
// Finally, add the data item
t.addData(index, data);
}
// search
// Searches for a string in the trie.
// If found, returns the Object[] data array associated.
// Else, returns null
// db is the ObjectContainer holding the trie
public Object[] search(string _key,
ObjectContainer db)
{
TriePnode t;
char c;
int index=0;
// Empty trie?
if((t=root)==null) return(null);
int slen = _key.Length;
for(int i=0; i<slen; i++)
{
c = _key[i];
if((index=t.isCharOnNode(c))==-1)return(null);
if(i==slen-1) break;
db.activate(t,2);
t = t.getPnodePointer(index);
}
// Get the data
db.activate(t,3);
return(t.getDnodePointers(index).getData());
}
}
}
As the code for inserting and searching both binary trees and tries illustrates, we can work with database objects almost as though they were purely in memory objects. Specifically, we can attach an object to an index simply by storing its object reference in the data reference element.
In addition, because the database makes no distinction between index objects and data objects, we need not create a separate index and data files. This keeps everything in one place, which is actually more of an advantage than one might first suppose.
Code for reading a text file with words and definitions, creating DictEntry objects and storing them in the database, and also building binary tree and trie indexes and attaching the DictEntry objects to them looks like this:
string theword;
string pronunciation;
int numdefs;
int partofspeech;
string definition;
DictEntry _dictEntry;
// Open a streamreader for the text file
FileInfo sourceFile = new FileInfo(textFilePath);
reader = sourceFile.OpenText();
// Open/create the database file
ObjectContainer db = Db4o.openFile(databaseFilePath);
// Create an empty Binary tree, and an empty trie
BinaryTree mybintree = new BinaryTree();
Trie mytrie = new Trie();
// Sit in an endless loop, reading text,
// building objects, and putting those objects
// in the database
while(true)
{
// Read a word.
// If we read a "#", then we're done.
theword = ReadWord();
if(theword.Equals("#")) break;
// Read the pronunciation and put
// it in the object
pronunciation = ReadPronunciation();
_dictEntry = new DictEntry(theword, pronunciation);
// Read the number of definitions
numdefs = ReadNumOfDefs();
// Loop through definitions. For each,
// read the part of speech and the
// definition, add it to the definition
// array.
for(int i=0; i<numdefs; i++)
{
partofspeech = ReadPartOfSpeech();
definition = ReadDef();
Defn def = new Defn(partofspeech, definition);
_dictEntry.add(def);
}
// We've read all of the definitions.
// Put the DictEntry object into the
// database
db.set(_dictEntry);
// Now insert _dictEntry into the binary tree
// and the trie
mybintree.insert(_dictEntry.TheWord, _dictEntry);
mytrie.insert(_dictEntry.TheWord, _dictEntry);
}
// All done.
// Store the binary tree and the trie
db.set(mybintree);
db.set(mytrie);
// Commit everything
db.commit();
This, of course, presumes a number of helper methods for reading the source file, but the flow of logic is nonetheless apparent. Notice again that we were able to store each index—in entirety—simply by storing the root with a single call to db.set().
Fetching something from the database is only somewhat trickier. As much as we'd like to treat persistent objects identically to transient objects, we cannot. Objects on disk must be read into memory, and this requires an explicit fetch. The initial fetch is, of course, is a call to db.get() to locate the root of the index. So, code that allows us to search for a word using either the binary tree or the trie index would look like this:
public static void Main(string[] args)
{
Object[] found;
DictEntry _entry;
// Verify proper number of arguments
if(args.Length !=3)
{
Console.WriteLine("usage: SearchDictDatabase <database> B|T <word>");
Console.WriteLine("<database> = path to db4o database");
Console.WriteLine("B = use binary tree; T = use trie");
Console.WriteLine("<word> = word to search for");
return;
}
// Verify 2nd argument
if("BT".IndexOf(args[1])==-1)
{
Console.WriteLine("2nd argument must be B or T");
return;
}
// Open the database file
ObjectContainer db = Db4o.openFile(args[0]);
if(db!=null) Console.WriteLine("Open OK");
// Switch on the 2nd argument (B or T)
if("BT".IndexOf(args[1])==0)
{ // Search binary tree
// Create an empty binary tree object for the
// search template
BinaryTree btt = new BinaryTree();
ObjectSet result = db.get(btt);
BinaryTree bt = (BinaryTree) result.next();
// Now search for the results
found = bt.search(args[2],db);
}
else
{ // Search trie
// Create an empty trie object fore the search
// template
Trie triet = new Trie();
ObjectSet result = db.get(triet);
Trie mytrie = (Trie) result.next();
// Now search for the results
found = mytrie.search(args[2],db);
}
// Close the database
db.close();
// Was it in the database?
if(found == null)
{
Console.WriteLine("Not found");
return;
}
// Fetch the DictEntry
_entry = (DictEntry)found[0];
... <Do stuff with _entry here> ...
And now we can explain the purpose of the calls to db.activate() in the search methods of both Listings 1 and 3.
When you call the db.set() method, as we explained, the db4o engine spiders through the object tree, persisting all reachable objects. (This is known as persistence by reachability.) In the reverse direction—that is, calling db.get() to fetch an object—db4o does not pull the entire object tree out of the database. If it did that, then fetching the root of, for example, the binary index, would cause db4o to pull the entire index, plus all the dictionary entries, plus all the definitions into memory at once—not very efficient if we want only one word.
Instead, db4o uses a concept called activation depth. Suppose I've fetched object A into memory from a db4o database using a db.get() call. If I then call db.activate(A,6), that tells db4o also to fetch into memory all objects referenced by A, up to a depth of 6. So, the db.activate() calls that are sprinkled throughout the search routines of the binary tree and the trie classes ensure that the search operation always pulls in enough of the index so that the search can proceed. (And, at the end of a successful search, the dictionary objects are fetched.)
Realizing the promise of Apache® Hadoop® requires the effective deployment of compute, memory, storage and networking to achieve optimal results. With its flexibility and multitude of options, it is easy to over or under provision the server infrastructure, resulting in poor performance and high TCO. Join us for an in depth, technical discussion with industry experts from leading Hadoop and server companies who will provide insights into the key considerations for designing and deploying an optimal Hadoop cluster.
Sponsored by AMD
Built-in forensics, incident response, and security with Red Hat Enterprise Linux 6
Every security policy provides guidance and requirements for ensuring adequate protection of information and data, as well as high-level technical and administrative security requirements for a system in a given environment. Traditionally, providing security for a system focuses on the confidentiality of the information on it. However, protecting the data integrity and system and data availability is just as important. For example, when processing United States intelligence information, there are three attributes that require protection: confidentiality, integrity, and availability.
Learn more about catching the bad guy in this free white paper.
Sponsored by DLT Solutions
| Dynamic DNS—an Object Lesson in Problem Solving | May 21, 2013 |
| Using Salt Stack and Vagrant for Drupal Development | May 20, 2013 |
| Making Linux and Android Get Along (It's Not as Hard as It Sounds) | May 16, 2013 |
| Drupal Is a Framework: Why Everyone Needs to Understand This | May 15, 2013 |
| Home, My Backup Data Center | May 13, 2013 |
| Non-Linux FOSS: Seashore | May 10, 2013 |
- RSS Feeds
- Making Linux and Android Get Along (It's Not as Hard as It Sounds)
- Using Salt Stack and Vagrant for Drupal Development
- Dynamic DNS—an Object Lesson in Problem Solving
- New Products
- Validate an E-Mail Address with PHP, the Right Way
- Drupal Is a Framework: Why Everyone Needs to Understand This
- A Topic for Discussion - Open Source Feature-Richness?
- Download the Free Red Hat White Paper "Using an Open Source Framework to Catch the Bad Guy"
- Tech Tip: Really Simple HTTP Server with Python




10 min 6 sec ago
5 hours 23 min ago
8 hours 34 min ago
10 hours 50 min ago
11 hours 18 min ago
12 hours 16 min ago
13 hours 45 min ago
14 hours 54 min ago
15 hours 40 min ago
22 hours 16 min ago