Creating and Using a Database with Perl

Perl 5 includes packages that let your Perl scripts create and modify databases in standard Unix formats. Such a database can be a more efficient alternative to a flat text file (which Perl handles marvelously), and it remains compatible with other languages, such as C.

Opening a Database

Databases are opened in Perl using the tie() function. This function is responsible for “joining” an associative array with a database package. Operations performed on the associative array are then translated by the database package into function calls that operate on the database file itself.

Here is an example of opening a database named “phone.db” using the DB_File database package:

use DB_File;

tie (%phone_db, 'DB_File', "phone.db") ||
        die ("Cannot open phone.db: $!");

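This command binds the associative array named %phone_db to the Berkeley DB database file named “phone.db”. When no flags are given, DB_File falls back to its defaults (O_CREAT|O_RDWR with mode 0666), so the file must be readable and writable by the Perl script and is created if it does not exist. To open only a database that already exists, pass O_RDWR or O_RDONLY explicitly.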
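
As a minimal sketch, the script below opens a database under "use strict" with O_RDWR passed explicitly, so the tie fails when the file is missing. The scratch directory and file path are assumptions made for illustration, not part of the original example.

```perl
use strict;
use warnings;
use Fcntl;                      # O_RDWR and the other open() flags
use DB_File;
use File::Temp qw(tempdir);

# Hypothetical path in a scratch directory, so nothing real is touched.
my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/phone.db";

my %phone_db;

# O_RDWR without O_CREAT: the tie fails because the file is not there yet.
my $db = tie %phone_db, 'DB_File', $file, O_RDWR, 0644, $DB_HASH;
if ($db) {
    print "opened $file\n";
    undef $db;                  # drop the extra reference before untie
    untie %phone_db;
} else {
    print "Cannot open $file: $!\n";
}
```

Quoting the package name ('DB_File') keeps the call valid under "use strict", where a bareword would be rejected.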

Creating a Database

Creating a database is nearly as simple as opening one. The following command creates a database named “phone.db” in the current directory with the file's permissions set to read-write for the owner and read-only for everyone else (mode 0644, subject to the process umask). The file is created only if it does not already exist. If the database file already exists in the current directory, it is simply opened for read-write access by the Perl script.

tie (%phone_db, 'DB_File', "phone.db", O_CREAT|O_RDWR, 0644) ||
        die ("Cannot create or open phone.db: $!");

The O_CREAT and O_RDWR flags are the same flags used as parameters to the Unix open() system call. They specify that the file should be created if it does not exist and opened with read-write access.
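
The following sketch shows the create-or-open behavior by making the same tie() call twice: once when the file is missing and once when it exists. The scratch directory, the sample record, and the second hash name are made up for the example.

```perl
use strict;
use warnings;
use Fcntl;                      # O_CREAT, O_RDWR
use DB_File;
use File::Temp qw(tempdir);

my $dir  = tempdir(CLEANUP => 1);       # hypothetical scratch location
my $file = "$dir/phone.db";

# First call: the file is missing, so O_CREAT creates it.
my %phone_db;
tie(%phone_db, 'DB_File', $file, O_CREAT|O_RDWR, 0644)
    or die "Cannot create or open $file: $!";
$phone_db{"Bill Smith"} = "555-1234";   # made-up sample record
untie %phone_db;

# Second call: the file exists, so the same call simply opens it.
my %again;
tie(%again, 'DB_File', $file, O_CREAT|O_RDWR, 0644)
    or die "Cannot create or open $file: $!";
my $record = $again{"Bill Smith"};
print "Bill Smith => $record\n";
untie %again;
```

The second tie() could just as well live in a separate script: the record is read back from the file, not from the first hash.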

Reading from the Database

Reading from the database works exactly like reading data from an associative array. If the key is known, specific records can be read from the file with an expression like:

$record = $phone_db{"Bill Smith"};

All the records in the database file can be scanned (in a seemingly random order) with something like:

while (($name, $record) = each %phone_db) {
        print "$name: $record\n";       # process each record here
}

During each pass through the while loop, the $name scalar variable will be set to the key value from the database, and the $record variable will be set to the data associated with the key.
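
A short sketch of both reading styles follows, assuming a scratch database filled with three made-up records. It checks that a key exists before fetching it, and sorts the keys when a predictable listing is wanted.

```perl
use strict;
use warnings;
use Fcntl;
use DB_File;
use File::Temp qw(tempdir);

my $dir  = tempdir(CLEANUP => 1);       # hypothetical scratch location
my $file = "$dir/phone.db";

my %phone_db;
tie(%phone_db, 'DB_File', $file, O_CREAT|O_RDWR, 0644)
    or die "Cannot create or open $file: $!";

# Made-up sample records.
$phone_db{"Bill Smith"} = "555-1234";
$phone_db{"Ann Jones"}  = "555-9876";
$phone_db{"Carl Wu"}    = "555-4321";

# Fetch one record, checking first that the key is present.
my $record = exists $phone_db{"Bill Smith"} ? $phone_db{"Bill Smith"} : "(none)";
print "Bill Smith: $record\n";

# each() visits every record once, in no useful order.
my $count = 0;
while (my ($name, $value) = each %phone_db) {
    $count++;
}

# Sorting the keys gives a predictable listing.
my @names = sort keys %phone_db;
print "$_: $phone_db{$_}\n" for @names;

untie %phone_db;
```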

Writing to the Database

New data can be written into the database by creating a new key in the associative array and setting the key's value. This is done with a command similar to:

$phone_db{"Bill Smith"} = $data;

where $data is the information to be associated with the key “Bill Smith”. Any changes made to the associative array will be written into its corresponding database file.
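
Keys and values are stored as plain strings, so a record with several fields has to be flattened before it is written. The sketch below assumes a hypothetical tab-separated layout (phone, street, city); the layout and the sample values are illustrative only.

```perl
use strict;
use warnings;
use Fcntl;
use DB_File;
use File::Temp qw(tempdir);

my $dir  = tempdir(CLEANUP => 1);       # hypothetical scratch location
my $file = "$dir/phone.db";

my %phone_db;
tie(%phone_db, 'DB_File', $file, O_CREAT|O_RDWR, 0644)
    or die "Cannot create or open $file: $!";

# Hypothetical record layout: phone, street, city, joined with tabs.
my @fields = ("555-1234", "12 Elm Street", "Springfield");
$phone_db{"Bill Smith"} = join("\t", @fields);

# Assigning to an existing key replaces the stored value.
$fields[0] = "555-0000";
$phone_db{"Bill Smith"} = join("\t", @fields);

# Split the stored string back into its fields.
my ($phone, $street, $city) = split /\t/, $phone_db{"Bill Smith"};
print "$phone / $street / $city\n";

untie %phone_db;
```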

Deleting Items from the Database

Keys are removed from the database exactly as items are removed from an associative array in Perl: with the delete() function. The following code removes the record stored under the key “Bill Smith”.

delete $phone_db{"Bill Smith"};
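
A small sketch of deletion, using a scratch database and made-up records. It confirms the result with exists() and does not rely on the return value of delete, which for a tied hash comes from the package's DELETE method and may not be the deleted value.

```perl
use strict;
use warnings;
use Fcntl;
use DB_File;
use File::Temp qw(tempdir);

my $dir  = tempdir(CLEANUP => 1);       # hypothetical scratch location
my $file = "$dir/phone.db";

my %phone_db;
tie(%phone_db, 'DB_File', $file, O_CREAT|O_RDWR, 0644)
    or die "Cannot create or open $file: $!";

# Made-up sample records.
$phone_db{"Bill Smith"} = "555-1234";
$phone_db{"Ann Jones"}  = "555-9876";

# Remove one record and confirm that it is gone.
delete $phone_db{"Bill Smith"};
my $still_there = exists $phone_db{"Bill Smith"} ? 1 : 0;

my $remaining = join ", ", sort keys %phone_db;
print "remaining: $remaining\n";

untie %phone_db;
```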

Closing the Database File

Changes to an associative array may not be written out to the database file immediately. To ensure that changes reach the database file, the file must be closed.

Closing the database file involves un-binding the associative array from the database package. This is done with the untie() function in the following manner:

untie(%phone_db);

This closes the database file, writing any pending updates to it. The associative array %phone_db can no longer be used to access the records in the database.
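
The sketch below keeps the object returned by tie() so that the sync() method DB_File provides can flush changes while the database stays open, then closes the file and reads the record back. The scratch path and sample record are assumptions for the example.

```perl
use strict;
use warnings;
use Fcntl;
use DB_File;
use File::Temp qw(tempdir);

my $dir  = tempdir(CLEANUP => 1);       # hypothetical scratch location
my $file = "$dir/phone.db";

my %phone_db;
my $db = tie(%phone_db, 'DB_File', $file, O_CREAT|O_RDWR, 0644)
    or die "Cannot create or open $file: $!";

$phone_db{"Bill Smith"} = "555-1234";   # made-up sample record

$db->sync;              # flush pending changes; the hash stays tied
undef $db;              # release the extra reference before untie
untie %phone_db;        # close the database file

# Reopen read-only to confirm that the record reached the file.
my %check;
tie(%check, 'DB_File', $file, O_RDONLY, 0644)
    or die "Cannot open $file: $!";
my $saved = $check{"Bill Smith"};
print "saved: $saved\n";
untie %check;
```

Dropping $db before calling untie() matters: while another reference to the tied object exists, the database is not actually closed, and Perl warns about it when warnings are enabled.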

Other Types of Databases

All of the examples provided here use the default type of Berkeley DB database, the DB_HASH type. This form of database uses a hash table (like Perl does) to store the keys and their values in the database file. Two other types of databases are provided with the Berkeley DB package: DB_BTREE and DB_RECNO.

The DB_BTREE format uses a sorted, balanced tree (a B-tree) to store the key and value pairs. This format allows data to be stored and read in sorted order, as opposed to the seemingly random order the DB_HASH format produces. The default comparison routine sorts the keys in the database file in lexical order (alphabetically, with uppercase letters ahead of lowercase ones). The DB_File man page discusses this format in more detail and shows how to replace the default comparison routine with one of your own.
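
As a sketch of both behaviors, the script below stores the same made-up keys in two scratch DB_BTREE files: one with the default comparison and one with a case-insensitive comparison installed through a DB_File::BTREEINFO object, as described in the DB_File man page. File names and keys are illustrative.

```perl
use strict;
use warnings;
use Fcntl;
use DB_File;
use File::Temp qw(tempdir);

my $dir  = tempdir(CLEANUP => 1);       # hypothetical scratch location
my @keys = ("Carl Wu", "ann jones", "Bill Smith");   # made-up keys

# Default comparison: byte-wise lexical order, so capitals sort first.
my %plain;
tie(%plain, 'DB_File', "$dir/plain.db", O_CREAT|O_RDWR, 0644, $DB_BTREE)
    or die "Cannot create plain.db: $!";
$plain{$_} = 1 for @keys;
my @default_order = keys %plain;
untie %plain;

# Custom comparison: ignore case. The same routine must be supplied
# every time this file is opened, or the key order will not match.
my $btree = DB_File::BTREEINFO->new;
$btree->{'compare'} = sub {
    my ($key1, $key2) = @_;
    return lc($key1) cmp lc($key2);
};

my %folded;
tie(%folded, 'DB_File', "$dir/folded.db", O_CREAT|O_RDWR, 0644, $btree)
    or die "Cannot create folded.db: $!";
$folded{$_} = 1 for @keys;
my @folded_order = keys %folded;
untie %folded;

print "default: @default_order\n";
print "folded:  @folded_order\n";
```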

The DB_RECNO format is designed to operate on flat text files. It is bound (with tie()) to normal Perl arrays, not associative arrays. Indexing this array with a number provides the text found on that line of the database file. This format is also discussed in more detail in the DB_File man page.
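
A minimal sketch of DB_RECNO, assuming a small scratch text file created just for the example. Array indexes start at zero, so index 1 is the second line of the file, and each element comes back without its trailing newline.

```perl
use strict;
use warnings;
use Fcntl;
use DB_File;
use File::Temp qw(tempdir);

my $dir  = tempdir(CLEANUP => 1);       # hypothetical scratch location
my $file = "$dir/notes.txt";

# Create a three-line text file to work on.
open(my $fh, '>', $file) or die "Cannot write $file: $!";
print $fh "first line\nsecond line\nthird line\n";
close($fh) or die "Cannot close $file: $!";

# Bind a normal array (not a hash) to the text file.
my @lines;
tie(@lines, 'DB_File', $file, O_RDWR, 0644, $DB_RECNO)
    or die "Cannot open $file: $!";

my $second = $lines[1];         # second line of the file
$lines[1]  = "SECOND LINE";     # replace that line
push @lines, "fourth line";     # append a new line
my $count  = scalar @lines;     # number of lines now in the array

untie @lines;

print "second line was: $second\n";
print "line count: $count\n";
```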

The desired format of database file is specified with an additional parameter for the tie() function.

tie (%phone_db, 'DB_File', "phone.db", O_RDONLY, 0644, $DB_BTREE) ||
        die ("Cannot open phone.db: $!");

This command will open the DB_BTREE database named “phone.db” in read-only access mode. If the file does not exist, the command fails.

______________________

Comments

Nice explanation . It helped

Anonymous's picture

Nice explanation . It helped me a lot in getting started with databases. thanks a lot.

Retrieval of databases

jamesmicheallay's picture

Well, I've learned how to encode information to a database with this article, but I still don't understand database retrieval with DB_File yet. How would you write one program to encode the information and another program to retrieve it (to clear confusion because it seems like you have to encode the database everytime you want to retrieve from it, completely defeating the purpose of saving).

thanks

Pradeep Kota's picture

Nice explanation. It helped me a lot in getting started with databases. thanks a lot. :)

cheers,
Kota.

Previous comment about anonymous hash

Anonymous's picture

It appears this was a typo, should be %phone_db(), since there is no mention of this being a scalar reference of an anonymous hash, but a hash container. I am assuming this is the case, since all other examples do not use a dereference of the hash, they would have been $$phone_db{"key"}

Incorrect syntax in hash formation

Anonymous's picture

The example under the Associative Arrays heading that shows how to store an anonymous hash's reference in a scalar is incorrect; instead of $phone_db=( ... ), it should be $phone_db={ ... } (curly braces, not parens). FWIW, this is a very common misteak 8-}
