Creating and Using a Database with Perl
Databases are opened in Perl using the tie() function. This function is responsible for “joining” an associative array with a database package. Operations performed on the associative array are then translated by the database package into function calls that operate on the database file itself.
Here is an example of opening a database named “phone.db” using the DB_File database package:
use DB_File;
tie (%phone_db, 'DB_File', "phone.db") ||
die ("Cannot open phone.db: $!");
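This command binds the associative array named phone_db to the Berkeley DB database file named “phone.db”. Because no flags are given, DB_File falls back on its defaults (O_CREAT|O_RDWR with mode 0666), so the file is created if it does not already exist; an existing file must be readable and writable by the Perl script.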
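As a minimal sketch of the three-argument form under "use strict", the script below first creates a small database and then opens it again the simple way. The scratch directory, the %seed hash and the sample phone number are assumptions made for illustration and are not part of the original example.

```perl
use strict;
use warnings;
use DB_File;
use Fcntl;                         # supplies O_CREAT and O_RDWR
use File::Temp qw(tempdir);

# Assumption: a scratch directory stands in for the current directory,
# so the sketch never touches a real phone.db.
my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/phone.db";

# Set-up only: make sure a database file exists to be opened.
tie(my %seed, 'DB_File', $file, O_CREAT|O_RDWR, 0644)
    or die "Cannot create $file: $!";
$seed{"Bill Smith"} = "555-1234";  # hypothetical record
untie %seed;

# The open itself. Quoting 'DB_File' keeps the call valid under strict,
# and $! reports why the open failed.
my %phone_db;
tie(%phone_db, 'DB_File', $file)
    or die "Cannot open $file: $!";

my $count = keys %phone_db;        # number of records in the file
print "$file is open and holds $count record(s)\n";

untie %phone_db;
```

tie() returns a true value on success, which is why the "or die" idiom works here.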
Creating a database is nearly as simple as opening one. The following command will create a database named “phone.db” in the current directory with the file's permissions set to read-write for the owner and read-only for everyone else (less any bits masked out by the process's umask). The file will be created only if it does not already exist. If the database file exists in the current directory, the database file will simply be opened for read-write access by the Perl script.
tie (%phone_db, 'DB_File', "phone.db", O_CREAT|O_RDWR, 0644) ||
die ("Cannot create or open phone.db: $!");
The O_CREAT and O_RDWR flags are the same flags used as parameters to the Unix open() system call. They specify that the file should be created if it does not exist and opened with read-write access.
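The create-or-open behaviour can be checked with a short script. This is only a sketch: the scratch directory and the sample record are assumptions, and Fcntl is loaded explicitly to make the origin of the O_ constants visible.

```perl
use strict;
use warnings;
use DB_File;
use Fcntl;                         # supplies O_CREAT and O_RDWR
use File::Temp qw(tempdir);

# Assumption: a scratch directory stands in for the current directory.
my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/phone.db";

# First open: the file does not exist yet, so O_CREAT creates it.
my %phone_db;
tie(%phone_db, 'DB_File', $file, O_CREAT|O_RDWR, 0644)
    or die "Cannot create or open $file: $!";
$phone_db{"Bill Smith"} = "555-1234";      # hypothetical record
untie %phone_db;

# Second open: the same call now opens the existing file and keeps its data.
tie(%phone_db, 'DB_File', $file, O_CREAT|O_RDWR, 0644)
    or die "Cannot create or open $file: $!";
my $kept = $phone_db{"Bill Smith"};
untie %phone_db;

print "record survived the reopen: $kept\n";
```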
Reading from the database works exactly like reading data from an associative array. If the key is known, specific records can be read from the file with an expression like:
$record = $phone_db{"Bill Smith"};
All the records in the database file can be scanned (in a seemingly random order) with something like:
while (($name, $record) = each %phone_db) {
    # commands to process data here
}
During each pass through the while loop, the $name scalar variable will be set to the key value from the database, and the $record variable will be set to the data associated with the key.
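A runnable version of the scan might look like the sketch below. The three records and the scratch directory are made up for illustration. Because each returns DB_HASH records in no useful order, the second loop sorts the keys in Perl before printing.

```perl
use strict;
use warnings;
use DB_File;
use Fcntl;
use File::Temp qw(tempdir);

# Assumption: scratch directory and sample records are illustrative only.
my $dir = tempdir(CLEANUP => 1);
my %phone_db;
tie(%phone_db, 'DB_File', "$dir/phone.db", O_CREAT|O_RDWR, 0644)
    or die "Cannot create or open phone.db: $!";

$phone_db{"Bill Smith"}  = "555-1234";
$phone_db{"Jane Doe"}    = "555-9876";
$phone_db{"Carol White"} = "555-4321";

# Scan every record; the order is whatever the hash table yields.
my %seen;
while (my ($name, $record) = each %phone_db) {
    $seen{$name} = $record;
}

# Sort the keys in Perl when a predictable order is wanted.
my @sorted = sort keys %phone_db;
print "$_: $phone_db{$_}\n" for @sorted;

untie %phone_db;
```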
New data can be written into the database by creating a new key in the associative array and setting the key's value. This is done with a command similar to:
$phone_db{"Bill Smith"} = $data;
where $data is the information to be associated with the key “Bill Smith”. Any changes made to the associative array will be written into its corresponding database file.
Keys can be removed from the database in exactly the same way items are removed from an associative array in Perl—by using the delete() function. The following code removes the record in the database that refers to “Bill Smith”.
delete $phone_db{"Bill Smith"};
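Storing, overwriting and deleting can be combined into one short round trip. The sketch below assumes a scratch database and invented phone numbers. The exists test is one way to confirm that the key is really gone.

```perl
use strict;
use warnings;
use DB_File;
use Fcntl;
use File::Temp qw(tempdir);

# Assumption: scratch directory and phone numbers are illustrative only.
my $dir = tempdir(CLEANUP => 1);
my %phone_db;
tie(%phone_db, 'DB_File', "$dir/phone.db", O_CREAT|O_RDWR, 0644)
    or die "Cannot create or open phone.db: $!";

$phone_db{"Bill Smith"} = "555-1234";      # create the record
$phone_db{"Bill Smith"} = "555-0000";      # assigning again overwrites it
my $updated = $phone_db{"Bill Smith"};

delete $phone_db{"Bill Smith"};            # remove the record
my $gone = exists $phone_db{"Bill Smith"} ? 0 : 1;

print "value before delete: $updated\n";
print "record removed: ", ($gone ? "yes" : "no"), "\n";

untie %phone_db;
```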
Changes to an associative array may not be immediately written out to the database file. To ensure that changes are successfully written to the database file, the file must be closed.
Closing the database file involves un-binding the associative array from the database package. This is done with the untie() function in the following manner:
untie(%phone_db);
This closes the database file, making updates to the file if necessary. The associative array %phone_db can now no longer be used to access the records in the database.
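The sketch below shows one way to write in one step and read in a later one, which is how two separate programs would share the file. It also uses the sync method of the object returned by tie() to flush changes without closing. The scratch directory, the %reader hash and the sample record are assumptions for illustration.

```perl
use strict;
use warnings;
use DB_File;
use Fcntl;                         # supplies O_CREAT, O_RDWR and O_RDONLY
use File::Temp qw(tempdir);

# Assumption: scratch directory and sample record are illustrative only.
my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/phone.db";

# Writer step: store a record, flush it, then close the file.
my %phone_db;
my $db = tie(%phone_db, 'DB_File', $file, O_CREAT|O_RDWR, 0644)
    or die "Cannot create or open $file: $!";
$phone_db{"Bill Smith"} = "555-1234";
$db->sync == 0 or die "sync failed: $!";   # flush without closing
undef $db;                         # drop the extra reference before untie
untie %phone_db;

# Reader step: a later run only needs to tie the same file again.
my %reader;
tie(%reader, 'DB_File', $file, O_RDONLY, 0644)
    or die "Cannot open $file: $!";
my $number = $reader{"Bill Smith"};
untie %reader;

print "Bill Smith: $number\n";
```

Releasing $db before calling untie avoids the warning Perl issues when an object reference to the tied variable is still held.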
All of the examples provided here use the default type of Berkeley DB database, the DB_HASH type. This form of database uses a hash table (like Perl does) to store the keys and their values in the database file. Two other types of databases are provided with the Berkeley DB package: DB_BTREE and DB_RECNO.
The DB_BTREE format uses a sorted, balanced tree (a B-tree) to store the key and value pairs. This format allows data to be stored and read in a sorted order as opposed to the seemingly random order the DB_HASH format produces. The default comparison routine sorts the keys in the database file in lexical order (alphabetically). The DB_File man page discusses this format in more detail and shows how to replace the default comparison routine with one of your own.
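As a sketch of a custom comparison routine, the script below sorts keys without regard to case. It builds a private DB_File::BTREEINFO object rather than changing the shared $DB_BTREE. The names, numbers and scratch directory are assumptions for illustration.

```perl
use strict;
use warnings;
use DB_File;
use Fcntl;
use File::Temp qw(tempdir);

# Assumption: scratch directory and sample records are illustrative only.
my $dir = tempdir(CLEANUP => 1);

# A private BTREEINFO object carries the comparison routine.
my $btree = DB_File::BTREEINFO->new;
$btree->{compare} = sub {
    my ($key1, $key2) = @_;
    return lc($key1) cmp lc($key2);        # ignore case when ordering keys
};

my %phone_db;
tie(%phone_db, 'DB_File', "$dir/phone.db", O_CREAT|O_RDWR, 0644, $btree)
    or die "Cannot create or open phone.db: $!";

$phone_db{"bill smith"}  = "555-1234";
$phone_db{"Carol White"} = "555-4321";
$phone_db{"Alice Jones"} = "555-9876";

# keys() now walks the tree in the order defined by the routine above.
my @order = keys %phone_db;
print "$_\n" for @order;

untie %phone_db;
```

With the default comparison routine, "bill smith" would sort after both capitalized names. A database must be reopened with the same comparison routine that was used to build it.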
The DB_RECNO format is designed to operate on flat text files. It is bound (with tie()) to normal Perl arrays, not associative arrays. Indexing this array with a number provides the text found on that line of the database file. This format is also discussed in more detail in the DB_File man page.
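A minimal DB_RECNO sketch follows. It ties a normal array to a text file, writes two lines by index, and then reads the file back with an ordinary open to show that it is plain text. The file name, the scratch directory and the line contents are assumptions for illustration.

```perl
use strict;
use warnings;
use DB_File;
use Fcntl;
use File::Temp qw(tempdir);

# Assumption: scratch directory and line contents are illustrative only.
my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/phone.txt";

# DB_RECNO ties a normal array, not an associative array.
my @lines;
tie(@lines, 'DB_File', $file, O_CREAT|O_RDWR, 0644, $DB_RECNO)
    or die "Cannot create or open $file: $!";

$lines[0] = "Bill Smith 555-1234";         # first line of the file
$lines[1] = "Jane Doe 555-9876";           # second line of the file
my $second = $lines[1];                    # index 1 reads the second line

untie @lines;

# The result is an ordinary text file, one record per line.
open(my $fh, '<', $file) or die "Cannot read $file: $!";
chomp(my @plain = <$fh>);
close $fh;

print "line 2 via the tied array: $second\n";
print "line 2 via plain open:     $plain[1]\n";
```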
The desired format of database file is specified with an additional parameter for the tie() function:
tie (%phone_db, 'DB_File', "phone.db", O_RDONLY, 0644, $DB_BTREE) ||
die ("Cannot open phone.db: $!");
This command will open the DB_BTREE database named “phone.db” in read-only access mode. If the file does not exist, the command fails.
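Both outcomes can be observed in one script. The sketch below creates a small DB_BTREE file, opens it read-only, and then tries the same open on a file that does not exist. The %seed and %missing hashes, the file names and the sample record are assumptions for illustration.

```perl
use strict;
use warnings;
use DB_File;
use Fcntl;                         # supplies O_CREAT, O_RDWR and O_RDONLY
use File::Temp qw(tempdir);

# Assumption: scratch directory and sample record are illustrative only.
my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/phone.db";

# Set-up only: build a DB_BTREE file to open.
tie(my %seed, 'DB_File', $file, O_CREAT|O_RDWR, 0644, $DB_BTREE)
    or die "Cannot create $file: $!";
$seed{"Bill Smith"} = "555-1234";
untie %seed;

# Read-only open of the existing file succeeds.
my %phone_db;
tie(%phone_db, 'DB_File', $file, O_RDONLY, 0644, $DB_BTREE)
    or die "Cannot open $file: $!";
my $number = $phone_db{"Bill Smith"};
untie %phone_db;
print "Bill Smith: $number\n";

# Without O_CREAT, a missing file makes tie() return a false value.
my %missing;
my $opened = tie(%missing, 'DB_File', "$dir/missing.db", O_RDONLY, 0644, $DB_BTREE);
print defined($opened) ? "unexpectedly opened\n" : "missing.db: $!\n";
```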
Comments
Nice explanation. It helped
Nice explanation. It helped me a lot in getting started with databases. Thanks a lot.
Retrieval of databases
Well, I've learned how to encode information to a database with this article, but I still don't understand database retrieval with DB_File yet. How would you write one program to encode the information and another program to retrieve it? (I ask to clear up confusion, because it seems like you have to encode the database every time you want to retrieve from it, completely defeating the purpose of saving.)
thanks
Nice explanation. It helped me a lot in getting started with databases. thanks a lot. :)
cheers,
Kota.
Previous comment about anonymous hash
It appears this was a typo, should be %phone_db(), since there is no mention of this being a scalar reference of an anonymous hash, but a hash container. I am assuming this is the case, since all other examples do not use a dereference of the hash, they would have been $$phone_db{"key"}
Incorrect syntax in hash formation
The example under the Associative Arrays heading that shows how to store an anonymous hash's reference in a scalar is incorrect; instead of
$phone_db=( ... ), it should be $phone_db={ ... } (curly braces, not parens). FWIW, this is a very common misteak 8-}