Pgfs: The PostGres File System
The PostGres file system is a Linux NFS server that presents software versions as distinct file trees in an NFS file system. Each version is completely distinct from all other versions, and can be modified independently without regard to versions before or after it. Each version retains all of the properties it would have on a normal file system, such as file ownership, permissions, binary file contents, cross-directory hard links and non-files such as devices and symlinks. The effect is the same as if each version of the software had its own separate directory, except far less disk space is used.
As an example, let's say that a year ago, you picked your favorite Linux distribution, and installed it on your new computer. The distribution had about 15,000 files, and took up about 200MB on disk. During the year you did a lot of hacking on your software, and it is now quite different from the original distribution. These modifications were done incrementally, and some of them replaced the original binaries with completely new binaries, for instance upgrading sendmail(8) or ftpd(8). Now you wish to compare your machine to the original distribution, examine the changes you've made, and apply them to a new distribution of Linux.
How do you record the changes you've made? You could save a complete copy of your distribution every time after you modified it—doing that would consume 200MB * 100 mods = 20 GB of disk. Even using a pair of 9 GB drives that's awfully expensive. However, you notice that most modifications change only a few files—perhaps a total of a half MB per modification. Storing only the files that changed would use 200MB + (100 mods * 0.5MB/mod) = 250MB of disk space—that's much better. What application would store only the differences for you? You could use CVS, but CVS isn't really suitable.
Now let's say you are a systems administrator, and you've faced this version control problem daily for years without finding a satisfactory solution. So since you're also a developer, you decide to build an application to store similar file trees that exploits the compression opportunities you've found. Fundamentally, this application would need to eat file trees and spit them back out again, and it must use less disk than keeping a whole copy of each tree. It should accept files one at a time or in bulk. You shouldn't have to extract and resubmit a whole file tree just to make one change.
How would you implement this application? Start by deciding what data structures are needed and what routines are needed to manipulate the structures. Let's start with files. Files consist of two parts—a stat(2) structure and a big chunk of binary data. Suppose you store the binary data in a file and name this file with a number. Then, name the stat structures with a number and store the fixed-length structures in an array on disk. The structures can be broken apart using field-splitting routines and assembled using record-making routines. Next, a structure is needed to represent the different versions of your software. Each version of your operating system consists of a tree of files. You name a tree of files that represent one specific software version a “version set” or “verset” and number them.
The next thing needed is some routines to search the stat array on disk for a specific structure, add structures and delete them. Since you will be doing random access to the structures, store them in a dbm (database management) file and use the dbm access routines instead of writing your own. Dbm also gives you routines to handle an index into the stat structure numbers, in order to make your access faster. You will need to write maintenance routines to copy dbm files and to copy fields from one dbm file to another with a different structure layout.
To add a new stat structure into your array may require modifying fields in structures other than the one you're adding; for instance, when you add one file to an existing file tree. Your programming task would be a lot simpler if you could collect a bunch of these modifications and do them all at once, or not do any of them if you discover a problem. The idea of doing all or none of a complex modification is called “transactions”.
To use your application you need commands for it to accept. Some commands might be “add this whole tree of files”, “add this single file” and “replace some bytes of a file with these bytes”. The NFS people have figured out the minimal set of file operations you need. (See Sidebar 1.) Now decide how stat(2) structures will be modified for each of the file operations and write pseudo code to modify the stat(2) array. While designing the semantics of the NFS commands, start thinking about sending your application NFS commands, making it an NFS server.
Next write your application. How about using an SQL database? A database decouples the application data structures from their representation on disk giving you the following advantages:
Structures can be defined with arbitrary fields and stored in tables.
A full set of routines to add, delete and modify structures is available, as well as indices to find structures quickly.
A nice command language is available to translate between structure formats as the application evolves.
Routines in the database are designed to operate on chunks of data that won't fit in memory all at once, so your application can grow without problems.
To add a field called “cokecans” to tally the number of cans of Coke it took to create each file, just add it. You can transfer your existing data to a new table with the cokecans field in a couple lines of SQL. Compare this to coding in C where a bunch of custom binary-format conversion programs would have to be written.
Then, find the skeleton of a user-level NFS server and port it to Linux (see Sidebar 2), and hook up the source of NFS commands to the command input of your application. Now you have an NFS server that presents file trees, but compresses away the similarities between trees. Since your application can be used like any file system, you don't have to build any specialized programs to manipulate versions—you can search files with grep and compare file trees with diff.
To control your application, create some fake magic filenames that it can treat as special, like procfs. The lines that are written to these files are the commands to your application. Now your application can be controlled with echo(1) commands from the shell rather than some obscure socket protocol.
The above description is not exactly how I went about writing Pgfs, but it does outline the design motivation. After I tried to store copies of a BSDI distribution under CVS and failed in practical terms, I set out to write an NFS server implemented on a database. My first version was coded in Perl5 using the PostGres client library, and I typed in NFS commands as space-separated text strings. I recoded in C to pick up the NFS RPCs. My first database design schema used one table for “names”--holding filenames and symlinks, and another table for “inodes”--holding the rest of the stat(2) structure and the pointer to the file contents. However, I didn't like the join operation (i.e., matching up rows from two tables with the same key), and I didn't want to implement join either in the database or the application code.
Realizing the promise of Apache® Hadoop® requires the effective deployment of compute, memory, storage and networking to achieve optimal results. With its flexibility and multitude of options, it is easy to over or under provision the server infrastructure, resulting in poor performance and high TCO. Join us for an in depth, technical discussion with industry experts from leading Hadoop and server companies who will provide insights into the key considerations for designing and deploying an optimal Hadoop cluster.
Sponsored by AMD
Built-in forensics, incident response, and security with Red Hat Enterprise Linux 6
Every security policy provides guidance and requirements for ensuring adequate protection of information and data, as well as high-level technical and administrative security requirements for a system in a given environment. Traditionally, providing security for a system focuses on the confidentiality of the information on it. However, protecting the data integrity and system and data availability is just as important. For example, when processing United States intelligence information, there are three attributes that require protection: confidentiality, integrity, and availability.
Learn more about catching the bad guy in this free white paper.
Sponsored by DLT Solutions
| Designing Electronics with Linux | May 22, 2013 |
| Dynamic DNS—an Object Lesson in Problem Solving | May 21, 2013 |
| Using Salt Stack and Vagrant for Drupal Development | May 20, 2013 |
| Making Linux and Android Get Along (It's Not as Hard as It Sounds) | May 16, 2013 |
| Drupal Is a Framework: Why Everyone Needs to Understand This | May 15, 2013 |
| Home, My Backup Data Center | May 13, 2013 |
Enter to Win an Adafruit Pi Cobbler Breakout Kit for Raspberry Pi

It's Raspberry Pi month at Linux Journal. Each week in May, Adafruit will be giving away a Pi-related prize to a lucky, randomly drawn LJ reader. Winners will be announced weekly.
Fill out the fields below to enter to win this week's prize-- a Pi Cobbler Breakout Kit for Raspberry Pi.
Congratulations to our winners so far:
- 5-8-13, Pi Starter Pack: Jack Davis
- 5-15-13, Pi Model B 512MB RAM: Patrick Dunn
- 5-21-13, Prototyping Pi Plate Kit: Philip Kirby
- Next winner announced on 5-27-13!
Featured Jobs
| Linux Systems Administrator | Houston and Austin, Texas | Host Gator |
| Senior Perl Developer | Austin, Texas | Host Gator |
| Technical Support Rep | Houston and Austin, Texas | Host Gator |
| UX Designer | Austin, Texas | Host Gator |
| Web & UI Developer (JavaScript & j Query) | Austin, Texas | Host Gator |
Free Webinar: Hadoop
How to Build an Optimal Hadoop Cluster to Store and Maintain Unlimited Amounts of Data Using Microservers
Realizing the promise of Apache® Hadoop® requires the effective deployment of compute, memory, storage and networking to achieve optimal results. With its flexibility and multitude of options, it is easy to over or under provision the server infrastructure, resulting in poor performance and high TCO. Join us for an in depth, technical discussion with industry experts from leading Hadoop and server companies who will provide insights into the key considerations for designing and deploying an optimal Hadoop cluster.
Some of key questions to be discussed are:
- What is the “typical” Hadoop cluster and what should be installed on the different machine types?
- Why should you consider the typical workload patterns when making your hardware decisions?
- Are all microservers created equal for Hadoop deployments?
- How do I plan for expansion if I require more compute, memory, storage or networking?




2 hours 38 min ago
2 hours 55 min ago
4 hours 48 min ago
6 hours 41 min ago
13 hours 35 min ago
13 hours 52 min ago
15 hours 43 min ago
21 hours 35 min ago
1 day 2 hours ago
1 day 2 hours ago