Pgfs: The PostGres File System
Let's use your favorite Linux distribution as an example of a file tree that needs version control. To start out, copy the virgin operating system from the CD-ROM into Pgfs:
cp -va /cdrom /pgfs/1/1
Let's examine that destination pathname. The pgfs part is where Pgfs is mounted. The first 1 is the “module”. Storing software that evolves independently in different modules saves disk space. The second 1 is the verset. We have a brand new empty Pgfs, so we'll write into verset 1. Once the copy is done, use ls to see what's in the pgfs directory:
ls -l /pgfs/1/1/bin/su /pgfs/1/1/dev/cua0
The output from ls looks like this:
-rwsr-xr-x 1 root bin 9853 Aug 14 1995 /pgfs/1/1/bin/su
crw-rw---- 1 root uucp 5, 64 Jul 17 1994 /pgfs/1/1/dev/cua0
Notice the suid bit on su(1) and leading c on the cua0 mode. Pgfs stores attributes and non-files just like any other file system. This copy of su will make you root if you picked the mount option to accept suid bits when you mounted Pgfs. Next, copy verset 1 to a new verset, so that the new verset can be modified without changing the files in the old one:
echo "cpverset 1" > /pgfs/ctl
In your new verset, you install a newer version of sendmail:
cp /tmp/sendmail /pgfs/1/2/usr/sbin/sendmail
chown root.bin /pgfs/1/2/usr/sbin/sendmail
chmod 6555 /pgfs/1/2/usr/sbin/sendmail
Now that you have two different versets, you can compare their contents. You access multiple versets with shell wild cards or other filename expansions. To find what versets there are, do ls /pgfs/1.
strings - /pgfs/1/1/usr/sbin/sendmail | \
grep version.c
@(#)version.c 8.6.12.1 (Berkeley) 3/28/95
strings - /pgfs/1/{1,2}/usr/sbin/sendmail | \
grep version.c
@(#)version.c 8.6.12.1 (Berkeley) 3/28/95
@(#)version.c 8.8.2.1 (Berkeley) 10/18/96
strings - /pgfs/1/*/usr/sbin/sendmail | \
grep version.c
@(#)version.c 8.6.12.1 (Berkeley) 3/28/95
@(#)version.c 8.8.2.1 (Berkeley) 10/18/96
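The wild-card commands above print one version string per verset but do not say which verset each line came from. A minimal sketch of one way to label the results follows. It assumes only what the article states, that each verset appears as a directory under the module directory. The helper name verset_sums is hypothetical and not part of Pgfs, and it prints a checksum rather than the sendmail version string, so identical copies show up as identical sums.

```shell
#!/bin/sh
# Sketch only: verset_sums is a hypothetical helper, not a Pgfs command.
# Usage: verset_sums MODULE_DIR RELATIVE_PATH
# Prints "checksum size path" for the given file in every verset directory
# found under MODULE_DIR (for example /pgfs/1).
verset_sums() {
    module=$1
    path=$2
    for f in "$module"/*/"$path"; do
        # Skip versets that do not contain the file (or an unmatched glob).
        [ -f "$f" ] || continue
        # cksum reading stdin prints only the CRC and the byte count.
        printf '%s %s\n' "$(cksum < "$f")" "$f"
    done
}
```

For the example in this article, the call would be verset_sums /pgfs/1 usr/sbin/sendmail. Versets whose lines start with the same checksum and size hold the same copy of the file.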
Most version control packages focus on the individual modification history of single files, and that's what their tools display. I think the idea of the set of files known as “customer release 1.0” is more important than the idea of how each file happened to arrive in that set.
Suppose a new employee comes across a Pgfs containing 200 versets. One of the first things she wants to know is what each verset represents and how they interrelate. Why is this verset here? Where did this verset come from? Which versets represent consistent software releases? Tools with the file as the basic unit would ask her to compare file histories at this point. Too bad there's no way to coherently display 40,000 individual file history trees when she's comparing two versions of /usr. Tools built around the verset scale better, because there are far fewer versets than there are files per verset.
I wanted a program that reads an entire Pgfs database and plots the relationship of each verset to every other verset, in terms of quantity of shared files and unique files. Run against Pgfs, the program shows that versets 1 and 2 have 19,998 identical files and 2 different ones, and the different ones are /usr/foo and /usr/bar. The program plots boxes for 200 different versions of /usr, with connecting lines that vary in color and width depending on the percentage of shared files and the percentage of older and newer files. If I told the employee in words that two copies of /usr were “almost identical”, “quite a bit different”, or “from two different operating systems”, she would have a good idea of the approximate numbers I meant. In my program I want those pigeonholes to be visually obvious from the pictures that compare versets.
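The counting half of that idea can be approximated from the mounted file system with ordinary tools. The sketch below is not the author's program: it reads the NFS view rather than the Pgfs database, it draws nothing, and the function name compare_versets is hypothetical. It walks two verset directories and reports how many regular files are byte-for-byte identical, how many differ, and how many exist on one side only. It assumes filenames contain no newlines.

```shell
#!/bin/sh
# Sketch only: compare_versets is a hypothetical helper, not part of Pgfs.
# Usage: compare_versets DIR_A DIR_B
# Counts regular files that are identical, different, or present in only
# one of the two trees, and names each file whose contents differ.
compare_versets() {
    a=$1
    b=$2
    same=0 differ=0 only_a=0 only_b=0
    list=$(mktemp)
    # Regular files in the first tree, as paths relative to its root.
    (cd "$a" && find . -type f) > "$list"
    while IFS= read -r f; do
        if [ ! -f "$b/$f" ]; then
            only_a=$((only_a + 1))
        elif cmp -s "$a/$f" "$b/$f"; then
            same=$((same + 1))
        else
            differ=$((differ + 1))
            echo "differs: ${f#./}"
        fi
    done < "$list"
    # Second pass: files that exist only in the second tree.
    (cd "$b" && find . -type f) > "$list"
    while IFS= read -r f; do
        [ -f "$a/$f" ] || only_b=$((only_b + 1))
    done < "$list"
    rm -f "$list"
    echo "same=$same differ=$differ only_a=$only_a only_b=$only_b"
}
```

For the two versets created earlier, the call would be compare_versets /pgfs/1/1 /pgfs/1/2. Running it over every pair of versets would give the raw percentages that the plotted lines are meant to encode, although comparing file contents through the mount is likely far slower than reading the database directly.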
For most system administration purposes I don't care how or why files changed. If I apply a vendor patch to a kernel, all I care about is getting the kernel tree back before and after the patch. I don't want to reverse-engineer the patch script into file adds, deletes, renames and modifies just to shove it into Pgfs. I shouldn't need to notify the version control system which files to view or modify with checkin/checkout commands. I just want an NFS file system, and whatever I have in the directory when I leave is stored in the verset for next time. Since I'm not going to be giving Pgfs hints about what I'm doing, every operation needs to be possible. Therefore, each verset must be totally independent from all the others. I don't want to be forced to evolve my files from previous files in a branch structure without loops, or to keep my filenames constant between versions because of the lack of directory versioning, to name two well-known limitations of CVS.