Pgfs: The PostGres File System
Here's a description of the real Pgfs program that you can download. Pgfs is a normal user-level program that reads and writes ordinary TCP streams and UDP packets. Since it is a normal program that requires no privileges, it can run on any Linux system. It doesn't use any ground-breaking system-call features, so no kernel modifications are necessary. The TCP traffic is generated by the PostGres client library, so Pgfs can interact with a PostGres database using SQL. The UDP packets are formatted according to the conventions of the NFS protocol. All this means that an NFS client, such as a Linux kernel, can choose to send NFS packets Pgfs' way and can mount a file system as if Pgfs were any other variety of NFS server. The AMD automounter is another example of a user-level program that acts as an NFS server; AMD responds only to the directory-browsing NFS operations that trigger an automounter response, whereas Pgfs responds to all NFS operations.
In essence, Pgfs is an NFS <-> SQL translator. When an NFS request comes in, the C code submits SQL to get the stat(2) structures for the directory and file mentioned in the request, doing error and permission checking as it goes along. First it compares the request with the data it gets back about the file, enforcing conditions such as the rule that rmdir cannot be used to delete a regular file.
If the request is valid and the permissions allow it, the C code finds all the stat(2) structures that must be changed, such as the current file, the current directory, the directory above and hard links that share the file's inode. Then these modifications are made in the database by SQL. The modifications include side effects like updating the access time that you might not ordinarily think of.
Each NFS operation is processed within a database transaction. If an “expected” error occurs that could be caused by bad user input on the NFS client, such as typing rmdir to delete a file, an NFS error is returned. If an “unexpected” error occurs, such as the database not responding or a file handle not found, the transaction is aborted in a way that will not pollute the file system with bad data.
Pgfs does all the things “by hand” that go on in a “real” file system. It uses PostGres as a storage device that it accesses by inode number, pathname and verset number. For example, the nfs_getattr NFS operation works like the lstat(2) system call: getattr takes a file identifier, in this case an NFS handle instead of a pathname, and returns all the fields of a stat(2) structure. When Pgfs processes an nfs_getattr operation, the following things happen:
1. The NFS packet is broken apart into operation and arguments.
2. NFS operation counters are incremented.
3. The NFS handle is broken into fields.
4. Bounds checking is done on the nfs_getattr parameters.
5. stat(2) information is fetched for the handle, e.g., select * from tree where handle = 20934.
6. Permissions are checked.
7. File access times are updated, e.g., update tree set atime = 843357663 where inode = 8923.
8. The NFS reply is constructed.
9. The reply is sent to the NFS client.
The single table that holds all the stat(2) structures has fields defined as shown in Table 1.
Inode numbers are unique across the entire database, even for identical files in different versets. Each file in each verset has one database row. Each directory has three rows: one for its name from the directory above, one for . (dot), and one for .. (dot dot) from the directory below.
Philosophically, compression of similar file trees is the business of the back end of a program; it should not be visible to the user. In Pgfs, each collection of file bytes is contained in a Unix file, shared copy-on-write across all the versets from which the filename was inherited. Whenever a shared file is modified, a private copy is made for that verset. This matches Pgfs' system-administration orientation, where files will be large and binary and replaced in total, and the old and new binaries won't be similar enough to make differences small. This differs from source code, where the same files get incrementally modified over and over and differences are small. With the keep-whole-files policy, doing a grep on files in multiple versets is no slower than staying within a single verset, and there is no big delay while a compression algorithm unpacks intermediate versions into a temporary area.