Writing Stackable Filesystems

Now you can add a feature to your favorite filesystem without rewriting it.
Getting Started

Now that you understand how stacking works, what next? First, we have to explore the places where we can insert our code into Wrapfs. Referring back to the wrapfs_unlink code shown previously, there are three such places that correspond to before, instead of or after the call to the lower-level ->unlink() method.

1) Pre-call: you can insert code before the call to the lower ->unlink(). For example, you could check if the user is trying to delete an important file and prevent that from happening:

if (strcmp(dentry->d_name.name,
           "vmlinuz") == 0)
return -EACCES;

2) Call: you could replace the entire call itself. For example, instead of removing the file, you could rename it, as part of a simple undo filesystem (we've all had our share of unintended rm -f commands).

3) Post-call: here we could perform actions after the main operation returned from the lower filesystem. For example, suppose a malicious user tries to delete /etc/passwd, but the normal UNIX permission checks prevent it. An administrator might want to log such an action (using syslogd) as follows:

if (err == -EACCES &&
    strcmp(dentry->d_name.name,
           "passwd") == 0)
  printk("uid %d tried to delete passwd",
         current->fsuid);

where current is a global variable that always points to the currently executing task (process), and ->fsuid is the effective UID of that process, for use by filesystems.

These examples and those that follow have been simplified somewhat to save space and to convey their essence. For example, the d_name.name component is not null terminated, so memcmp will have to be used with the proper length; or, to check that the file referred to by the dentry is indeed the real /etc/passwd, the code has to check that the filesystem is the root filesystem or compare against the absolute pathname, using d_path(). For the complete examples, tested under 2.4.20, see the FiST home page (www.cs.sunysb.edu/~ezk/research/fist).

Example: Who Watches the Snoopers

UNIX tries to protect files from access by unauthorized users. When a user tries to open a file to which they do not have access, UNIX promptly returns a permission-denied error. Some users like to snoop around the files of others, sometimes looking for files mistakenly left unprotected or trying to guess names of files that might exist in a searchable-only directory. Unfortunately, even when those snooping users are unsuccessful, the victims of such snooping often are unaware it took place.

One of the most common filesystem operations is ->lookup(), which occurs whenever a system call uses a file's name. The kernel must translate that (string) name to an actual VFS object, such as an inode, dentry or file. To detect snooping users, we place the following code in snoopfs_lookup or snoopfs_permission, right after it calls ->lookup() on the lower filesystem:

if ((err == -EACCES ||
     err == -ENOENT) &&
    dir->i_uid != current->fsuid &&
    current->fsuid != 0)
  printk("snoop uid=%d pid=%d file=%s",
         current->fsuid, current->pid,
         dentry->d_name.name);

Here, we check the return code (err) from the call to the lower ->lookup(). If the status is EACCES (permission denied) or ENOENT (no such file or directory) and if the directory's owner (dir->i_uid) is different from that of the user running the current task (current->fsuid) and the current user is not the superuser (because root users can do anything), then it prints a descriptive message identifying the snooping user. This message typically is logged by syslogd.

Example: Encryption Fit for a Caesar

Wrapper technologies are particularly suitable for security-related applications where wrapping, or monitoring, is often useful. Not surprisingly, the most popular applications developed from FiST are cryptographic filesystems. In this example, we demonstrate a simple encryption filesystem that uses the rot13 cipher.

In this filesystem, we want to encrypt all file data using a function (presumably already written) called rot13 that takes an input buffer, output buffer and their lengths. However, unlike the previous examples, there is no single method where you can place the rot13() function to encrypt the file's data. In fact, manipulating file data in any filesystem is rather complex because it involves multiple methods, as well as two forms of accessing file's data, the read and write system calls, which can work at any file offset, and mmap, which works on whole pages. To make life easier for stackable filesystem developers, Wrapfs consolidates all of these methods into two simple calls: one to encode file data and one to decode file data, both working on whole page-aligned data pages (for example, 4KB on IA-32 systems). Using the Wrapfs template, the only code you have to write to produce a rot13-based encryption filesystem looks like the following:

int
encode_block(void *in, void *out, int len)
{
  rot13(in, out, len);
  return len;
}
int
decode_block(void *in, void *out, int len)
{
  rot13(in, out, len);
  return len;
}

Wrapfs already contains all the complex code that handles mixed reads, writes and memory-mapped operations. Wrapfs makes calls to encode_block to encrypt a data page and to decode_block to decrypt a data page (they are identical in this example).

Of course, rot13 is hardly a practical cipher, but given this simple example, you can build much stronger cryptographic filesystems. Following this, we recently have built a powerful cryptographic filesystem called NCryptfs (a successor to Cryptfs). NCryptfs supports multiple ciphers; multiple keys per user, process or group; multiple authentication schemes; key timeouts and revocation; delegated privileges; and more—all with a negligible performance overhead.

Wrapfs also supports manipulating filenames using two additional routines to encode and decode filenames. One thing to watch out for when encrypting filenames is that filenames must remain valid after encryption. In other words, they cannot contain nulls or “/” characters. A common solution is to uuencode the file's name after encryption.

______________________

Comments

Comment viewing options

Select your preferred way to display the comments and click "Save settings" to activate your changes.

Nicely demonstrated

Uday Chitragar's picture

Nicely demonstrated stackable file systems.
However, in real applications it is hard to keep the two layers (crypt fs and underlying low level file system) separate.

Re: Kernel Korner: Writing Stackable Filesystems

Anonymous's picture

Really nice article, although would be thrilled to see a more followup of the same!

amazing article...

pradeep's picture

I must congratulate you for an aticle that is simple and addresses the core of the issues relating to stacking..

keep posting new articles ..

Pradeep

Webcast
How to Build an Optimal Hadoop Cluster to Store and Maintain Unlimited Amounts of Data Using Microservers

Realizing the promise of Apache® Hadoop® requires the effective deployment of compute, memory, storage and networking to achieve optimal results. With its flexibility and multitude of options, it is easy to over or under provision the server infrastructure, resulting in poor performance and high TCO. Join us for an in depth, technical discussion with industry experts from leading Hadoop and server companies who will provide insights into the key considerations for designing and deploying an optimal Hadoop cluster.

Learn More

Sponsored by AMD

White Paper
Red Hat White Paper: Using an Open Source Framework to Catch the Bad Guy

Built-in forensics, incident response, and security with Red Hat Enterprise Linux 6

Every security policy provides guidance and requirements for ensuring adequate protection of information and data, as well as high-level technical and administrative security requirements for a system in a given environment. Traditionally, providing security for a system focuses on the confidentiality of the information on it. However, protecting the data integrity and system and data availability is just as important. For example, when processing United States intelligence information, there are three attributes that require protection: confidentiality, integrity, and availability.

Learn more about catching the bad guy in this free white paper.

Learn More

Sponsored by DLT Solutions