Writing Stackable Filesystems

Now you can add a feature to your favorite filesystem without rewriting it.
Getting Started

Now that you understand how stacking works, what next? First, we have to explore the places where we can insert our code into Wrapfs. Referring back to the wrapfs_unlink code shown previously, there are three such places that correspond to before, instead of or after the call to the lower-level ->unlink() method.

1) Pre-call: you can insert code before the call to the lower ->unlink(). For example, you could check if the user is trying to delete an important file and prevent that from happening:

if (strcmp(dentry->d_name.name,
           "vmlinuz") == 0)
return -EACCES;

2) Call: you could replace the entire call itself. For example, instead of removing the file, you could rename it, as part of a simple undo filesystem (we've all had our share of unintended rm -f commands).

3) Post-call: here we could perform actions after the main operation returned from the lower filesystem. For example, suppose a malicious user tries to delete /etc/passwd, but the normal UNIX permission checks prevent it. An administrator might want to log such an action (using syslogd) as follows:

if (err == -EACCES &&
    strcmp(dentry->d_name.name,
           "passwd") == 0)
  printk("uid %d tried to delete passwd",
         current->fsuid);

where current is a global variable that always points to the currently executing task (process), and ->fsuid is the effective UID of that process, for use by filesystems.

These examples and those that follow have been simplified somewhat to save space and to convey their essence. For example, the d_name.name component is not null terminated, so memcmp will have to be used with the proper length; or, to check that the file referred to by the dentry is indeed the real /etc/passwd, the code has to check that the filesystem is the root filesystem or compare against the absolute pathname, using d_path(). For the complete examples, tested under 2.4.20, see the FiST home page (www.cs.sunysb.edu/~ezk/research/fist).

Example: Who Watches the Snoopers

UNIX tries to protect files from access by unauthorized users. When a user tries to open a file to which they do not have access, UNIX promptly returns a permission-denied error. Some users like to snoop around the files of others, sometimes looking for files mistakenly left unprotected or trying to guess names of files that might exist in a searchable-only directory. Unfortunately, even when those snooping users are unsuccessful, the victims of such snooping often are unaware it took place.

One of the most common filesystem operations is ->lookup(), which occurs whenever a system call uses a file's name. The kernel must translate that (string) name to an actual VFS object, such as an inode, dentry or file. To detect snooping users, we place the following code in snoopfs_lookup or snoopfs_permission, right after it calls ->lookup() on the lower filesystem:

if ((err == -EACCES ||
     err == -ENOENT) &&
    dir->i_uid != current->fsuid &&
    current->fsuid != 0)
  printk("snoop uid=%d pid=%d file=%s",
         current->fsuid, current->pid,
         dentry->d_name.name);

Here, we check the return code (err) from the call to the lower ->lookup(). If the status is EACCES (permission denied) or ENOENT (no such file or directory) and if the directory's owner (dir->i_uid) is different from that of the user running the current task (current->fsuid) and the current user is not the superuser (because root users can do anything), then it prints a descriptive message identifying the snooping user. This message typically is logged by syslogd.

Example: Encryption Fit for a Caesar

Wrapper technologies are particularly suitable for security-related applications where wrapping, or monitoring, is often useful. Not surprisingly, the most popular applications developed from FiST are cryptographic filesystems. In this example, we demonstrate a simple encryption filesystem that uses the rot13 cipher.

In this filesystem, we want to encrypt all file data using a function (presumably already written) called rot13 that takes an input buffer, output buffer and their lengths. However, unlike the previous examples, there is no single method where you can place the rot13() function to encrypt the file's data. In fact, manipulating file data in any filesystem is rather complex because it involves multiple methods, as well as two forms of accessing file's data, the read and write system calls, which can work at any file offset, and mmap, which works on whole pages. To make life easier for stackable filesystem developers, Wrapfs consolidates all of these methods into two simple calls: one to encode file data and one to decode file data, both working on whole page-aligned data pages (for example, 4KB on IA-32 systems). Using the Wrapfs template, the only code you have to write to produce a rot13-based encryption filesystem looks like the following:

int
encode_block(void *in, void *out, int len)
{
  rot13(in, out, len);
  return len;
}
int
decode_block(void *in, void *out, int len)
{
  rot13(in, out, len);
  return len;
}

Wrapfs already contains all the complex code that handles mixed reads, writes and memory-mapped operations. Wrapfs makes calls to encode_block to encrypt a data page and to decode_block to decrypt a data page (they are identical in this example).

Of course, rot13 is hardly a practical cipher, but given this simple example, you can build much stronger cryptographic filesystems. Following this, we recently have built a powerful cryptographic filesystem called NCryptfs (a successor to Cryptfs). NCryptfs supports multiple ciphers; multiple keys per user, process or group; multiple authentication schemes; key timeouts and revocation; delegated privileges; and more—all with a negligible performance overhead.

Wrapfs also supports manipulating filenames using two additional routines to encode and decode filenames. One thing to watch out for when encrypting filenames is that filenames must remain valid after encryption. In other words, they cannot contain nulls or “/” characters. A common solution is to uuencode the file's name after encryption.

______________________

Comments

Comment viewing options

Select your preferred way to display the comments and click "Save settings" to activate your changes.

Nicely demonstrated

Uday Chitragar's picture

Nicely demonstrated stackable file systems.
However, in real applications it is hard to keep the two layers (crypt fs and underlying low level file system) separate.

Re: Kernel Korner: Writing Stackable Filesystems

Anonymous's picture

Really nice article, although would be thrilled to see a more followup of the same!

amazing article...

pradeep's picture

I must congratulate you for an aticle that is simple and addresses the core of the issues relating to stacking..

keep posting new articles ..

Pradeep

Webinar
One Click, Universal Protection: Implementing Centralized Security Policies on Linux Systems

As Linux continues to play an ever increasing role in corporate data centers and institutions, ensuring the integrity and protection of these systems must be a priority. With 60% of the world's websites and an increasing share of organization's mission-critical workloads running on Linux, failing to stop malware and other advanced threats on Linux can increasingly impact an organization's reputation and bottom line.

Learn More

Sponsored by Bit9

Webinar
Linux Backup and Recovery Webinar

Most companies incorporate backup procedures for critical data, which can be restored quickly if a loss occurs. However, fewer companies are prepared for catastrophic system failures, in which they lose all data, the entire operating system, applications, settings, patches and more, reducing their system(s) to “bare metal.” After all, before data can be restored to a system, there must be a system to restore it to.

In this one hour webinar, learn how to enhance your existing backup strategies for better disaster recovery preparedness using Storix System Backup Administrator (SBAdmin), a highly flexible bare-metal recovery solution for UNIX and Linux systems.

Learn More

Sponsored by Storix