The World Is a libferris Filesystem

Linux Journal

by Ben Martin

on April 28, 2006

The libferris virtual filesystem always has sought to push the boundaries of what a filesystem should do in terms of what can be mounted and what metadata is available for files. During the past five years, it has expanded its capabilities from mounting more traditional things, such as tar.gz, SSH, digital cameras and IPC primitives, to being able to mount various Indexed Sequential Access Method (ISAM) files, including db4, tdb, edb, eet and gdbm; various relational databases, including ODBC, MySQL and PostgreSQL; various servers, such as HTTP, FTP, LDAP, Evolution and RDF graphs; as well as XML files and Sleepycat's dbXML.

Recently, support for indexing filesystem data using any combination of Lucene, ODBC, TSearch2, Xapian, LDAP, PostgreSQL and Web search has been added, along with the ability to query these back ends for matching files. Matches naturally are presented as a virtual filesystem. Details of using the index and search capabilities of libferris appeared in the February 2005 issue of Linux Journal in my article “Filesystem Indexing with libferris”. Anything you see mounted as a filesystem in this article can be indexed and searched as described in that earlier article.

You can access your libferris virtual filesystem either by native libferris clients or by exporting libferris through Samba.

The two primary abstractions in libferris are the Context and the Extended Attribute (EA). A Context can be thought of as a superclass of a file or directory. libferris draws less of a distinction between the two: a file can behave like a directory if it is treated like one. For example, if you try to read a tar.gz file as a directory, libferris automatically mounts the archive and lists its contents as a virtual filesystem.

The EA interface can be thought of as a similar concept to the Linux kernel's EA interface. That is, arbitrary key-value data is attached to files and directories. This EA concept was extended early on in libferris to allow the value for an attribute to be derived from the content of a file. This means simple things like width and height of an image or video file become first-class metadata citizens along with a file's size and modification time. The limits on what metadata is available extend far beyond image metadata to include XMP, EXIF, music ID tags, Annodex media, geospatial tags, RPM metadata, SELinux integration, partially ordered emblem categories and arbitrary personal RDF stores of metadata.

Having all metadata available through a single interface allows libferris to provide filtering and sorting capabilities on any of that metadata. As such, you can sort a directory by any metadata just as easily as you would use ls -Sh to sort by file size. Sorting on multiple metadata values is also supported in libferris; you can sort your files easily by MIME type, then image width, then modification time—with all three pieces of metadata contributing to the final directory ordering. Any libferris virtual filesystem can have filtering and sorting applied to it to obtain a new libferris virtual filesystem.

You can store EA values into a personal RDF store—for example, when you write an image width to an extended attribute. When you subsequently read the image width, you get the value you just wrote to the EA. This extends naturally to other situations, such as when you change the x or y EA for a window, which should move the window.

Allowing EA to be stored in a personal RDF file lets you add metadata to any libferris object, even those for which you have only read access. For example, you can attach emblems or comments to the Linux Kongress Web site just as you would a normal file.

An interesting EA for all files is the content EA, which is equivalent to the file's byte contents. Exposing the file itself through the EA interface means that any information about a file can be obtained via the same interface.
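
To make the Context and EA ideas a little more concrete, here is a toy model in Python. It is only a sketch and not the libferris API: the Context class, its method names and the rule that a written value takes precedence over a derived one are my assumptions, based on the behavior described above.

```python
# Toy model of a Context with stored and derived extended attributes (EAs).
# All names here are hypothetical; this is not the libferris API.

class Context:
    def __init__(self, name, content=b""):
        self.name = name
        self.content = content
        self.stored = {}  # stands in for a personal metadata store
        # Derived EAs are computed from the file itself on demand.
        self.derived = {
            "name": lambda c: c.name,
            "size": lambda c: len(c.content),
            "content": lambda c: c.content,
        }

    def set_ea(self, key, value):
        self.stored[key] = value

    def get_ea(self, key):
        # Assumption: a value the user wrote wins over a derived one.
        if key in self.stored:
            return self.stored[key]
        return self.derived[key](self)

    def ea_names(self):
        return sorted(set(self.stored) | set(self.derived))


ctx = Context("emacs.png", b"\x89PNG...")
ctx.set_ea("comment", "editor icon")
print(ctx.get_ea("size"), ctx.get_ea("comment"), ctx.ea_names())
```

Because the content EA goes through the same lookup as size or comment, a client needs only one interface to reach both the bytes and the metadata.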

libferris is written in C++ and provides a standard IOStream interface to both Contexts and EA. Many standard file utilities have been rewritten to take advantage of libferris features. These clients include ls, cp, mv, rm, mkdir, cat, find, touch, IO redirection and more.

Filesystem Interaction

As we explore these filesystems, I use the ferrisls command, which mimics the coreutils ls(1) command. As well as the -l long listing option, I use the -0 (zero) recommended-ea option of ferrisls. This operates in much the same way as -l, though it asks the filesystem itself which EAs are most interesting for the user to see. I assume a shell alias of fls=ferrisls in the code examples.

I start by showing interaction with the standard kernel-based filesystems and some of the EA possibilities. Along with the recommended-ea option, ferrisls supports the --xml option to produce an XML document as output. This output records which EA each value belongs to and offers one way to drive Web interfaces using libferris.

Listing 1. A Long Listing of a Directory with Explicit Metadata

$ fls -l \
--show-ea=size-human-readable,width,height,name
4.5k    48      46      emacs.png
1.9k    48      48      gnome-warning.png
3.2k    48      48      gnome-xterm.png
2.5k    48      48      gtkvim.png

Listing 2. Asking libferris itself to determine which EAs are of interest for the current directory and producing an XML document as output.


$ fls -0 --xml
<ferrisls>
<ferrisls url="file:///tmp/lj"  name="lj"  >
 <context  size-human-readable="4.5k"
  protection-ls="-rw-r-----"
  mtime-display="05 Dec  4 23:39"
  name="emacs.png"  width="48"  height="46"  />
 ...
</ferrisls>
</ferrisls>
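
As a rough sketch of how a Web front end might consume this output, the following Python reads XML shaped like Listing 2 using only the standard library. The sample string is copied from the listing with the elided entries left out; how the XML reaches the script (a pipe, a file and so on) is left open.

```python
import xml.etree.ElementTree as ET

# Output shaped like Listing 2, with the elided entries left out.
listing = """<ferrisls>
<ferrisls url="file:///tmp/lj"  name="lj"  >
 <context  size-human-readable="4.5k"
  protection-ls="-rw-r-----"
  mtime-display="05 Dec  4 23:39"
  name="emacs.png"  width="48"  height="46"  />
</ferrisls>
</ferrisls>"""

root = ET.fromstring(listing)
rows = []
for directory in root.findall("ferrisls"):
    for entry in directory.findall("context"):
        # Every EA arrives as an XML attribute, so a dict is enough.
        rows.append(dict(entry.attrib))

for row in rows:
    print(row["name"], row["width"], row["height"])
```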

As mentioned previously, if you are sorting a directory on an EA that does not provide a complete ordering, you can chain together sorting predicates. For example, in Listing 3, I have sorted the output based on the numeric EA height and then used a version string sort on the name EA. A version sort is similar to the ls(1) -v option, which in Listing 3 has placed foo20.png after foo3.png. Such sorting is very useful when sorting by file type or MIME major type followed by name.

Listing 3. Sorting Your Output

$ fls --show-ea=width,height,size,name \
  --ferris-sort='(:#:height)(:V:name)'
48      48      1968    gnome-warning.png
48      48      3253    gnome-xterm.png
48      48      2550    gtkvim.png
48      46      4589    emacs.png
48      46      4589    foo3.png
48      46      4589    foo20.png
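
The chaining idea can be sketched outside libferris in a few lines of Python. This is my own illustration, not how ferrisls implements its sort: the version_key helper is hypothetical, and I sort height from largest to smallest only because that mirrors the order shown in Listing 3.

```python
import re

def version_key(name):
    # Split into text and digit runs so that "foo3" sorts before "foo20".
    # re.split with a capturing group puts the digit runs at odd positions.
    parts = re.split(r"(\d+)", name)
    return [int(p) if i % 2 else p for i, p in enumerate(parts)]

# (name, width, height) taken from Listing 3, deliberately shuffled.
files = [
    ("foo20.png", 48, 46),
    ("gtkvim.png", 48, 48),
    ("emacs.png", 48, 46),
    ("gnome-xterm.png", 48, 48),
    ("foo3.png", 48, 46),
    ("gnome-warning.png", 48, 48),
]

# First predicate: numeric height (descending, an assumption).
# Second predicate: version sort on the name, used to break ties.
ordered = sorted(files, key=lambda f: (-f[2], version_key(f[0])))

for name, width, height in ordered:
    print(width, height, name)
```

The second key matters only where the first one ties, which is exactly the situation of an EA that does not provide a complete ordering.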

The two concepts of files forming a tree and files having key-value pairs attached to them are similar to the structure of XML. With libferris, you can poke inside XML documents as though they were just another filesystem. For example, see Listing 4.

Listing 4. Initial Exploration of XML as a Filesystem


$ cat example.xml
<root>
  <file1 size="200" />
  <file2 interesting="yes" />
  <file3>filesystems rock
</file3>
</root>

$ fls -0 ./example.xml/root
file1
file2
file3

$ fls -d --show-ea=name,interesting \
   ./example.xml/root/file2
file2   yes

$ fcat example.xml/root/file3
filesystems rock
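
The mapping used in Listing 4 (elements as directory entries, XML attributes as EAs, text as file content) can be imitated with Python's standard XML library. This is only an analogy to show the correspondence; the resolve helper is hypothetical and has nothing to do with how libferris resolves URLs.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring("""<root>
  <file1 size="200" />
  <file2 interesting="yes" />
  <file3>filesystems rock
</file3>
</root>""")

def resolve(node, path):
    # Walk a slash-separated path one child element at a time.
    for part in path.strip("/").split("/"):
        node = node.find(part)
        if node is None:
            raise FileNotFoundError(path)
    return node

names = [child.tag for child in doc]                  # like listing root
interesting = resolve(doc, "file2").get("interesting")  # like reading an EA
content = resolve(doc, "file3").text                  # like reading the file

print(names, interesting, content, end="")
```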

By interacting with your filesystem, you can cause updates on the underlying XML document as well. The ferris-redirect client exists to allow shell-like redirection into libferris files. The -T or --trunc option truncates an existing file before writing stdin into it. This is much like the >| shell option. As you can see from the interaction in Listing 5, we have changed the structure of the example.xml document significantly through filesystem interaction.

Listing 5. Changing an XML File through Its Filesystem


$ echo "VIRTUAL filesystems rock more" | \
  ferris-redirect -T ./example.xml/root/file3

$ echo "a new way" | \
  ferris-redirect ./example.xml/root/file4

$ ferrisrm ./example.xml/root/file2

$ ftouch ./example.xml/root/touched

$ cat example.xml
<?xml version="1.0" encoding="UTF-8"
   standalone="no" ?>
<root>
  <file1 size="200"/>

  <file3>VIRTUAL filesystems rock more
</file3>
  <file4>a new way
</file4>

  <touched/>

</root>
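
For comparison, here are the same four changes expressed as direct tree operations in Python's standard XML library. It is a sketch of what each filesystem action in Listing 5 corresponds to in the document, not a description of libferris internals, and the serialized whitespace differs from the listing.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    '<root><file1 size="200"/><file2 interesting="yes"/>'
    "<file3>filesystems rock\n</file3></root>"
)

# Truncate and rewrite file3.
root.find("file3").text = "VIRTUAL filesystems rock more\n"
# Redirecting into a new name creates a new element.
ET.SubElement(root, "file4").text = "a new way\n"
# Removing a file removes the element.
root.remove(root.find("file2"))
# Touching a new name creates an empty element.
ET.SubElement(root, "touched")

result = ET.tostring(root, encoding="unicode")
print(result)
```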

As many modern word-processing documents are XML inside a compressed container, libferris allows you to drill down into the office document as though it were a filesystem. In Listing 6, I am listing a simple OpenOffice.org Writer document as a filesystem.

Listing 6. OpenOffice.org Documents Are Filesystems Too

$ fls -lh --show-ea=size,name,content \
~/sample-oo-writer.odt/content.xml/\
office:document-content/office:body/office:text
 0       office:forms
 18      text:p Paragraph number 1
 0       text:p-1
 116     text:p-2 This is the second paragraph ...
 0       text:p-3
 39      text:p-4 And in summary, this is really...
 0       text:p-5
 0       text:sequence-decls

A Xerces-C Document Object Model (DOM) can be obtained for any libferris filesystem, just as a Xerces-C DOM can be mounted as a libferris filesystem. Creation of a DOM for a filesystem is evaluated lazily, so you can get a DOM for file:// and only the parts of the DOM that are required are ever created.
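
Lazy creation is easy to picture with a small model. In the Python sketch below, a nested dictionary stands in for the filesystem and the LazyNode class (a hypothetical name) builds child nodes only when they are first asked for. It illustrates the general technique, not the Xerces-C integration in libferris.

```python
# Lazy tree: a node's children are built only on first access.
class LazyNode:
    built = 0  # counts how many nodes have been materialized

    def __init__(self, name, source):
        self.name = name
        self._source = source   # nested dict standing in for a filesystem
        self._children = None
        LazyNode.built += 1

    @property
    def children(self):
        if self._children is None:
            self._children = {
                key: LazyNode(key, value)
                for key, value in self._source.items()
            }
        return self._children


tree = {
    "tmp": {"lj": {"emacs.png": {}, "gtkvim.png": {}}},
    "home": {"ben": {}},
}
root = LazyNode("/", tree)
before = LazyNode.built
leaf_names = sorted(root.children["tmp"].children["lj"].children)
after = LazyNode.built
print(before, after, leaf_names)
```

Walking down to tmp/lj never touches the entries under home, so the node for ben is never created.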

The ability to convert any libferris filesystem into a DOM allows you to apply XSLT to your filesystems easily. The example C++ code in Listing 7 applies a stylesheet to a mounted OpenOffice.org document.

Listing 7. C++ Code Fragment Applying an XSLT to a Filesystem

fh_context c = Resolve( "~/example.odt/content.xml/"
 "office:document-content/office:body/office:text");
DOMDocument* theDOM = Factory::makeDOM( c );
...
// should use XercesDOMWrapperParsedSource
XalanTransformer theXalanTransformer;
theXalanTransformer.transform(
    theDOM, "~/my-oo.xsl", cout );

Recently, support for mounting applications such as Firefox, Evolution and the X Window System was added to libferris.

The evolution:// filesystem allows you to mount your Evolution mail client. Support currently extends to your mail folders and contacts. Using this filesystem, it is no longer necessary to save attachments to temporary files to access them from ferris-aware systems.

Mounting your X Window System is done via the xwin:// filesystem. This gives access to your window objects as well as lets you mount Klipper on KDE desktops. For Klipper, you can ls, cat and cp your past clipboard interactions easily, and overwriting the top clipboard element is effectively a clipboard copy. The window mounting lets you see where your windows are in terms of x,y offsets as well as other interesting data. Listing 8 shows a sample session of mounting my Evolution mail client and the X Window System.

Listing 8. Mounting Evolution and the X Window System


$ fls evolution://localhost
contacts  mail
$ fls -0 evolution://localhost/contacts/system/
...
witme-ferris   witme-ferris@lists.sourceforge.net
...
$ fls -0 xwin://localhost/clipboard
0       #include <Ferris/Ferris.hh>
1       Let the cricket stick to its hearth
2       ...

Mounting databases allows you to explore the database server, its databases and their tables and views. In Listing 9, I create a database, populate it and interact with it as a virtual filesystem. The final command using the --xml option for ferrisls exports each tuple in XML format.

Instead of embedding the user name and password in the URL, libferris elects to store this information in configuration files. This is a trade-off where the risk of accidentally copying and pasting a URL with embedded user credentials is minimized at the expense of having a central store of available credentials and mappings for where to use each credential. For many common URLs, inline authentication information is also supported.

Listing 9. PostgreSQL as a Filesystem


$ psql
# create database tmp;
# \c tmp
# create table foo ( message varchar(100) not null,
                    id int primary key );
# insert into foo values ( 'doki doki', 1 );
# \q

$ fls -0 pg://localhost/tmp/foo
doki doki 1       1       id

$ fcat pg://localhost/tmp/foo/1
<context  id="1"  message="doki doki"  />

$ echo "waku waku" | ferris-redirect \
  -T --ea=message pg://localhost/tmp/foo/1

$ fls -0 pg://localhost/tmp/foo
waku waku  1       1        id

$ gfcreate pg://localhost/tmp/foo
# See the gfcreate-tuple figure

$ fls -0 pg://localhost/tmp/foo
utsukushii  2       2        id

$ psql tmp
# select * from foo;
        message         | id
------------------------+----
waku waku               |  1
utsukushii              |  2
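
The table-to-directory mapping can be imitated without libferris. The Python sketch below uses an in-memory SQLite database as a stand-in for PostgreSQL, so it runs anywhere; the three helper functions are hypothetical names of my own, and the assumption that each tuple is named after its primary key is read off the listing rather than taken from libferris documentation.

```python
import sqlite3
from xml.sax.saxutils import quoteattr

# SQLite stands in for PostgreSQL so the sketch is self-contained.
db = sqlite3.connect(":memory:")
db.execute(
    "create table foo (message varchar(100) not null, id int primary key)"
)
db.execute("insert into foo values ('doki doki', 1)")

def list_table(db):
    # Each tuple becomes a directory entry named after its primary key.
    return [str(r[0]) for r in db.execute("select id from foo order by id")]

def read_tuple(db, key):
    # Render one tuple in the style of the fcat output in Listing 9.
    cur = db.execute("select id, message from foo where id = ?", (int(key),))
    columns = [d[0] for d in cur.description]
    row = cur.fetchone()
    attrs = "  ".join(
        f"{c}={quoteattr(str(v))}" for c, v in zip(columns, row)
    )
    return f"<context  {attrs}  />"

def write_ea(db, key, value):
    # Writing the message EA of a tuple maps to an UPDATE on that column.
    db.execute("update foo set message = ? where id = ?", (value, int(key)))

write_ea(db, "1", "waku waku")
print(list_table(db), read_tuple(db, "1"))
```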

The invocation of gfcreate shown in Listing 9 is captured in Figure 1.

Figure 1. Creating a New Tuple in PostgreSQL through the Filesystem

A libferris filesystem can nominate which objects it is happy to have created on it. You can see this list by using the fcreate or gfcreate tools in the ferriscreate package. A large list of possibilities will be displayed for an fcreate -l /tmp, for example. For a PostgreSQL database, you can create only a small number of new object types, as shown in Listing 10. I'll use fcreate in a moment to create a new empty db4 file to show how filesystem monitoring is virtualized in libferris.

Listing 10. What types of things can I create for a PostgreSQL filesystem?

$ fcreate -l pg://localhost/tmp
listing types that can be created
    for context: pg:///localhost/tmp
queryview
table

$ fcreate -l pg://localhost/tmp/foo
listing types that can be created
    for context: pg:///localhost/tmp/foo
tuple

Many changes made to a libferris filesystem are reflected instantly in other libferris applications. Many kernel-level interfaces let applications know when a kernel filesystem changes—for example, inotify and dnotify. libferris extends this to allow clients to know when a virtual filesystem has changed. For example, when you update an element inside of an XML file, inotify tells you only that the XML file has changed. With libferris, you can see exactly which part of the XML file was modified by other libferris applications.

Listing 11 demonstrates the filesystem monitoring support. In the example, I use the --monitor-all option of ferrisls. This makes ferrisls operate like a tail -f for your given URL; any creation, deletion or interesting filesystem activity is shown on the console. In another terminal, shown in Listing 12, I create, delete and write to “files” inside a Berkeley db4 file. ferrisls happily reports what is happening to these virtual files.

Listing 11. Output of One Virtual Console

$ fcreate --create-type=db4 --rdn=raw.db .
$ fls --monitor-all -0 ./raw.db
Created new1
Changed c:0x8321f88     /tmp/ljdb/raw.db
Changed c:0x8321f88     /tmp/ljdb/raw.db
Deleted new1
Created redirected-output
Changed c:0x8321f88     /tmp/ljdb/raw.db

Listing 12. Output of Another Virtual Console

$ ftouch ./raw.db/new1

$ ferrisrm -v ./raw.db/new1
removing ./raw.db/new1

$ echo "hello" | \
   ferris-redirect -T ./raw.db/redirected-output

$ fcat ./raw.db/redirected-output
hello
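
The underlying idea is the familiar observer pattern applied to entries inside a file rather than to the file as a whole. The Python sketch below is a generic illustration with hypothetical names (MonitoredStore, watch); it does not reproduce the exact sequence of events libferris emits in Listing 11.

```python
# Observer pattern over a key-value store standing in for a db4 file.
class MonitoredStore:
    def __init__(self):
        self._data = {}
        self._watchers = []

    def watch(self, callback):
        self._watchers.append(callback)

    def _emit(self, event, key):
        for callback in self._watchers:
            callback(event, key)

    def write(self, key, value):
        # A first write creates the entry; later writes change it.
        event = "Changed" if key in self._data else "Created"
        self._data[key] = value
        self._emit(event, key)

    def remove(self, key):
        del self._data[key]
        self._emit("Deleted", key)


events = []
store = MonitoredStore()
store.watch(lambda event, key: events.append(f"{event} {key}"))
store.write("new1", b"")
store.remove("new1")
store.write("redirected-output", b"hello\n")
print("\n".join(events))
```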

Many operations performed with libferris are also stored for possible future use. This includes the types of files you recently created (png, jpeg, db, tuple and so on), which files you recently edited and viewed and more. All of this is kept only for your personal use and never sent anywhere. Storage of metadata on files you view and edit is called remembrance in libferris. Only view and edit actions invoked through libferris are currently remembered. Listing 13 shows how I set up Xine to be executed as the default view operation on Annodex media files.

Listing 13. Setting Up Xine to Play Annodex Files

$ cat xine.desktop
[Desktop Entry]
Name=xine
Comment=Video Player
Exec=xine
MimeType=video/mpeg;...
Icon=~/icons/xine.png
Terminal=0
Type=Application
$ ferris-import-desktop-file xine.desktop
$ ferris-set-file-action-for-type -v -a xine \
     /tmp/Wombats.anx

# Let's view the video.
$ alias fv="ferris-file-action -v"
$ fv /tmp/Wombats.anx

Now we can explore what libferris knows about our past operations. By default, remembered operations are grouped by operation type, then media type. The recommended EAs for the final directories in the tree are the filename and the time it was last viewed or edited. The history virtual filesystem shown in Listing 14 shows only a set number of the most recent operations so as not to become too large.

Listing 14. Showing Recent View Operations

$ fls remembrance://
history
$ fls remembrance://history
edit  view
$ fls remembrance://history/view
video
$ fls -0h remembrance://history/view/video
/tmp/Wombats.anx 05 Dec  6 21:34
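
A bounded, grouped history of this kind can be modeled in a few lines of Python. This is a sketch of the general shape only: the limit of three entries, the remember function and the image edit records are all invented for the example, because the article does not state the actual limit or storage format.

```python
from collections import defaultdict, deque

HISTORY_LIMIT = 3  # assumed cap; the real limit is not stated here

# One bounded queue per (operation, media type) pair.
history = defaultdict(lambda: deque(maxlen=HISTORY_LIMIT))

def remember(operation, media_type, path, when):
    # Newest entries go first; the oldest falls off once the cap is hit.
    history[(operation, media_type)].appendleft((path, when))

remember("view", "video", "/tmp/Wombats.anx", "05 Dec  6 21:34")
for n in range(4):
    # Made-up edit records, used only to show the cap taking effect.
    remember("edit", "image", f"/tmp/img{n}.png", f"05 Dec  6 22:0{n}")

operations = sorted({op for op, _ in history})
recent_edits = [path for path, _ in history[("edit", "image")]]
print(operations, recent_edits)
```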

For each file, you also can bring up the complete list of view and edit times. This uses what libferris calls a branch filesystem. A branch filesystem can best be described as an entire personal filesystem attached to a file. Branch filesystems are accessed using the branches:// handler; all other URL handlers appear as direct children of branches://.

In Listing 15, I take a look at what branches are available for my media file and explore the remembrance view filesystem. Then, out of curiosity, I take a look into the extents branch and see that the kernel's XFS filesystem has placed the whole media file in a single contiguous extent on disk.

To see if a file has a valid digital signature, you simply can read the has-valid-signature EA on the file. The signatures branch filesystem allows much more detail to be exposed about the signature. The branchfs-attributes filesystem exposes all EAs for a file as a filesystem. Sometimes it is more convenient to access an EA as though it were a file.

Listing 15. Branch Filesystems: a Filesystem about a File


$ fls branches://file/tmp/Wombats.anx
branchfs-attributes  branchfs-medallions
branchfs-remembrance branchfs-extents
branchfs-parents     branchfs-signatures
$ fls -0 branches://file/tmp/\
  Wombats.anx/branchfs-remembrance/view
10.7M -rw-rw---- 05 Dec  6 21:34 ... 05 Dec  6 21:35
10.7M -rw-rw---- 05 Dec  6 21:34 ... 05 Dec  6 21:39
...
$ fls --xml \
  branches://file/tmp/Wombats.anx/branchfs-extents
<ferrisls>
<ferrisls
   url="branches://.../branchfs-extents"
   name="branchfs-extents"  >
 <context  name="0"
   start-block="14245376"
   end-block="14267375"
   start-address="0"
   end-address="21999"  />
</ferrisls>
</ferrisls>
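
The numbers in the extents branch can be checked with a little arithmetic. The Python below assumes that blocks are counted in 512-byte units, which is my assumption and is not stated by libferris; under that assumption the single extent comes out close to the 10.7M size reported for the file earlier in the listing.

```python
BLOCK_SIZE = 512  # assumption: extents are reported in 512-byte units

# Values copied from the branchfs-extents output in Listing 15.
extent = {
    "start-block": 14245376,
    "end-block": 14267375,
    "start-address": 0,
    "end-address": 21999,
}

# Both ranges are inclusive, hence the +1.
blocks = extent["end-block"] - extent["start-block"] + 1
covered = extent["end-address"] - extent["start-address"] + 1
size_bytes = blocks * BLOCK_SIZE
size_mib = size_bytes / (1024 * 1024)

print(blocks, covered, size_bytes, round(size_mib, 1))
```

The block range and the address range both span 22,000 units, which is consistent with the whole file living in one contiguous extent.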

Future Directions

In the future, libferris will continue to support mounting more things and obtaining more metadata where it can. A module for FUSE is planned to supplement the current Samba support.

Resources for this article: /article/8947.

Ben Martin has been working on filesystems for more than ten years. He is currently working toward a PhD combining Semantic Filesystems with Formal Concept Analysis to improve human-filesystem interaction.
