ZFS: Finding Its Way to a Linux Near You?

It seems like only yesterday that I read Jeff Bonwick's blog entry "ZFS: The Last Word in Filesystems". On Halloween of 2005, ZFS was fully integrated into Sun Microsystems' Solaris, and the filesystem was very well received. For readers not familiar with ZFS, it is a combined all-purpose filesystem and volume manager. It simplified data storage management while also offering the most advanced features of the time. These features include drive pooling with software RAID support, file snapshots, in-line data compression, data deduplication, built-in data integrity, advanced caching (to DRAM and SSD), and more. Today, the ZFS trademark and technology are owned and maintained by the Oracle Corporation.

Also in 2005, Sun Microsystems introduced OpenSolaris. Now a defunct project, OpenSolaris was a fully functional Solaris operating system built entirely from open source code, ZFS included, all of which was relicensed under the Common Development and Distribution License (CDDL), a weak copyleft license based on the Mozilla Public License (MPL). Although open source, ZFS and anything else under the CDDL were, and supposedly still are, incompatible with the GNU General Public License (GPL). That incompatibility extends to the GPL-licensed Linux kernel and eventually would lead to the birth of Btrfs.

To avoid licensing infringements, the earliest incarnations of ZFS on Linux were written for Filesystem in Userspace (FUSE). This kept the ZFS code out of the Linux kernel, but it also added its fair share of limitations. Running in userspace, it never could really measure up to its Solaris and FreeBSD counterparts. Over time, some of the FUSE implementations were badly neglected and in some cases abandoned. In 2008, the "ZFS on Linux" project changed everything by developing an in-kernel implementation of ZFS. Since its inception, the project has met with a lot of resistance (and criticism) from within the Linux community, all relating to licensing.

Fast-forward to the present, and two distributions have challenged this. Last month, Canonical, the parent company of the Ubuntu Linux distribution, released the latest Ubuntu 16.04, codenamed Xenial Xerus. One of the most noteworthy additions to this release was the full integration of pre-built ZFS modules. Canonical now ships Ubuntu with ZFS and has publicly stated that its legal team did not see a violation of the GPL. This matter is still being debated.

Shortly following this news, and through a separate and completely unrelated effort, the Debian distribution announced the inclusion of the ZFS source code, buildable via the Dynamic Kernel Module Support (DKMS) framework. However, it is not provided in the "main" section of the archive but instead in "contrib". Under the legal advice of the Software Freedom Law Center, this approach is seen as not violating the GPL.

Although including ZFS is an achievement in its own right, it still falls short of ZFS on Solaris. It is easy to get swept away by ZFS in Solaris. It is fully ingrained in the Solaris ecosystem, from the boot environment all the way to the user experience. The first thing that comes to mind is its customizable snapshot support. A snapshot is the state of a particular system at a particular point in time. In the case of ZFS, this concept is directed toward file-level state. ZFS uses a copy-on-write transaction model. Blocks containing active data are never overwritten in place. For every write, a new block is allocated, and the modified data is written to it. The metadata blocks referencing the original data block are handled the same way: updated copies reflecting the change are written to newly allocated blocks, while the original metadata stays unmodified in its original place. Because the old data and metadata remain intact, a snapshot can simply keep referring to them, which is what makes file snapshots possible.
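To make the copy-on-write idea concrete, here is a minimal sketch in Python. It is a toy in-memory model, not ZFS's on-disk format or any real ZFS API: the class CowStore and its methods are hypothetical names invented for illustration, and a plain dictionary stands in for the metadata blocks. It only shows why never overwriting in place makes a snapshot cheap.

```python
class CowStore:
    """Toy copy-on-write store (hypothetical, for illustration only)."""

    def __init__(self):
        self.blocks = {}     # block id -> data; existing entries are never overwritten
        self.next_id = 0
        self.live = {}       # "metadata": file name -> block id
        self.snapshots = {}  # snapshot label -> metadata as it was at that time

    def _allocate(self, data):
        # Every write lands in a freshly allocated block.
        block_id = self.next_id
        self.next_id += 1
        self.blocks[block_id] = bytes(data)
        return block_id

    def write(self, name, data):
        new_id = self._allocate(data)
        # The metadata is copied and updated too, leaving the old table intact.
        new_live = dict(self.live)
        new_live[name] = new_id
        self.live = new_live

    def snapshot(self, label):
        # No data is copied: the snapshot just keeps the current metadata table.
        self.snapshots[label] = self.live

    def read(self, name, snapshot=None):
        table = self.live if snapshot is None else self.snapshots[snapshot]
        return self.blocks[table[name]]


if __name__ == "__main__":
    store = CowStore()
    store.write("notes.txt", b"version 1")
    store.snapshot("before-edit")
    store.write("notes.txt", b"version 2")
    print(store.read("notes.txt"))                 # live view sees the new block
    print(store.read("notes.txt", "before-edit"))  # snapshot still sees the old block
```

Taking the snapshot costs almost nothing in this model because it only retains a reference to the existing metadata table. A real filesystem also has to track which old blocks are still referenced before freeing them, which this sketch leaves out.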

Figure 1. Snapshot Customization through the Time Slider Manager

______________________

Petros Koutoupis is currently a senior software developer at Cleversafe, an IBM Company. He is also the creator and maintainer of the RapidDisk Project. Petros has worked in the data storage industry for more than a decade.