IBM's Journaled Filesystem
New features are constantly being added to the Linux kernel; one of them is support for journaling filesystems. JFS from IBM is one of several journaling filesystems now available for Linux. This article explains JFS internals and characteristics, how to install and configure it on a Linux server, and the operational experience with JFS at the Ericsson Research Lab in Montréal.
A filesystem is used to store and manage user data on disk drives. It ensures that the data written to the disk is identical to the data read back. In addition to storing data in files, a filesystem also creates and manages information about itself, such as free space and inodes. These filesystem structures are commonly referred to as metadata: everything concerning a file except the actual data inside it. Elements of the file, such as its physical location and size, are tracked by metadata.
A journaling filesystem provides improved structural consistency and recoverability. It also has faster restart times than a non-journaling filesystem.
Non-journaling filesystems are subject to corruption in the event of a system failure. This is because a logical file operation often takes multiple media I/Os to accomplish and may not be reflected completely on the media at any given point in time. For example, the simple task of writing data to a file can involve numerous steps:
Allocating blocks to hold the data.
Updating the block pointers.
Updating the size of the file.
Writing the actual data.
If the system is interrupted before these operations are fully completed, the non-journaling filesystem ends up in an inconsistent state. In this case, these filesystems rely on their fsck utility to examine all of the filesystem's metadata (for example, directories and disk addressing structures) to detect and repair structural integrity problems before restarting. fsck can be quite time-consuming, with the run time depending on the size of the partition, the number of directories and the number of files in each directory. For a large filesystem, journaling becomes crucial. A journaling filesystem, on the other hand, can restart in less than a second.
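The recovery advantage described above can be illustrated with a toy write-ahead log in Python. This is a conceptual sketch of metadata journaling in general, not JFS's actual on-disk log format: each metadata update is recorded in a journal and marked committed before it is applied, so after a crash, recovery only has to replay committed transactions instead of scanning every structure the way fsck does.

```python
# Toy illustration of metadata journaling (write-ahead logging).
# Conceptual sketch only -- not how JFS implements its log internally.

class JournaledStore:
    def __init__(self):
        self.metadata = {}   # committed "on-disk" metadata state
        self.journal = []    # write-ahead log of (txid, key, value) records

    def write(self, txid, updates):
        # 1. Log every metadata update before touching the main structures.
        for key, value in updates.items():
            self.journal.append((txid, key, value))
        # 2. Mark the transaction committed in the journal.
        self.journal.append((txid, "COMMIT", None))
        # 3. Apply the updates to the main metadata (a crash could strike here).
        self.metadata.update(updates)

    def recover(self):
        # After a crash, replay only committed transactions from the journal;
        # partially logged, uncommitted transactions are simply discarded.
        committed = {tx for tx, key, _ in self.journal if key == "COMMIT"}
        state = {}
        for tx, key, value in self.journal:
            if key != "COMMIT" and tx in committed:
                state[key] = value
        self.metadata = state
        return state
```

Simulating a crash partway through a second transaction (logged but never committed) shows why recovery is fast and safe: replaying the journal yields only the fully committed state, with no full-metadata scan required.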
JFS was designed to support fast recovery from system outages, large files and partitions, and large numbers of directories and files. To meet these requirements, JFS provides sub-second filesystem recovery, achieved by journaling only the metadata. JFS also provides 64-bit scalability, with petabyte ranges for files and partitions. In addition, B+tree indices are used extensively on all on-disk filesystem structures in place of traditional linear structures, improving the performance of inserting, locating and traversing entries.
A file is allocated in sequences of extents. An extent is a sequence of contiguous aggregate blocks allocated to a JFS object as a unit. An extent is contained wholly within a single aggregate (and therefore, a single partition); large extents, however, may span multiple allocation groups. JFS uses a 24-bit value for the length of an extent, so an extent can range in size from 1 to 2^24 - 1 blocks. With a 4KB block size, the maximum extent is 4KB * (2^24 - 1) bytes, or roughly 64GB. Note that this limit applies only to a single extent; in no way does it limit the overall file size. Extents are indexed in a B+tree for better performance when inserting new extents, locating particular extents and so forth.
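The extent-size limit follows directly from the 24-bit length field. A few lines of Python confirm the arithmetic, assuming the 4KB block size used in the example above:

```python
# A single extent's size is bounded by its 24-bit block-count field.
BLOCK_SIZE = 4 * 1024          # assumed 4KB aggregate block size
MAX_EXTENT_BLOCKS = 2**24 - 1  # largest value a 24-bit length field can hold

max_extent_bytes = BLOCK_SIZE * MAX_EXTENT_BLOCKS
print(max_extent_bytes)          # 68719472640 bytes
print(max_extent_bytes / 2**30)  # just under 64GB
```

With a smaller block size (JFS also supports 512-, 1024- and 2048-byte blocks), the same 24-bit field yields a proportionally smaller maximum extent.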
In general, the allocation policy for JFS tries to maximize contiguous allocation by allocating a minimum number of extents, with each extent as large and contiguous as possible. This allows for larger I/O transfers, resulting in improved performance.
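The intent of such a policy can be sketched with a hypothetical allocator that satisfies a request using the fewest, largest free runs available. This is not JFS's actual allocator (which works against allocation-group structures), only an illustration of "minimum number of extents, each as large and contiguous as possible":

```python
# Hypothetical sketch of a largest-run-first extent allocator.
# Not the real JFS algorithm -- just the stated policy in miniature.

def allocate_extents(free_runs, blocks_needed):
    """free_runs: list of (start, length) runs of free blocks.
    Returns a list of (start, length) extents covering the request,
    preferring large contiguous runs to minimize the extent count."""
    extents = []
    # Try the largest free runs first so the file ends up with few,
    # large extents, enabling bigger sequential I/O transfers.
    for start, length in sorted(free_runs, key=lambda run: -run[1]):
        if blocks_needed == 0:
            break
        take = min(length, blocks_needed)
        extents.append((start, take))
        blocks_needed -= take
    if blocks_needed:
        raise OSError("out of space")
    return extents
```

For example, a request for 10 blocks against free runs of 4, 8 and 3 blocks is satisfied with just two extents (the 8-block run plus part of the 4-block run) rather than fragmenting across all three.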
IBM introduced its UNIX filesystem as the Journaled Filesystem (JFS) with the initial release of AIX Version 3.1. This filesystem is now called JFS1 on AIX. It has been the premier filesystem for AIX for the past ten years and has been installed in millions of customers' AIX systems. In 1995, work began to make the filesystem more scalable and to support machines with more than one processor. Another goal was to have a more portable filesystem capable of running on multiple operating systems.
Historically, the JFS1 filesystem is closely tied to the memory manager of AIX. This design is typical of a closed-source operating system or a filesystem supporting only one operating system.
The new Journaled Filesystem, on which the Linux port was based, first shipped as part of OS/2 Warp Server for e-business in April 1999, after several years of designing, coding and testing. It also shipped with the OS/2 Warp Client in October 2000. In parallel with this effort, some of the JFS development team returned to the AIX Operating System Development Group in 1997 and started moving this new JFS source base to the AIX operating system. In May 2001, a second journaled filesystem, Enhanced Journaled Filesystem (JFS2), was made available for AIX 5L. Meanwhile, in December 1999, a snapshot of the OS/2 JFS source was taken, and work began to port JFS to Linux.