PVFS: A Parallel Virtual File System for Linux Clusters
Using networked file systems is a common method for sharing disk space on UNIX-like systems, including Linux. Sun was the first to embrace this technology by introducing the Network File System (NFS), which provides file sharing via the network. NFS is a client/server system that allows users to view, store and update files on remote computers as though they were on the user's own computer. NFS has since become the standard for file sharing in the UNIX community. Its protocol uses the Remote Procedure Call method of communication between computers.
Using NFS, the user or a system administrator can mount all or a portion of a file system. The portion of your file system that is mounted can be accessed with whatever privileges accompany your access to each file (read-only or read-write).
As the popularity and utility of this type of system have grown, more networked file systems have appeared. These new systems include advances in reliability, security, scalability and speed.
As part of my responsibilities in the Systems Research Department at Ericsson Research Canada, I evaluated Linux-networked file systems to decide what networked file system(s) to adopt for our Linux Clusters. At this stage, we are experimenting with Linux and clustering technologies and trying to build a Linux cluster that provides extremely high scalability and high availability.
An important factor in building such a system is the choice of the networked file system(s) with which it will be used. Among the tested file systems were Coda, Intermezzo, Global File System (GFS), MOSIX File System (MFS) and the Parallel Virtual File System (PVFS). After considering these and other options, the decision was made to adopt PVFS as the networked file system for our test Linux cluster. We are also using the MOSIX file system as part of the MOSIX package (see Resources) that enhances the Linux kernel with cluster-computing capabilities.
In this article, we cover our initial experiences with the PVFS system. We first discuss the design of the PVFS system in order to help familiarize readers with the terminology and components of PVFS. Next, we cover installation and configuration on the 7 CPU Linux Cluster at the Ericsson Systems Research Lab in Montréal. Finally, we discuss the strengths and weaknesses of the PVFS system in order to help others decide if PVFS is right for them.
Linux cluster technology has matured and undergone many improvements in the last few years. Commodity hardware speed has increased, and parallel software has become more advanced. Input/Output (I/O) support has traditionally lagged behind computational advances, however. This limits the performance of applications that process large amounts of data or rely on out-of-core computation.
PVFS was constructed with two main objectives. The foremost is to provide a platform for further research into parallel file systems on Linux clusters. The second objective is to meet the growing need for a high-performance parallel file system for such clusters. PVFS goals are to:
Provide high bandwidth for concurrent read/write operations from multiple processes to a common file
Support multiple APIs, including a native PVFS API, the UNIX/POSIX I/O API, as well as MPI-IO (through ROMIO)
Support Common Unix utilities such as ls, cp and rm for PVFS files
Provide a mechanism for applications developed for the UNIX I/O API to work with PVFS without recompiling
Offer robustness and scalability
Be easy to install and use
One machine, or node, in a cluster, can play a number of roles in the PVFS system. A node can be thought of as being one or more of three different types: compute, I/O or management. Typically, a single node will serve as a management node, while a subset of the nodes will be compute nodes and another subset will serve as I/O nodes. It is also possible to use all nodes as both I/O and compute nodes.
PVFS exists as a set of dæmons and a library of calls to access the file system. There are two types of dæmons, management and I/O. Typically, a single-management dæmon runs on the management node and a number of I/O dæmons run on the I/O nodes. The library of calls is used by applications running on compute nodes, or client nodes, in order to communicate with both the management dæmon and the I/O dæmons.
- Readers' Choice Awards 2013
- A Plexible Pi
- Swap Your Laptop for an iPad + Linode
- Sublime Text: One Editor to Rule Them All?
- RSS Feeds
- Mars Needs Women
- Linux Kernel News - November 2013
- Raspberry Pi: the Perfect Home Server
- Advanced Hard Drive Caching Techniques
- Tech Tip: Really Simple HTTP Server with Python
- Clarification; RasPlex is not
43 min 10 sec ago
- Clarification; RasPlex is not
45 min 3 sec ago
- Starting the conversation is the first step.
1 hour 44 min ago
3 hours 31 min ago
3 hours 31 min ago
- Nice but....
3 hours 54 min ago
- great specs
10 hours 23 min ago
- Reply to comment | Linux Journal
11 hours 29 min ago
- rilakkuma onesie
14 hours 5 min ago
- flying squirrel onesie
14 hours 7 min ago