PVFS: A Parallel Virtual File System for Linux Clusters
Using networked file systems is a common method for sharing disk space on UNIX-like systems, including Linux. Sun pioneered this technology with the Network File System (NFS), which provides file sharing over the network. NFS is a client/server system that allows users to view, store and update files on remote computers as though they were on the user's own machine. NFS has since become the standard for file sharing in the UNIX community. Its protocol uses Remote Procedure Calls (RPC) for communication between machines.
Using NFS, a user or system administrator can mount all or a portion of a remote file system. The mounted portion can then be accessed with whatever privileges accompany the user's access to each file (read-only or read-write).
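As an illustration, a client might mount an NFS export at boot time with an /etc/fstab entry like the following (the server name and paths here are hypothetical):

```
# client /etc/fstab: mount fileserver:/export/home at /mnt/home via NFS
fileserver:/export/home   /mnt/home   nfs   rw,hard,intr   0 0
```

The same mount can be performed by hand with mount -t nfs fileserver:/export/home /mnt/home. The rw, hard and intr options are classic choices: read-write access, with I/O that retries indefinitely if the server goes away but can still be interrupted.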
As the popularity and utility of this type of system have grown, more networked file systems have appeared. These new systems include advances in reliability, security, scalability and speed.
As part of my responsibilities in the Systems Research Department at Ericsson Research Canada, I evaluated Linux-networked file systems to decide what networked file system(s) to adopt for our Linux Clusters. At this stage, we are experimenting with Linux and clustering technologies and trying to build a Linux cluster that provides extremely high scalability and high availability.
An important factor in building such a system is the choice of the networked file system(s) with which it will be used. Among the tested file systems were Coda, Intermezzo, Global File System (GFS), MOSIX File System (MFS) and the Parallel Virtual File System (PVFS). After considering these and other options, the decision was made to adopt PVFS as the networked file system for our test Linux cluster. We are also using the MOSIX file system as part of the MOSIX package (see Resources) that enhances the Linux kernel with cluster-computing capabilities.
In this article, we cover our initial experiences with PVFS. We first discuss the design of the PVFS system to familiarize readers with its terminology and components. Next, we cover installation and configuration on the seven-CPU Linux cluster at the Ericsson Systems Research Lab in Montréal. Finally, we discuss the strengths and weaknesses of PVFS to help others decide whether it is right for them.
Linux cluster technology has matured and undergone many improvements in the last few years. Commodity hardware speed has increased, and parallel software has become more advanced. Input/Output (I/O) support has traditionally lagged behind computational advances, however. This limits the performance of applications that process large amounts of data or rely on out-of-core computation.
PVFS was constructed with two main objectives. The foremost is to provide a platform for further research into parallel file systems on Linux clusters. The second objective is to meet the growing need for a high-performance parallel file system for such clusters. PVFS goals are to:
Provide high bandwidth for concurrent read/write operations from multiple processes to a common file
Support multiple APIs, including a native PVFS API, the UNIX/POSIX I/O API, as well as MPI-IO (through ROMIO)
Support common UNIX utilities such as ls, cp and rm for PVFS files
Provide a mechanism for applications developed for the UNIX I/O API to work with PVFS without recompiling
Offer robustness and scalability
Be easy to install and use
One machine, or node, in a cluster can play a number of roles in the PVFS system. A node can be thought of as being one or more of three different types: compute, I/O or management. Typically, a single node will serve as a management node, while a subset of the nodes will be compute nodes and another subset will serve as I/O nodes. It is also possible to use all nodes as both I/O and compute nodes.
PVFS exists as a set of dæmons and a library of calls to access the file system. There are two types of dæmons, management and I/O. Typically, a single management dæmon runs on the management node, and a number of I/O dæmons run on the I/O nodes. The library of calls is used by applications running on compute nodes, or client nodes, in order to communicate with both the management dæmon and the I/O dæmons.
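As a concrete sketch of how these roles might be assigned on a small cluster (the hostnames are hypothetical, and the dæmon names mgr and iod are taken from the PVFS 1.x releases of the time; check your own release):

```
node1          management node   runs the mgr dæmon (metadata operations)
node2 - node5  I/O nodes         each runs an iod dæmon (stores striped file data)
node1 - node7  compute nodes     applications use the PVFS library to talk
                                 to mgr and to the iod dæmons
```

Note that the sets overlap: nothing prevents the management node or the I/O nodes from also running application processes, as the paragraph above describes.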