PVFS: A Parallel Virtual File System for Linux Clusters
Management dæmons, or managers, have two responsibilities: validating permission to access files and maintaining metadata on PVFS files. All of these tasks revolve around the access of metadata files. Only one management dæmon is needed to perform these operations for a file system and a single-management dæmon can manage multiple file systems. The manager is also responsible for maintaining the file system directory hierarchy. Applications running on compute nodes communicate with the manager when performing activities such as listing directory contents, opening files and removing files.
On the other hand, I/O dæmons serve the single purpose of accessing PVFS file data and correlating data transfer between themselves and applications. Direct connections are established between applications and I/O servers in order to directly exchange data during read and write operations.
There are several options for providing PVFS access to the client nodes. First, there is a shared, or static, library that can be used to interact with the file system using its native interface. This requires writing applications specifically to use functions such as pvfs_open, however. As an alternative, there are two access methods that provide transparent access. The preferred method is to use the PVFS kernel module, which allows full access through the Linux VFS mechanism. This loadable module allows the user to mount PVFS just like any other traditional file system. Another option is to use a set of C library wrappers that are provided with PVFS. These wrappers directly trap calls to functions such as open and close before they reach the kernel level. This provides higher performance but with disadvantages in that the compatibility is incomplete, and the wrappers work only with certain supported versions of glibc.
A final option is to use the MPI-IO interface, which is part of the MPI-2 standard for message passing in parallel applications. The MPI-IO interface for PVFS is provided through the ROMIO MPI-IO implementation (see Resources) and allows MPI applications to take advantage of the features of MPI-IO when accessing PVFS. It also ensures that the MPI code will be compatible with other ROMIO-supported parallel file systems.
The test system at Ericsson Montréal started as a cluster of seven diskless Pentium grade CPUs with 256MB of RAM each. These CPUs first boot using a minimal kernel written on flash using a tool provided by the manufacturer. They then they get their IP address and download a RAM disk from a Linux box acting as both a DHCP and a TFTP server. This same machine also acts as an NFS server for the CPUs, providing a shared disk space.
When we decided to experiment with PVFS, we needed some PCs with disks to act as I/O nodes and one PC to be the management node. We added one machine, PC1, to be the management node and three machines, PC2, PC3 and PC4, with a total disk space of 35GB, to be the I/O nodes. The new map of the cluster became:
Seven Diskless Client CPUs
One Management Node
Three I/O Nodes
While PVFS developers provide RPMs for all types of nodes, we chose to recompile the source in order to optimize installation on the diskless clients. This went over without a hitch using the PVFS tarball package. For the manager and I/O nodes, we used the relevant RPM packages. The manager and I/O nodes are using the Red Hat 6.2 distribution and the 2.2.14-5.0 kernel. The diskless CPUs run a customized minimal version of the 2.2.14-5.0 kernel.
The first step towards setting up the PVFS manager is to download the PVFS manager RPM package and install it. PVFS will be installed by default under /usr/pvfs. Once the automatic installation is done, it is necessary to create the configuration files. PVFS requires two configuration files in order to operate: “pvfsdir”, which describes the directory to PVFS and “iodtab”, which describes the location of I/O dæmons. These files are created by running the mkiodtab script (as root):
[root@pc1 /root]# /usr/pvfs/bin/mkiodtab
See Listing 1 for the iodtab setup for the Parallel Virtual File System. It will also make the .pvfsdir file in the root directory.
When we ran mkiodtab on the manager, PC1, it complained that it did not find the I/O nodes. It turned out to be that we had forgotten to include entries of my I/O nodes in /etc/hosts. We updated the /etc/hosts file and reran mkiodtab; everything went okay. mkiodtab created a file called “iodtab” under /pvfs. This file contained the list of my I/O nodes. It looked like the following:
------------/pvfs/.iodtab------------ pc2:7000 pc3:7000 pc4:7000 -------------------------------------
The default port number used by I/O dæmon software to allow clients to connect to it over the network is 7,000.
After running mkiodtab, we did the following to start PC1 as the PVFS manager:
Start the manager dæ: % /usr/pvfs/bin/mgr % /usr/pvfs/bin/enablemgr
Running enablemgr on the management node ensures that the next time the machine is booted the dæmons will be automatically started, so that it doesn't need to be started manually after rebooting. The enablemgr command only needs to be run once to set up the appropriate links.
Fast/Flexible Linux OS Recovery
On Demand Now
In this live one-hour webinar, learn how to enhance your existing backup strategies for complete disaster recovery preparedness using Storix System Backup Administrator (SBAdmin), a highly flexible full-system recovery solution for UNIX and Linux systems.
Join Linux Journal's Shawn Powers and David Huffman, President/CEO, Storix, Inc.
Free to Linux Journal readers.Register Now!
- Download "Linux Management with Red Hat Satellite: Measuring Business Impact and ROI"
- Profiles and RC Files
- Astronomy for KDE
- Understanding Ceph and Its Place in the Market
- Maru OS Brings Debian to Your Phone
- Snappy Moves to New Platforms
- Git 2.9 Released
- What's Our Next Fight?
- OpenSwitch Finds a New Home
- The Giant Zero, Part 0.x
With all the industry talk about the benefits of Linux on Power and all the performance advantages offered by its open architecture, you may be considering a move in that direction. If you are thinking about analytics, big data and cloud computing, you would be right to evaluate Power. The idea of using commodity x86 hardware and replacing it every three years is an outdated cost model. It doesn’t consider the total cost of ownership, and it doesn’t consider the advantage of real processing power, high-availability and multithreading like a demon.
This ebook takes a look at some of the practical applications of the Linux on Power platform and ways you might bring all the performance power of this open architecture to bear for your organization. There are no smoke and mirrors here—just hard, cold, empirical evidence provided by independent sources. I also consider some innovative ways Linux on Power will be used in the future.Get the Guide