PVFS: A Parallel Virtual File System for Linux Clusters

An introduction to the Parallel Virtual File System and a look at how one company installed and tested it.
Setting up the I/O Nodes

Installation of I/O nodes is equally simple. First, we installed the RPM, then started each I/O dæmon as follows:

% /usr/pvfs/bin/iod
% /usr/pvfs/bin/enableiod

Running enableiod on the I/O nodes ensures that the next time the machines are booted, the dæmons will be started automatically. The enableiod command only needs to be run once to set up the appropriate links.

The I/O dæmons rely on a configuration file, /etc/iod.conf, to tell them where to store data. This file is automatically created by the RPM and directs the I/O dæmons to store data in a directory called /pvfs_data. We created this directory on each of the I/O nodes with:

% mkdir /pvfs_data
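For reference, a minimal /etc/iod.conf contains little more than a pointer to that data directory. The sketch below is illustrative; the exact key names may differ between PVFS versions, so consult the file the RPM generated:

```
# /etc/iod.conf (sketch; verify key names against the RPM-generated file)
datadir /pvfs_data
```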
Setting up the Diskless CPUs as Compute Nodes

The installation of the client CPUs was more delicate since, as mentioned above, we needed to minimize the installation to conserve space on the RAM disk. The minimal set of files that we installed on the client nodes was:

------------ List of files installed on the Compute Nodes -------------
/etc/pvfstab
/usr/local/pvfs/pvfsd
/usr/local/pvfs/pvfs.o
/usr/local/pvfs/mount.pvfs
/usr/local/pvfs/libpvfs.so.1.4
-------------------------------------------------------------------------

The /etc/pvfstab file is used by the compute nodes to determine the locations of the manager and the PVFS files. Its format is very similar to that of /etc/fstab. For our setup, the /etc/pvfstab file looked like the following:

----------------/etc/pvfstab--------------------
pc1:/pvfs        /pvfs pvfs  port=3000  0  0
------------------------------------------------
This configuration file specifies that:
  • The management node is PC1

  • The directory where the manager is storing metadata is /pvfs

  • The PVFS file system is mounted on /pvfs on the client

  • The port on which the manager is listening is 3000
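The fields above can be pulled apart with ordinary shell tools. The snippet below is a sketch of parsing such an entry, not part of PVFS itself:

```shell
# Sketch: extract the manager host and port from an
# /etc/pvfstab-style entry (format shown above).
line='pc1:/pvfs        /pvfs pvfs  port=3000  0  0'
host=${line%%:*}                      # text before the first ':'
port=$(printf '%s\n' "$line" | sed -n 's/.*port=\([0-9]*\).*/\1/p')
echo "manager host: $host, port: $port"
```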

The PVFS dæmon, pvfsd (installed as /usr/local/pvfs/pvfsd on our compute nodes), works in conjunction with the kernel module to provide communication with the file system through the kernel. The dæmon uses the same PVFS library calls that a custom user application would, but it translates them into a form recognized by the kernel module, so PVFS remains hidden from applications not specifically compiled for it. This is similar to the approach used by the Coda file system, in which a user-level dæmon cooperates with the Coda kernel code to access the file system (see Resources).

/usr/local/pvfs/mount.pvfs is the special mount command supplied with PVFS. The client CPUs use it to mount the PVFS file system on a local directory. For these CPUs, we created a small shell script, /etc/rc.d/rc.pvfs, that is executed at boot to ensure that they come up automatically as PVFS compute nodes without any manual intervention. The content of rc.pvfs is the following:

-----------------/etc/rc.d/rc.pvfs------------------
#!/bin/sh
# Create the character device used by pvfsd (major 60, minor 0)
/bin/mknod /dev/pvfsd c 60 0
# Load the PVFS kernel module, start the dæmon and mount PVFS
/sbin/insmod /usr/local/pvfs/pvfs.o
/usr/local/pvfs/pvfsd
/usr/local/pvfs/mount.pvfs pc1:/pvfs /mnt/pvfs
----------------------------------------------------

The script creates a node in /dev that will be used by pvfsd. It loads the PVFS module, starts the PVFS dæmon and mounts the PVFS file system locally under /mnt/pvfs.
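Because rc.pvfs runs unconditionally at boot, a small guard like the following can avoid re-mounting when the file system is already up. This helper is a sketch of ours, not part of PVFS; /mnt/pvfs is the mount point from our setup:

```shell
# Sketch: skip the mount step if the target is already mounted.
is_mounted() {
    grep -q " $1 " /proc/mounts 2>/dev/null
}

if is_mounted /mnt/pvfs; then
    echo "/mnt/pvfs already mounted"
else
    echo "/mnt/pvfs not mounted; running rc.pvfs is safe"
fi
```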

As noted earlier, any I/O node or management node can also serve as a compute node. To enable this, we simply installed the PVFS client RPM on each I/O node, as we are not worried about conserving disk space on the I/O nodes. The /etc/pvfstab and /etc/rc.d/rc.pvfs were then set up to be identical to those used on the diskless clients. Now, both the diskless clients and the I/O nodes can access the file system in the same manner.

Testing the Installation

After completing these installation steps, we were able to copy and access files within the PVFS file system from all of the machines. The RAM disk installed on the CPUs included, as part of the setup, the Apache web server and RealServer, a video-streaming server from RealNetworks. We used WebBench (from ZDNet.com) to generate web traffic to the CPUs and changed the configurations of both Apache and RealServer to place the default document root inside the PVFS file system. In this scenario, every CPU ran as a stand-alone web server with its own IP address, serving web content, including large files such as mp3 and rm files, from within the PVFS file system through RealServer.
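Pointing Apache at PVFS amounted to changing its document root to a directory under the mount point. The fragment below is a sketch; the path /mnt/pvfs/htdocs is from our setup and should be adjusted to match yours:

```
# httpd.conf (fragment): serve content from the PVFS mount
DocumentRoot "/mnt/pvfs/htdocs"
<Directory "/mnt/pvfs/htdocs">
    Options Indexes FollowSymLinks
    AllowOverride None
</Directory>
```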

Figure 3. PVFS/Linux Compatibility
