PVFS: A Parallel Virtual File System for Linux Clusters
Installation of I/O nodes is equally simple. First, we installed the RPM, then started each I/O dæmon as follows:
% /usr/pvfs/bin/iod % /usr/pvfs/bin/enableiod
Running enableiod on the I/O nodes ensures that the next time the machines are booted, the dæmons will be started automatically. The enableiod command only needs to be run once to set up the appropriate links.
The I/O dæmons rely on a configuration file, /etc/iod.conf, to tell them where to store data. This file is automatically created by the RPM and directs the I/O dæmons to store data in a directory called /pvfs_data. We created this directory on each of the I/O nodes with:
% mkdir /pvfs_data
The installation of the client CPUs was more delicate since, as mentioned above, we needed to minimize the installation to use less space on the RAM disk. The minimal set of installation files that we used for the client nodes were:
------------ List of files installed on the Compute Nodes ------------- /etc/pvfstab /usr/local/pvfs/pvfsd /usr/local/pvfs/pvfs.o /usr/local/pvfs/mount.pvfs /usr/local/pvfs/libpvfs.so.1.4 -------------------------------------------------------------------------
The /etc/pvfstab is used by the compute nodes to determine the locations of the manager and the PVFS files. Its format is very similar to the /etc/fstab file. For our setup, the /etc/pvfstab file looked like the following:
----------------/etc/pvfstab-------------------- pc1:/pvfs /pvfs pvfs port=3000 0 0 ------------------------------------------------This configuration file specified that:
The management node is PC1
The directory where the manager is storing metadata is /pvfs
The PVFS file system is mounted on /pvfs on the client
The port on which the manager is listening is 3000
/usr/pvfs/bin/mount.pvfs is the special mount command supplied with PVFS. The client CPUs use it to mount the PVFS file system on a local directory. For these CPUs, we have created a small shell script, /etc/rc.d/rc.pvfs, that is executed when the CPUs are started to ensure that they start up automatically as PVFS compute nodes without any manual intervention. The content of rc.pvfs is the following:
-----------------/etc/rc.d/rc.pvfs------------------ #!/bin/sh /bin/mknod /dev/pvfsd c 60 0 /sbin/insmod /usr/pvfs/bin/pvfs.o /usr/pvfs/bin/pvfsd /usr/pvfs/bin/mount.pvfs pc1:/pvfs /mnt/pvfs ----------------------------------------------------
The script creates a node in /dev that will be used by pvfsd. It loads the PVFS module, starts the PVFS dæmon and mounts the PVFS file system locally under /mnt/pvfs.
As noted earlier, any I/O node or management node can also serve as a compute node. To enable this, we simply installed the PVFS client RPM on each I/O node, as we are not worried about conserving disk space on the I/O nodes. The /etc/pvfstab and /etc/rc.d/rc.pvfs were then set up to be identical to those used on the diskless clients. Now, both the diskless clients and the I/O nodes can access the file system in the same manner.
After completing these installation steps we were able to copy and access files within the PVFS file system from all of the machines. The RAM disk that was installed on the CPUs included as part of the setup the Apache Web Server and Real Server, a video streaming server from Real Networks. We used WebBench (from ZDNet.com) to generate web traffic to the CPUs and changed the configurations for both Apache and Real Server to place the default root document inside the PVFS file system. This scenario allowed every CPU to run as a stand-alone web server with its own IP address and serve multimedia requests using Real Server. This allowed hosting web files, including big files such as mp3 and rm files, from within the PVFS file system.

Figure 3. PVFS/Linux Compatibility
Realizing the promise of Apache® Hadoop® requires the effective deployment of compute, memory, storage and networking to achieve optimal results. With its flexibility and multitude of options, it is easy to over or under provision the server infrastructure, resulting in poor performance and high TCO. Join us for an in depth, technical discussion with industry experts from leading Hadoop and server companies who will provide insights into the key considerations for designing and deploying an optimal Hadoop cluster.
Sponsored by AMD
If you already use virtualized infrastructure, you are well on your way to leveraging the power of the cloud. Virtualization offers the promise of limitless resources, but how do you manage that scalability when your DevOps team doesn’t scale? In today’s hypercompetitive markets, fast results can make a difference between leading the pack vs. obsolescence. Organizations need more benefits from cloud computing than just raw resources. They need agility, flexibility, convenience, ROI, and control.
Stackato private Platform-as-a-Service technology from ActiveState extends your private cloud infrastructure by creating a private PaaS to provide on-demand availability, flexibility, control, and ultimately, faster time-to-market for your enterprise.
Sponsored by ActiveState
Free Webinar: Hadoop
How to Build an Optimal Hadoop Cluster to Store and Maintain Unlimited Amounts of Data Using Microservers
Realizing the promise of Apache® Hadoop® requires the effective deployment of compute, memory, storage and networking to achieve optimal results. With its flexibility and multitude of options, it is easy to over or under provision the server infrastructure, resulting in poor performance and high TCO. Join us for an in depth, technical discussion with industry experts from leading Hadoop and server companies who will provide insights into the key considerations for designing and deploying an optimal Hadoop cluster.
Some of key questions to be discussed are:
- What is the “typical” Hadoop cluster and what should be installed on the different machine types?
- Why should you consider the typical workload patterns when making your hardware decisions?
- Are all microservers created equal for Hadoop deployments?
- How do I plan for expansion if I require more compute, memory, storage or networking?
| Non-Linux FOSS: libnotify, OS X Style | Jun 18, 2013 |
| Containers—Not Virtual Machines—Are the Future Cloud | Jun 17, 2013 |
| Lock-Free Multi-Producer Multi-Consumer Queue on Ring Buffer | Jun 12, 2013 |
| Weechat, Irssi's Little Brother | Jun 11, 2013 |
| One Tail Just Isn't Enough | Jun 07, 2013 |
| Introduction to MapReduce with Hadoop on Linux | Jun 05, 2013 |
- Containers—Not Virtual Machines—Are the Future Cloud
- Non-Linux FOSS: libnotify, OS X Style
- Linux Systems Administrator
- Validate an E-Mail Address with PHP, the Right Way
- Lock-Free Multi-Producer Multi-Consumer Queue on Ring Buffer
- Senior Perl Developer
- Technical Support Rep
- UX Designer
- RSS Feeds
- Introduction to MapReduce with Hadoop on Linux




Comments
Re: PVFS: A Parallel Virtual File System for Linux Clusters
Hi,
I have the feeling that I already know the answer on my question. However, I'd like to know if it is possible to install PVFS on a Linux cluster as a non-root user?
Greetings,
Jeroen