PVFS: A Parallel Virtual File System for Linux Clusters
Installation of I/O nodes is equally simple. First, we installed the RPM, then started each I/O dæmon as follows:
% /usr/pvfs/bin/iod
% /usr/pvfs/bin/enableiod
Running enableiod on the I/O nodes ensures that the next time the machines are booted, the dæmons will be started automatically. The enableiod command only needs to be run once to set up the appropriate links.
The I/O dæmons rely on a configuration file, /etc/iod.conf, to tell them where to store data. This file is automatically created by the RPM and directs the I/O dæmons to store data in a directory called /pvfs_data. We created this directory on each of the I/O nodes with:
% mkdir /pvfs_data
The installation of the client CPUs was more delicate since, as mentioned above, we needed to minimize the installation to use less space on the RAM disk. The minimal set of files that we installed on the client nodes was:
------------ List of files installed on the Compute Nodes -------------
/etc/pvfstab
/usr/local/pvfs/pvfsd
/usr/local/pvfs/pvfs.o
/usr/local/pvfs/mount.pvfs
/usr/local/pvfs/libpvfs.so.1.4
-------------------------------------------------------------------------
The /etc/pvfstab is used by the compute nodes to determine the locations of the manager and the PVFS files. Its format is very similar to the /etc/fstab file. For our setup, the /etc/pvfstab file looked like the following:
----------------/etc/pvfstab--------------------
pc1:/pvfs /pvfs pvfs port=3000 0 0
------------------------------------------------
This configuration file specified that:
The management node is PC1
The directory where the manager is storing metadata is /pvfs
The PVFS file system is mounted on /pvfs on the client
The port on which the manager is listening is 3000
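As an illustration of how the fields of the entry above map onto those four points, here is a minimal shell sketch. The function name parse_pvfstab_line is hypothetical (it is not part of PVFS), and the sketch assumes the options field contains only port=N, as in our file.

```shell
#!/bin/sh
# parse_pvfstab_line is a hypothetical helper, not a PVFS tool.
# It splits one pvfstab entry into the fields described in the text:
#   manager:metadata_dir  mount_point  fs_type  options  dump  pass
parse_pvfstab_line() {
    # Word-split the entry into positional parameters.
    set -- $1
    spec=$1
    mount_point=$2
    fs_type=$3
    options=$4
    manager=${spec%%:*}       # text before the first colon
    metadata_dir=${spec#*:}   # text after the first colon
    port=${options#port=}     # assumption: the only option is port=N
    echo "manager=$manager metadata_dir=$metadata_dir mount_point=$mount_point port=$port"
}

parse_pvfstab_line "pc1:/pvfs /pvfs pvfs port=3000 0 0"
```

For the entry shown above, this prints manager=pc1 metadata_dir=/pvfs mount_point=/pvfs port=3000.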
/usr/pvfs/bin/mount.pvfs is the special mount command supplied with PVFS. The client CPUs use it to mount the PVFS file system on a local directory. For these CPUs, we have created a small shell script, /etc/rc.d/rc.pvfs, that is executed when the CPUs are started to ensure that they start up automatically as PVFS compute nodes without any manual intervention. The content of rc.pvfs is the following:
-----------------/etc/rc.d/rc.pvfs------------------
#!/bin/sh
/bin/mknod /dev/pvfsd c 60 0
/sbin/insmod /usr/pvfs/bin/pvfs.o
/usr/pvfs/bin/pvfsd
/usr/pvfs/bin/mount.pvfs pc1:/pvfs /mnt/pvfs
----------------------------------------------------
The script creates a node in /dev that will be used by pvfsd. It loads the PVFS module, starts the PVFS dæmon and mounts the PVFS file system locally under /mnt/pvfs.
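The same four steps can be sketched with a dry-run switch, which is handy for checking the sequence before running it as root. This is an illustrative variant, not the script we deployed: the DRY_RUN variable and the run and start_pvfs_client functions are assumptions introduced here, and the check for an existing /dev/pvfsd is an addition so that a second run does not fail on mknod.

```shell
#!/bin/sh
# Illustrative variant of rc.pvfs. DRY_RUN, run and start_pvfs_client are
# names made up for this sketch; they are not part of PVFS.
# With DRY_RUN=1 (the default here) each step is printed, not executed.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

start_pvfs_client() {
    # Create the device node used by pvfsd only if it is missing.
    [ -e /dev/pvfsd ] || run /bin/mknod /dev/pvfsd c 60 0
    # Load the PVFS kernel module.
    run /sbin/insmod /usr/pvfs/bin/pvfs.o
    # Start the PVFS client daemon.
    run /usr/pvfs/bin/pvfsd
    # Mount the PVFS file system under /mnt/pvfs.
    run /usr/pvfs/bin/mount.pvfs pc1:/pvfs /mnt/pvfs
}

start_pvfs_client
```

Setting DRY_RUN=0 in the environment makes the sketch execute the commands instead of printing them, which requires root privileges.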
As noted earlier, any I/O node or management node can also serve as a compute node. To enable this, we simply installed the PVFS client RPM on each I/O node, as we are not worried about conserving disk space on the I/O nodes. The /etc/pvfstab and /etc/rc.d/rc.pvfs were then set up to be identical to those used on the diskless clients. Now, both the diskless clients and the I/O nodes can access the file system in the same manner.
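One way to confirm that a node, diskless or not, actually has the file system mounted is to look for the mount point in /proc/mounts. The sketch below is an assumption-laden example rather than a PVFS utility: is_pvfs_mounted is a hypothetical helper, and it assumes the kernel reports the file system type as "pvfs", matching the type field in /etc/pvfstab.

```shell
#!/bin/sh
# is_pvfs_mounted MOUNT_POINT [MOUNTS_FILE]
# Hypothetical helper: succeeds if MOUNTS_FILE (default /proc/mounts) lists
# MOUNT_POINT with file system type "pvfs".
# Assumption: the type is reported as "pvfs", as in the pvfstab entry.
is_pvfs_mounted() {
    mounts_file=${2:-/proc/mounts}
    [ -r "$mounts_file" ] || return 1
    # Each line of the mounts file: device mount_point type options dump pass
    while read -r dev dir type rest; do
        if [ "$dir" = "$1" ] && [ "$type" = "pvfs" ]; then
            return 0
        fi
    done < "$mounts_file"
    return 1
}

if is_pvfs_mounted /mnt/pvfs; then
    echo "PVFS is mounted on /mnt/pvfs"
else
    echo "PVFS is not mounted on /mnt/pvfs"
fi
```

The optional second argument lets the check be pointed at a different mounts listing, such as a saved copy taken from another node.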
After completing these installation steps, we were able to copy and access files within the PVFS file system from all of the machines. The RAM disk installed on the CPUs included the Apache Web Server and Real Server, a video streaming server from Real Networks. We used WebBench (from ZDNet.com) to generate web traffic to the CPUs, and changed the configurations of both Apache and Real Server to place the default document root inside the PVFS file system. In this scenario, every CPU ran as a stand-alone web server with its own IP address and served multimedia requests using Real Server. Web files, including large files such as mp3 and rm files, were thus hosted from within the PVFS file system.

Figure 3. PVFS/Linux Compatibility
Comments
Re: PVFS: A Parallel Virtual File System for Linux Clusters
Hi,
I have the feeling that I already know the answer to my question. However, I'd like to know if it is possible to install PVFS on a Linux cluster as a non-root user?
Greetings,
Jeroen