VPN Implementation in Cluster Computing

Negotiating security and scalability issues with clusters by setting up a virtual private network.

Special considerations are involved when completing the implementation of a cluster. Even with the queue system and parallel environment, extra services are required for a cluster to function as a multi-user computational platform. These services include the well known network services NFS, NIS and rsh. NFS allows cluster nodes to share user home directories as well as installation files for the queue system and parallel environment. NIS provides correct file and process ownership across all the cluster nodes from the single source on the master machine. Although these services are significant components of a cluster, such services create numerous vulnerabilities. Thus, it would be insecure to have cluster nodes function on an open network. For these reasons, computational cluster nodes usually reside on private networks, often accessible for users only through a firewall gateway. In most cases, the firewall is configured on the master node using ipchains or iptables.

Having all cluster machines on the same private network requires them to be connected to the same switch (or linked switches) and, therefore, localized at the same proximity. This situation creates a severe limitation in terms of cluster scalability. It is impossible to combine private network machines in different geographic locations into one joint cluster, because private networks are not routable with the standard Internet Protocol (IP).

Combining cluster resources on different locations, so that users from various departments would be able to take advantage of available computational nodes, however, is possible. Theoretically, merging clusters is not only desirable but also advantageous, in the sense that different clusters are not localized at one place but are, rather, centralized. This setup provides higher availability and efficiency to clusters, and such a proposition is highly attractive. But in order to merge clusters, all the machines would have to be on a public network instead of a private one, because every single node on every cluster needs to be directly accessible from the others. If we were to do this, however, it might create insurmountable problems because of the potential--the inevitable--security breaches. We can see then that to serve scalability, we severely compromise security, but where we satisfy security concerns, scalability becomes significantly limited. Faced with such a problem, how can we make clusters scalable and, at the same time, establish a rock-solid security on the cluster networks? Enter the Virtual Private Network (VPN).

VPNs often are heralded as one of the most cutting-edge, cost-saving solutions to various applications, and they are widely deployed in the areas of security, infrastructure expansion and inter-networking. A VPN adds more dimension to networking and infrastructure because it enables private networks to be connected in secure and robust ways. Private networks generally are not accessible from the Internet and are networked only within confined locations.

The technology behind VPNs, however, changes what we have previously known about private networks. Through effective use of a VPN, we are able to connect previously unrelated private networks or individual hosts both securely and transparently. Being able to connect private networks opens a whole slew of new possibilities. With a VPN, we are not limited to resources in only one location (a single private network). We can finally take advantage of resources and information from all other private networks connected via VPN gateways, without having to largely change what we already have in our networks. In many cases, a VPN is an invaluable solution to integrate and better utilize fragmented resources.

In our environment, the VPN plays a significant role in combining high performance Linux computational clusters located on separate private networks into one large cluster. The VPN, with its power to transparently combine two private networks through an existing open network, enabled us to connect seamlessly two unrelated clusters in different physical locations. The VPN connection creates a tunnel between gateways that allows hosts on two different subnets (e.g., and to see each other as if they are on the same network. Thus, we were able to operate critical network services such as NFS, NIS, rsh and the queue system over two different private networks, without compromising security over the open network. Furthermore, the VPN encrypts all the data being passed through the established tunnel and makes the network more secure and less prone to malicious exploits.

The VPN solved not only the previously discussed problems with security, but it also opened a new door for scalability. Since all the cluster nodes can reside in private networks and operate through the VPN, the entire infrastructure can be better organized and the IP addresses can be efficiently managed, resulting in a more scalable and much cleaner network. Before VPNs, it was a pending problem to assign public IP addresses to every single node on the cluster, which limited the maximum number of nodes that can be added to the cluster. Now, with a VPN, our cluster can expand in greater magnitude and scale in an organized manner. As can be seen, we have successfully integrated the VPN technology to our networks and have addressed important issues of scalability, accessibility and security in cluster computing.