Building a Multisourced Infrastructure Using OpenVPN
Have you ever needed to expand your colocated servers at more than one provider and allow applications to communicate as if they were on the same LAN, possibly over multiple sets of firewalls and layers of NAT? Or, maybe you've wanted to move from one hosting service to another to take advantage of lower pricing or better uptime but would have preferred to do it gradually instead of in a single swoop (and a weekend-long maintenance window)? Or, maybe you've considered the Amazon EC2 cloud to host part, but not all, of your infrastructure? If your answer to any of these questions is yes, what you want is essentially a multisourced infrastructure.
The Amazon EC2
The Amazon EC2 (Elastic Compute Cloud) is a Web service that allows users to provision new machines in an Amazon-hosted virtualized infrastructure in a matter of minutes, using a publicly available API. Users get full root access and can install almost any OS or application in their Amazon Machine Images. Web service APIs allow users to reboot their instances remotely and scale capacity quickly if necessary, by adding tens or even hundreds of machines. Additionally, there are no up-front hardware setup costs: Amazon charges only for the capacity you actually use, and there is no minimum fee. As more applications find their way to Amazon's virtual computing environment, system administrators are looking for ways to provide secure connectivity over the public Internet between new machines in the Amazon EC2 and old machines in their regular data centers. This article describes one such technique: how to build a multisourced infrastructure based on OpenVPN.
Let's take a look at a simple distributed application that consists of multiple services: a LAMP stack. Traditionally, you would start with Apache and MySQL on a single server. As your site grows, you would provision another server from your provider and add a second Apache instance. Later, you might want to provision yet another machine to be a dedicated database server to improve performance. This is a typical single-sourced infrastructure, in which all services run within a single physical environment, controlled and supported by a single provider.
In contrast, with a multisourced infrastructure, you no longer are limited to one provider or one data center. You are free to mix and match hosting plans from different providers to suit your business and architecture better, and you can use as many providers as you like. Your applications still can communicate with one another, but instead of having a physical LAN, it's now a virtual LAN that sits on top of public Internet links. You can grow your services horizontally and achieve better geographic redundancy and fault tolerance at the same time, all without significant changes in your application. If it works in a single-sourced physical LAN, it most likely will work in a multisourced virtual LAN as well.
Additionally, you can leverage the strengths of a particular provider for just a subset of your services. Going back to the LAMP stack as our example, with Amazon EC2, you can quickly provision many Apache instances in response to the current load, although you might prefer to run MySQL on bare metal elsewhere instead of in an EC2 virtual machine.
Finally, this method allows you to expand your corporate infrastructure outside your current data center or allow outside services to use applications in your corporate data center. Consider a remotely hosted data-crunching cluster that you rent by the hour, which uses your corporate data warehouse system for its input. As you can see, a multisourced infrastructure is more flexible and can accommodate various scenarios and needs.
In this article, I describe a particular implementation of the multisourced infrastructure concept that we at CohesiveFT (www.cohesiveft.com) developed using OpenVPN and that has been running in our production environment since mid-summer 2007. We chose OpenVPN primarily because it uses standard OpenSSL encryption, runs on multiple operating systems and does not require kernel patching or additional modules. The latter benefit is of key importance. Many Virtual Private Server (VPS) hosting solutions currently provide great service with pricing that is often better than other forms of hosting. These providers build guest OS kernels specifically tailored for their environment and method of virtualization. As a result, you probably want to avoid rebuilding the Linux kernel on your VPS as much as possible. Not that it can't be done, but you can save some time and probably get faster technical support if you don't do it.
Among the alternatives to OpenVPN, there is Openswan, a code fork of the original FreeS/WAN Project, but it requires a kernel patch to support NAT traversal, according to its wiki (wiki.openswan.org/index.php/Openswan/Install).
The OpenVPN protocol also is firewall-friendly, as it can pass all traffic over a single UDP tunnel (the default port is 1194). That feature, coupled with SSL encryption, makes this solution very difficult to attack when data packets pass through the public Internet.
OpenVPN turned out to be a great choice and offered us all the functionality we expected, except for one very important feature, fault tolerance. When you use a VPN to provide corporate network access to remote users, the solution is very simple—you deploy several OpenVPN servers and configure each server with its own network segment (for example, server 10.5.0.0 255.255.0.0 and server 10.6.0.0 255.255.0.0). In a typical scenario, the dynamic IP address assigned to a remote user will not matter much, as long as you configure firewalls, applications and services to allow both subnets.
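To make the remote-access case concrete, here is a minimal Python sketch of the kind of check a firewall rule or application allow-list would perform when each OpenVPN server hands out addresses from its own segment. The two subnets are the examples from the paragraph above; the function name `is_vpn_client` is hypothetical and exists only for illustration.

```python
import ipaddress

# The two example segments from the text, one per OpenVPN server.
VPN_SUBNETS = [
    ipaddress.ip_network("10.5.0.0/255.255.0.0"),
    ipaddress.ip_network("10.6.0.0/255.255.0.0"),
]


def is_vpn_client(address: str) -> bool:
    """Return True if the address belongs to either server's segment."""
    ip = ipaddress.ip_address(address)
    return any(ip in subnet for subnet in VPN_SUBNETS)


if __name__ == "__main__":
    # A remote user may land in either segment depending on the server used.
    for candidate in ("10.5.3.7", "10.6.200.1", "10.7.0.1"):
        print(candidate, is_vpn_client(candidate))
```

Because the check accepts both segments, it does not matter which server a remote user happens to connect through, which is why dynamic addresses are tolerable in this scenario.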
When you build a multisourced infrastructure, however, this is not an acceptable solution, unless you want servers to change their IP addresses from time to time. To satisfy redundancy and fault-tolerance requirements, we needed an active-active pair of OpenVPN servers to share a common address space—all hosts must be able to access each other by static IP addresses at all times, no matter which OpenVPN server provides connectivity at either end of the communication. Then, if we lose one OpenVPN server, the other will provide all connectivity. And, if they are both up, both will be accepting connections from clients to share the load. This feature was not available as a part of the OpenVPN source distribution, so we developed a standalone dynamic routing dæmon to facilitate active-active load balancing. You can find its source code, along with useful links, use-case scenarios and mailing lists, at www.cohesiveft.com/multisourced-infra.
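The requirement can be illustrated with a toy model in Python. This is not the routing dæmon mentioned above and makes no claim about how it works internally; the class `SharedAddressSpace`, its methods, and the server names are all hypothetical. It only models the desired outcome: every host keeps one static address, and the only thing that changes on failure is which OpenVPN server carries its traffic.

```python
from typing import Dict, Optional, Set


class SharedAddressSpace:
    """Toy model: two or more active servers serving one address space."""

    def __init__(self, servers):
        self.live: Set[str] = set(servers)
        # Static client IP -> name of the server currently serving it.
        self.attached: Dict[str, str] = {}

    def connect(self, client_ip: str, server: str) -> None:
        """Record that a client (with its static IP) is served by a server."""
        if server not in self.live:
            raise ValueError(f"{server} is not accepting connections")
        self.attached[client_ip] = server

    def next_hop(self, client_ip: str) -> Optional[str]:
        """Return the server through which this client is reachable."""
        return self.attached.get(client_ip)

    def fail(self, server: str) -> None:
        """Model the loss of one server: its clients move to survivors.

        In a real deployment the clients would reconnect on their own;
        here the reassignment is done directly to show the end state.
        """
        self.live.discard(server)
        orphans = sorted(ip for ip, s in self.attached.items() if s == server)
        survivors = sorted(self.live)
        for index, client_ip in enumerate(orphans):
            if survivors:
                self.attached[client_ip] = survivors[index % len(survivors)]
            else:
                del self.attached[client_ip]


if __name__ == "__main__":
    vpn = SharedAddressSpace(["vpn-a", "vpn-b"])
    vpn.connect("10.5.0.10", "vpn-a")
    vpn.connect("10.5.0.11", "vpn-b")
    vpn.fail("vpn-a")
    # The client keeps its static address; only its next hop changes.
    print(vpn.next_hop("10.5.0.10"))
```

The point of the model is that applications address peers by IP only, so a server failure changes the routing entry and nothing else.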
Comments
Cube-routed on GitHub
Dmitry has been very responsive and helpful, and has made cube-routed available for download at http://cohesiveft.com/dnld/cube-routed-0.1.tar.gz
I added it to my GitHub account here: http://github.com/aguynamedben/cube-routed Hope this helps readers of this article!
-Ben
cube-routed de-open-sourced?
I followed along through half this article and it looks like cube-routed has been de-open-sourced by Dmitry/CohesiveFT! I can't find it anywhere, Dmitry's GitHub, cohesiveft.com, or anywhere. Looks like they pulled a really non-FOSS maneuver, please tell me I'm wrong. =( There should at least be a disclaimer at the beginning of the article saying it's impossible to complete.
Ben Standefer
Good Solution on Multisourced Infrastructure
Hello Dmitriy Samovskiy
Your article is what I had been looking for for a long time. I have different services in several data centers, and your article helped me get them to communicate with each other securely. - Thanks Man!