High Availability Linux Web Servers
Imagine yourself as the System Administrator for a fairly large web site. It's 5:00 AM on a Monday morning. You're awakened by a page from Big Brother. One of three web servers has just dropped off the network. Suddenly, a third of your traffic is going unanswered. What can you do? The commute to the office isn't a short one, and by the time you get there, you'll already have dropped thousands of hits, which could mean lost revenue, decreased productivity or a missed deadline. Whatever the case may be, someone is going to be affected. As you begin your journey into work, you wonder how this problem could have been prevented.
In fact, a number of solutions are available, many of which require expensive hardware or software. This article outlines a simple way to achieve the same functionality at far lower cost, using a router and the loopback interfaces of your Linux web servers. We achieve high availability by configuring multiple hosts so that any of them can serve traffic for the same IP addresses at any given time.
Conventionally, we think of virtual IP addresses as being assigned to Ethernet interfaces. However, no two Ethernet interfaces on the same network can share an IP address. We're able to assign the same IP addresses to multiple hosts by binding them to loopback interfaces instead.
For instance, a SYN packet destined for one of these loopback addresses travels across the wire to a router, which chooses the next hop from its routing table. The router forwards the packet to that next hop: the Ethernet interface on one of several redundant web servers. The server then passes the packet from its Ethernet interface to the matching loopback. The acknowledgement (a SYN-ACK) travels the same path in reverse. It originates on the loopback interface, is forwarded to the Ethernet interface, then goes back to the router to be sent on to the host that sent the SYN packet.
Again, the beauty of this scheme is the ability to configure multiple hosts with the same IP address bound to loopback interfaces. Because of this, we can redirect traffic for a particular IP address, or even an entire subnet, simply by changing a route in the last-hop router. This saves time and minimizes traffic loss. The process can even be automated using simple shell scripts.
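To make the route-change idea concrete, here is a minimal Python sketch of a last-hop routing table. It is a toy model rather than router configuration: the subnets, the next-hop addresses and the function names `next_hop` and `fail_over` are all assumptions chosen for illustration.

```python
import ipaddress

# Hypothetical last-hop routing table: destination network -> next hop.
# Every address here is an illustrative assumption.
routes = {
    ipaddress.ip_network("192.168.1.32/27"): "192.168.1.2",
    ipaddress.ip_network("192.168.1.64/27"): "192.168.1.3",
}

def next_hop(table, destination):
    """Return the next hop for destination, or None if no route matches."""
    addr = ipaddress.ip_address(destination)
    for network, hop in table.items():
        if addr in network:
            return hop
    return None

def fail_over(table, dead_hop, standby_hop):
    """Repoint every route that used dead_hop at standby_hop."""
    for network, hop in list(table.items()):
        if hop == dead_hop:
            table[network] = standby_hop

# Demonstration on a copy, so the original table is left untouched.
demo = dict(routes)
print(next_hop(demo, "192.168.1.40"))   # served by the first web server
fail_over(demo, "192.168.1.2", "192.168.1.3")
print(next_hop(demo, "192.168.1.40"))   # now served by the second one
```

Because both servers already hold the address on a loopback, nothing on the servers has to change during a failure. Only the router's table does.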
The kernel must be configured to support IP aliasing. IP aliasing is the process of binding multiple IP addresses to a given network interface, thus creating “virtual” interfaces. Under Linux, interface names are assigned linearly. For example, the first loopback interface is called lo, the second lo:1, the third lo:2 and so on. You can see which interfaces are configured on your system by typing:
Configure the kernel with support for TCP/IP, network aliasing and IP aliasing. Under Linux 2.0.x, this is accomplished by answering “yes” to the following kernel configuration options:
Network aliasing (CONFIG_NET_ALIAS) [Y/n/?] y
TCP/IP networking (CONFIG_INET) [Y/n/?] y
IP: aliasing support (CONFIG_IP_ALIAS) [Y/m/n/?] y
Our fictitious network will consist of four machines, although you could support the same functionality with as few as two boxes or as many as you anticipate needing. Four boxes will allow us to serve a hefty amount of traffic and still allow plenty of room for growth. Having all four machines handling traffic for a single web site will provide some load balancing as well, using “round robin” DNS. If you ever exceed the capacity of your web servers, adding additional machines is a simple task.
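Round-robin DNS simply hands out the same set of address records in a rotating order, so successive clients tend to start with different servers. The following Python sketch imitates that rotation. The four addresses are hypothetical virtual IPs, and `round_robin` is an illustrative name, not part of any DNS server.

```python
from collections import deque

def round_robin(records):
    """Yield the record list forever, rotated one position per query."""
    ring = deque(records)
    while True:
        yield list(ring)
        ring.rotate(-1)  # the next query starts with the following record

# Hypothetical virtual IPs for one web site, one per machine (assumption).
answers = round_robin(
    ["192.168.1.33", "192.168.1.65", "192.168.1.97", "192.168.1.129"]
)
```

Each call to `next(answers)` returns the full list with a different address in first position, which is roughly what a resolver sees from a name server doing round robin.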
We'll take the class C address 192.168.1.0 and apply a 27-bit subnet mask (255.255.255.224), which will yield 8 subnets of 30 usable hosts each, or 240 usable hosts in all.
Note that according to the RFC, the upper and lower subnets will not be usable. Some operating systems will not allow you to configure an interface using an address that falls into one of these subnets. Some routers require you to enable this feature explicitly. For example, Cisco requires that the router be configured with the command ip subnet-zero. This is implementation-dependent, although I have yet to see a UNIX or Microsoft-based host that had a problem utilizing all subnets. If you are unable to use all eight subnets or you are an RFC compliance fanatic, this configuration will yield 6 subnets and 180 unique hosts.
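The subnet arithmetic is easy to check with Python's standard `ipaddress` module. This is only a sketch for verifying the numbers quoted above. The variable names are mine.

```python
import ipaddress

class_c = ipaddress.ip_network("192.168.1.0/24")

# Split the class C into /27 subnets (mask 255.255.255.224).
subnets = list(class_c.subnets(new_prefix=27))

# Each subnet loses two addresses: its network and broadcast address.
hosts_per_subnet = subnets[0].num_addresses - 2

total_hosts = len(subnets) * hosts_per_subnet

# The strict reading drops the first and last subnets.
strict_subnets = subnets[1:-1]
strict_hosts = len(strict_subnets) * hosts_per_subnet

print(len(subnets), hosts_per_subnet, total_hosts)
print(len(strict_subnets), strict_hosts)
```

The first subnet produced is 192.168.1.0/27 and the last is 192.168.1.224/27. Those are the two that a strict configuration would leave unused.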
Traffic can be spread across our 4 hosts for up to 30 different web servers quite easily. It also leaves us with four free subnets for future expansion. Using subnets allows traffic to be redirected from one machine to another with a few simple commands. However, your requirements may not call for an implementation as large as the one in our example. The same functionality can be achieved using host routes, so instead of the routing table having an entry for an entire subnet, the entry is for a single IP address using a 32-bit subnet mask. I'll try to explain the differences where applicable.
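The difference between a subnet route and host routes can be sketched in a few lines of Python. The tables below are hypothetical, as are the virtual IP addresses and the `lookup` helper. Choosing the most specific matching route is ordinary router behaviour, added here so the same function works for both kinds of table.

```python
import ipaddress

def lookup(table, destination):
    """Return the next hop of the most specific matching route, or None."""
    addr = ipaddress.ip_address(destination)
    matches = [(net, hop) for net, hop in table if addr in net]
    if not matches:
        return None
    best = max(matches, key=lambda entry: entry[0].prefixlen)
    return best[1]

WEB1 = "192.168.1.2"  # Ethernet address of one web server

# One subnet route covers every address in 192.168.1.32/27.
subnet_table = [(ipaddress.ip_network("192.168.1.32/27"), WEB1)]

# Host routes: one /32 entry per virtual IP (addresses are assumptions).
virtual_ips = ["192.168.1.33", "192.168.1.34", "192.168.1.35"]
host_table = [(ipaddress.ip_network(ip + "/32"), WEB1) for ip in virtual_ips]

print(lookup(subnet_table, "192.168.1.34"), lookup(host_table, "192.168.1.34"))
print(lookup(subnet_table, "192.168.1.60"), lookup(host_table, "192.168.1.60"))
```

Both tables deliver 192.168.1.34 to the same server, but only the subnet route also covers 192.168.1.60. With host routes, moving a site means changing one entry per address. With a subnet route, a single change moves the whole block.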
While we're at it, we can take one subnet from our class C for the Ethernet interfaces themselves; namely, 192.168.1.1 for our router and 192.168.1.2, 192.168.1.3 and 192.168.1.4 for our web servers. Under Red Hat, this is done by editing the /etc/sysconfig/network-scripts/ifcfg-eth0 file to look something like this:
DEVICE=eth0
IPADDR=192.168.1.2
NETMASK=255.255.255.224
NETWORK=192.168.1.0
BROADCAST=192.168.1.31
ONBOOT=yes
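If you'd rather not work out the netmask, network and broadcast values by hand, Python's standard `ipaddress` module can derive them from the address and prefix length. This is a small sketch only. The `settings` dictionary simply mirrors the key names used in the file above.

```python
import ipaddress

# The first web server's Ethernet address with the 27-bit mask.
iface = ipaddress.ip_interface("192.168.1.2/27")

settings = {
    "IPADDR": str(iface.ip),
    "NETMASK": str(iface.network.netmask),
    "NETWORK": str(iface.network.network_address),
    "BROADCAST": str(iface.network.broadcast_address),
}

for key, value in settings.items():
    print(f"{key}={value}")
```

Changing the address to 192.168.1.3 or 192.168.1.4 gives the values for the other web servers. Only IPADDR differs, since all three sit in the same subnet.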
You'll also want to edit the /etc/sysconfig/network file to configure the appropriate default route. Mine looks like this:
NETWORKING=yes
HOSTNAME=foohost.foo.com
DOMAINNAME=foo.com
GATEWAY=192.168.1.1
GATEWAYDEV=eth0
Interface configuration varies from distribution to distribution, so your mileage may vary.