Expanding Options for Clustering
Ten years ago, clustering meant one thing—grouping a small number of large servers together with shared disks to host on-line transaction processing applications with better availability and, in some cases, scalability. Since then, we've seen the rise of internet-based applications and thin commodity servers, and a whole new model of open, scalable, distributed computing built upon Linux.
Many vendors in the Linux space are working on reproducing the kind of clustering that existed ten years ago on traditional UNIX OLTP systems. This may not be the right strategy. Rather, clustering should look toward the types of applications and architectures that are being deployed in the internet data center and on sets of thin Linux servers. In the internet data center—whether in a hosting environment or in a corporation—application architectures have several characteristics that are fundamentally different from the architectures of the old glass house. In particular, the internet data center architectures share some important common themes:
Architectures are multitiered (usually web servers, application servers, and database servers).
Each tier may have literally dozens of servers.
Each tier has unique requirements for shared and replicated data.
In a large e-business or web content site, managing the entire distributed set of resources that provide a service over the Web is a bigger challenge than simply providing failover for individual servers.
While some applications, like OLTP or some kinds of e-commerce failover, still require shared disk solutions, modern applications, in many cases, can take advantage of software-based replication provided by middleware. What they lack is the infrastructure to bind those solutions together and integrate them in the larger computing environment. Application management—including failover—remains one of the key problems facing today's operations manager, but the options are, fortunately, getting better.
The applications that do require shared access to data now often involve dozens of machines, potentially in multiple Points of Presence (POPs). This means that even the shared disk solutions of traditional clustering, which are based on physically shared SCSI connections, may not be adequate. Those solutions are bound by physical cabling and SCSI address limitations and usually can only service two or four systems.
The web server tier, a stronghold of Linux, is characterized by either read-only or dynamically created data. This data can be, and often is, shared across multiple servers by using network-attached storage appliances running NFS. Each server mounts the NFS appliance and has access to the right data. For this tier, complicated shared SCSI cabling schemes are totally inappropriate—a TCP/IP network provides all the interconnect between servers and storage. However, clustering software that can monitor the health and performance of web servers, and take appropriate actions based on the input from those monitors, is still very important.
In the traditional corporate glass house, applications were typically client/server and were based on a relational database. For the most part, data was not stored in the application layer—everything was put in the database. The mix is much richer in the Internet/Linux world. The emergence of Java and Enterprise JavaBeans has led developers to serialize application-layer objects in the file system, outside the database. In addition, applications now manipulate a much richer universe of data—including digital streaming media, text and other data types that are not naturally stored in SQL back-ends.
For this reason, the applications layer has shared data and failover requirements that will be met by neither network-attached storager or shared SCSI. Network-attached storage typically does not work because NFS protocols do not provide enough consistency guarantees to be suitable where multiple servers may be writing the same data. Shared SCSI is not sufficient because replication nor sharing may need to happen across many systems or across multiple POPs. In many cases, therefore, applications provide their own replication. For example, consider a foreign-exchange trading application in use at several dozen large banks. Replication of real-time trading data is maintained at the application level, combined with cluster-based failover.
The database layer is the domain of traditional clustering solutions from the big UNIX vendors—Sun, HP and IBM. Originally, two servers would be connected to some SCSI disks, with one server being an active master and the other being a passive standby. In more modern implementations, multiple servers share a fiber channel or SCSI storage system and use a parallel database like Oracle Parallel Server (OPS) to manage concurrent access by each node in the cluster.
However, high-end fiber channel SANs are expensive, as is software like Oracle Parallel Server. Often, the price/performance of these types of solutions are not appropriate for the thin server Linux market or for the internet data center. The active/passive SCSI approach is also wasteful of resources and suffers limited scalability. Fortunately, there are other approaches more suited to the kinds of applications being run on Linux.
For example, database vendors have put much effort into shared-nothing software replication approaches (Informix Replicator, DB/2 Replication, Oracle Multi-master and hot standby). Using these tools, it is possible for multiple servers to participate at the database tier without having shared storage. However, the databases themselves often lack the facilities to detect failure, sequence the actions to initiate failover and guarantee application integrity. For this reason, they are best paired with a clustering solution that provides these services.
A real-life example is a chip fabrication facility running a manufacturing automation application on Intel-based thin servers. Because of the value of this application, the chip manufacturer requires three-way replication, so that even if the primary fails, there are two machines ready to take over. The solution chosen was to use Informix database-replication services running on physically separated machines without shared storage in conjunction with a cluster management engine.
|Designing Electronics with Linux||May 22, 2013|
|Dynamic DNS—an Object Lesson in Problem Solving||May 21, 2013|
|Using Salt Stack and Vagrant for Drupal Development||May 20, 2013|
|Making Linux and Android Get Along (It's Not as Hard as It Sounds)||May 16, 2013|
|Drupal Is a Framework: Why Everyone Needs to Understand This||May 15, 2013|
|Home, My Backup Data Center||May 13, 2013|
- Designing Electronics with Linux
- Making Linux and Android Get Along (It's Not as Hard as It Sounds)
- Dynamic DNS—an Object Lesson in Problem Solving
- Validate an E-Mail Address with PHP, the Right Way
- What's the tweeting protocol?
- Mediated Reality: University of Toronto RWM Project
- New Products
- Using Salt Stack and Vagrant for Drupal Development
- Dart: a New Web Programming Experience
- OpenOffice.org Off-the-Wall: ToCs, Indexes and Bibliographies in OOo Writer
53 min 33 sec ago
- Kernel Problem
10 hours 56 min ago
- BASH script to log IPs on public web server
15 hours 23 min ago
18 hours 59 min ago
- Reply to comment | Linux Journal
19 hours 31 min ago
- All the articles you talked
21 hours 55 min ago
- All the articles you talked
21 hours 58 min ago
- All the articles you talked
21 hours 59 min ago
1 day 2 hours ago
- Keeping track of IP address
1 day 4 hours ago
Enter to Win an Adafruit Pi Cobbler Breakout Kit for Raspberry Pi
It's Raspberry Pi month at Linux Journal. Each week in May, Adafruit will be giving away a Pi-related prize to a lucky, randomly drawn LJ reader. Winners will be announced weekly.
Fill out the fields below to enter to win this week's prize-- a Pi Cobbler Breakout Kit for Raspberry Pi.
Congratulations to our winners so far:
- 5-8-13, Pi Starter Pack: Jack Davis
- 5-15-13, Pi Model B 512MB RAM: Patrick Dunn
- 5-21-13, Prototyping Pi Plate Kit: Philip Kirby
- Next winner announced on 5-27-13!
Free Webinar: Hadoop
How to Build an Optimal Hadoop Cluster to Store and Maintain Unlimited Amounts of Data Using Microservers
Realizing the promise of Apache® Hadoop® requires the effective deployment of compute, memory, storage and networking to achieve optimal results. With its flexibility and multitude of options, it is easy to over or under provision the server infrastructure, resulting in poor performance and high TCO. Join us for an in depth, technical discussion with industry experts from leading Hadoop and server companies who will provide insights into the key considerations for designing and deploying an optimal Hadoop cluster.
Some of key questions to be discussed are:
- What is the “typical” Hadoop cluster and what should be installed on the different machine types?
- Why should you consider the typical workload patterns when making your hardware decisions?
- Are all microservers created equal for Hadoop deployments?
- How do I plan for expansion if I require more compute, memory, storage or networking?