Red Hat Enterprise Linux Cluster Suite

Building a highly available solution using the RHEL cluster suite.

When mission-critical applications fail, so does your business. That is increasingly true in today's environments, where organizations spend millions of dollars keeping their services available 24/7, 365 days a year. Whether they serve external or internal customers, organizations are deploying high-availability solutions to keep their applications running.

In view of this growing demand, almost every IT vendor now provides a high-availability solution for its particular platform. Well-known commercial high-availability solutions include IBM's HACMP, Veritas' Cluster Server and HP's Serviceguard.

If you're looking for a commercial high-availability solution on Red Hat Enterprise Linux, the best choice probably is the Red Hat Cluster Suite.

In early 2002, Red Hat introduced the first member of its Red Hat Enterprise Linux family of products, Red Hat Enterprise Linux AS (originally called Red Hat Linux Advanced Server). Since then, the family of products has grown steadily, and it now includes Red Hat Enterprise Linux ES (for entry- and mid-range servers) and Red Hat Enterprise Linux WS (for desktops/workstations). These products are designed specifically for use in enterprise environments to deliver superior application support, performance, availability and scalability.

The original release of Red Hat Enterprise Linux AS version 2.1 included a high-availability clustering feature as part of the base product. This feature was not included in the smaller Red Hat Enterprise Linux ES product. However, with the success of the Red Hat Enterprise Linux family, it became clear that high-availability clustering was a feature that should be made available for both AS and ES server products. Consequently, with the release of Red Hat Enterprise Linux version 3 in October 2003, the high-availability clustering feature was packaged into an optional layered product called the Red Hat Cluster Suite, and it was certified for use on both the Enterprise Linux AS and Enterprise Linux ES products.

The Red Hat Cluster Suite is a separately licensed product that can be purchased from Red Hat on top of the base Red Hat Enterprise Linux license.

Red Hat Cluster Suite Overview

The Red Hat Cluster Suite has two major features. One is the Cluster Manager, which provides high availability; the other is IP Load Balancing (originally called Piranha). The Cluster Manager and IP Load Balancing are complementary high-availability technologies that can be used separately or in combination, depending on application requirements. Both technologies are integrated in Red Hat's Cluster Suite. In this article, I focus on the Cluster Manager.

Table 1 shows the major components of the RHEL Cluster Manager.

Table 1. RHEL Cluster Manager Components

Software Subsystem | Component | Purpose
Fence | fenced | Provides fencing infrastructure for specific hardware platforms.
DLM | libdlm, dlm-kernel | Contains distributed lock management (DLM) library.
CMAN | cman | Contains the Cluster Manager (CMAN), which is used for managing cluster membership, messaging and notification.
GFS and related locks | Lock_NoLock | Contains shared filesystem support that can be mounted on multiple nodes concurrently.
GULM | gulm | Contains the GULM lock management user-space tools and libraries (an alternative to using CMAN and DLM).
Rgmanager | clurgmgrd, clustat | Manages cluster services and resources.
CCS | ccsd, ccs_test and ccs_tool | Contains the cluster configuration services dæmon (ccsd) and associated files.
Cluster Configuration Tool | system-config-cluster | Contains the Cluster Configuration Tool, used to configure the cluster and display the current status of the nodes, resources, fencing agents and cluster services graphically.
Magma | magma and magma-plugins | Contains an interface library for cluster lock management and required plugins.
IDDEV | iddev | Contains the libraries used to identify the filesystem (or volume manager) in which a device is formatted.

Shared Storage and Data Integrity

Lock management is a common cluster infrastructure service that provides a mechanism for other cluster infrastructure components to synchronize their access to shared resources. In a Red Hat cluster, DLM (Distributed Lock Manager) or, alternatively, GULM (Grand Unified Lock Manager) are possible lock manager choices. GULM is a server-based unified cluster/lock manager for GFS, GNBD and CLVM. It can be used in place of CMAN and DLM. A single GULM server can be run in standalone mode but introduces a single point of failure for GFS. Three or five GULM servers also can be run together, in which case the failure of one or two servers can be tolerated, respectively. GULM servers usually are run on dedicated machines, although this is not a strict requirement.
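To make the GULM option concrete, the lock servers are declared in the cluster configuration file. The fragment below is only a rough sketch from memory, with invented host names; the exact element names can differ between Cluster Suite releases, so verify it against the documentation shipped with your version of /etc/cluster/cluster.conf:

    <gulm>
        <!-- an odd number of lock servers (1, 3 or 5); with three,
             the cluster survives the loss of one of them -->
        <lockserver name="node1.example.com"/>
        <lockserver name="node2.example.com"/>
        <lockserver name="node3.example.com"/>
    </gulm>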

In my cluster implementation, I used DLM, which runs on each cluster node. DLM is a good choice for small clusters (up to two nodes), because it removes the quorum requirements imposed by the GULM mechanism.
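For comparison, a minimal CMAN/DLM configuration for a two-node cluster might look like the sketch below. The cluster name, node names and APC fence device are invented for illustration; the two_node and expected_votes attributes are what relax the normal quorum requirement so a two-node cluster can keep running with a single surviving node:

    <?xml version="1.0"?>
    <cluster name="webcluster" config_version="1">
        <!-- two_node="1" with expected_votes="1" lets a two-node
             CMAN/DLM cluster keep running when one node is down -->
        <cman two_node="1" expected_votes="1"/>
        <clusternodes>
            <clusternode name="node1.example.com" votes="1">
                <fence>
                    <method name="1">
                        <device name="apc-pdu" port="1"/>
                    </method>
                </fence>
            </clusternode>
            <clusternode name="node2.example.com" votes="1">
                <fence>
                    <method name="1">
                        <device name="apc-pdu" port="2"/>
                    </method>
                </fence>
            </clusternode>
        </clusternodes>
        <fencedevices>
            <fencedevice name="apc-pdu" agent="fence_apc"
                         ipaddr="10.0.0.100" login="apc" passwd="apc"/>
        </fencedevices>
    </cluster>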

Based on DLM or GULM locking functionality, there are two basic techniques the RHEL cluster can use to ensure data integrity in concurrent-access environments. The traditional way is CLVM (Clustered LVM), which works well in most RHEL cluster implementations that use LVM-based logical volumes.
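Assuming LVM2 with the clvmd dæmon (this is only a generic sketch, with example device and volume names), enabling CLVM amounts to switching LVM to cluster-wide locking on every node and then creating the shared volumes as usual:

    # In /etc/lvm/lvm.conf on every cluster node, switch LVM to
    # cluster-wide locking (type 3 means "use clvmd"):
    #     locking_type = 3

    # Start the clustered LVM daemon on each node:
    service clvmd start

    # Create the shared volume group and logical volume once,
    # from any one node (device names are only examples):
    pvcreate /dev/sdb1
    vgcreate vg_shared /dev/sdb1
    lvcreate -L 10G -n lv_data vg_shared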

Another technique is GFS. GFS (Global File System) is a cluster filesystem that allows a cluster of nodes to simultaneously access a block device shared among them. It employs distributed metadata and multiple journals for optimal operation in a cluster. To maintain filesystem integrity, GFS uses a lock manager (DLM or GULM) to coordinate I/O. When one node changes data on a GFS filesystem, that change is immediately visible to the other cluster nodes using that filesystem.
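As an illustration (the volume, mountpoint and cluster names are invented, and the cluster name must match the one in cluster.conf), creating and mounting a GFS filesystem that uses DLM locking looks roughly like this:

    # -p selects the locking protocol, -t is <clustername>:<fsname>,
    # and -j is the number of journals (one per node that will
    # mount the filesystem):
    gfs_mkfs -p lock_dlm -t webcluster:gfs_data -j 2 /dev/vg_shared/lv_data

    # Mount it on each cluster node:
    mount -t gfs /dev/vg_shared/lv_data /shared/data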

Hence, when you are implementing a RHEL cluster with concurrent data access requirements (such as an Oracle RAC implementation), you can use either GFS or CLVM. In most Red Hat cluster implementations, GFS is used with a direct-access configuration to shared SAN storage from all cluster nodes. However, for the same purpose, you also can deploy GFS in a cluster that is connected to a LAN, with servers that use GNBD (Global Network Block Device) or iSCSI (Internet Small Computer System Interface) devices.
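As a rough sketch of the GNBD approach (host and export names are invented, and the exact flags may vary by release), one node with direct access to the storage exports the block device over the LAN, and the other nodes import it:

    # On the GNBD server (the node attached to the storage):
    gnbd_serv
    gnbd_export -d /dev/vg_shared/lv_data -e shared_data

    # On each GNBD client node:
    modprobe gnbd
    gnbd_import -i gnbd-server.example.com
    # The imported device then appears as /dev/gnbd/shared_data
    # and can be used like any other block device (for example, for GFS).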

Both GFS and CLVM use locks from the lock manager. However, GFS uses locks from the lock manager to synchronize access to filesystem metadata (on shared storage), while CLVM uses locks from the lock manager to synchronize updates to LVM volumes and volume groups (also on shared storage).

For nonconcurrent RHEL cluster implementations, you can rely on CLVM, or you can use native RHEL filesystems (such as ext2 and ext3, the latter providing journaling). For nonconcurrent-access clusters, data integrity issues are minimal; I tried to keep my cluster implementations simple by using native RHEL OS techniques.
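In a failover (nonconcurrent) service, the filesystem is mounted on only one node at a time, so an ordinary ext3 filesystem managed by rgmanager is sufficient. The following <rm> fragment of cluster.conf is only an illustrative sketch, with invented names, addresses and paths:

    <rm>
        <failoverdomains>
            <failoverdomain name="webdomain" ordered="1" restricted="1">
                <failoverdomainnode name="node1.example.com" priority="1"/>
                <failoverdomainnode name="node2.example.com" priority="2"/>
            </failoverdomain>
        </failoverdomains>
        <resources>
            <ip address="10.0.0.50" monitor_link="1"/>
            <fs name="webdata" device="/dev/vg_shared/lv_web"
                mountpoint="/var/www" fstype="ext3"/>
            <script name="httpd" file="/etc/init.d/httpd"/>
        </resources>
        <service name="webservice" domain="webdomain" autostart="1">
            <ip ref="10.0.0.50"/>
            <fs ref="webdata"/>
            <script ref="httpd"/>
        </service>
    </rm>

With a configuration along these lines, rgmanager mounts the filesystem, brings up the service IP address and starts the script on one node, and relocates all three to another node if that node fails; clusvcadm can also be used to relocate the service manually.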
