Ahead of the Pack: the Pacemaker High-Availability Stack
Corosync
Corosync's configuration files live in /etc/corosync, and the central configuration is in /etc/corosync/corosync.conf. Here's an example of the contents of this file:
totem {
    # Enable node authentication & encryption
    secauth: on
    # Redundant ring protocol: none, active, passive.
    rrp_mode: active
    # Redundant communications interfaces
    interface {
        ringnumber: 0
        bindnetaddr: 192.168.0.0
        mcastaddr: 239.255.29.144
        mcastport: 5405
    }
    interface {
        ringnumber: 1
        bindnetaddr: 192.168.42.0
        mcastaddr: 239.255.42.0
        mcastport: 5405
    }
}
amf {
    mode: disabled
}
service {
    # Load Pacemaker
    ver: 1
    name: pacemaker
}
logging {
    fileline: off
    to_stderr: yes
    to_logfile: no
    to_syslog: yes
    syslog_facility: daemon
    debug: off
    timestamp: on
}
The important bits here are the two interface declarations enabling
redundant cluster communications and the corresponding rrp_mode
declaration. Mutual node authentication and encryption (secauth: on) is
good security practice. And finally, the service stanza loads the
Pacemaker cluster manager as a Corosync plugin.
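Note that the bindnetaddr values above are network addresses, not host addresses. As a minimal sketch, assuming each ring sits on a /24 network (the prefix length is not stated in this article), Python's standard ipaddress module can derive the value from a node's interface address:

```python
import ipaddress


def bindnetaddr(interface_cidr):
    """Return the network address for an interface given as 'address/prefix'."""
    # ip_interface accepts a host address with a prefix length;
    # .network masks off the host bits.
    return str(ipaddress.ip_interface(interface_cidr).network.network_address)


# Interface addresses of the first node; the /24 prefix is an assumption.
for cidr in ("192.168.0.1/24", "192.168.42.1/24"):
    print(cidr, "->", bindnetaddr(cidr))
```

Because the result is the same for every node on the same network, an identical corosync.conf can be used on all of them.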
With secauth enabled, Corosync also requires a shared secret for
mutual node authentication. Corosync uses a simple 128-byte secret
that it stores as /etc/corosync/authkey, and which you can easily
generate with the corosync-keygen utility.
Once corosync.conf and authkey are in shape, copy them over to all
nodes in your prospective cluster. Then, fire up Corosync cluster
communications—a simple service corosync start will do.
Once the service is running on all nodes, the command
corosync-cfgtool -s will display both rings as healthy, and the
cluster is ready to communicate:
Printing ring status.
Local node ID 303938909
RING ID 0
    id = 192.168.0.1
    status = ring 0 active with no faults
RING ID 1
    id = 192.168.42.1
    status = ring 1 active with no faults
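If you want to check ring health from a script rather than by eye, the output above is easy to parse. The following is only a sketch: it assumes the output layout shown here, treats any status that does not contain the words "no faults" as unhealthy, and uses helper names (ring_status, all_rings_healthy) invented for this example.

```python
import re

# Output of `corosync-cfgtool -s` as shown in this article.
SAMPLE = """\
Printing ring status.
Local node ID 303938909
RING ID 0
id = 192.168.0.1
status = ring 0 active with no faults
RING ID 1
id = 192.168.42.1
status = ring 1 active with no faults
"""


def ring_status(text):
    """Map each ring number to the status string reported for it."""
    rings = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        match = re.fullmatch(r"RING ID (\d+)", line)
        if match:
            current = int(match.group(1))
        elif line.startswith("status") and current is not None:
            rings[current] = line.split("=", 1)[1].strip()
    return rings


def all_rings_healthy(text):
    """True only if at least one ring was found and none reports a fault."""
    statuses = ring_status(text)
    return bool(statuses) and all("no faults" in s for s in statuses.values())


print(ring_status(SAMPLE))
print(all_rings_healthy(SAMPLE))
```

On a live node, the text could come from subprocess.run(["corosync-cfgtool", "-s"], capture_output=True, text=True).stdout instead of the sample string.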
Pacemaker
Once Corosync runs, we can start Pacemaker with the service pacemaker
start command. After a few seconds, Pacemaker elects a Designated
Coordinator (DC) node among the three available nodes and commences
full cluster operations. The crm_mon utility, executable on any
cluster node, then produces output similar to this:
============
Last updated: Fri Feb 3 18:40:15 2012
Stack: openais
Current DC: bob - partition with quorum
Version: 1.1.6-4.el6-89678d4947c5bd466e2f31acd58ea4e1edb854d5
3 Nodes configured, 3 expected votes
0 Resources configured.
============
The output produced by crm_mon is a more user-friendly
representation of the internal cluster configuration and status stored
in a distributed XML database, the Cluster Information Base (CIB).
Those interested and brave enough to care about the internal
representation are welcome to make use of the cibadmin -Q command.
But be warned: before you do so, you may want to instruct the
junior sysadmin next to you to get some coffee. The avalanche of
XML gibberish that cibadmin produces can be intimidating to the
uninitiated.
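For readers who do want to look inside, the XML is ordinary enough to process with standard tools. The following is a minimal sketch, not Pacemaker's own tooling: it parses a hand-written, heavily abbreviated stand-in for cibadmin -Q output (the node ids are placeholders, and a real CIB carries far more detail) and lists the configured node names with Python's xml.etree.ElementTree.

```python
import xml.etree.ElementTree as ET

# Hand-written, abbreviated illustration of a CIB; ids are placeholders.
SAMPLE_CIB = """\
<cib>
  <configuration>
    <crm_config/>
    <nodes>
      <node id="1" uname="alice" type="normal"/>
      <node id="2" uname="bob" type="normal"/>
      <node id="3" uname="charlie" type="normal"/>
    </nodes>
    <resources/>
    <constraints/>
  </configuration>
  <status/>
</cib>
"""


def node_names(cib_xml):
    """Return the sorted node names found in the CIB's nodes section."""
    root = ET.fromstring(cib_xml)
    nodes = root.findall("./configuration/nodes/node")
    return sorted(node.get("uname") for node in nodes)


print(node_names(SAMPLE_CIB))
```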
Much less intimidating is the standard configuration facility for Pacemaker, the crm shell. This self-documenting, hierarchical, scriptable subshell is the simplest and most universal way of manipulating Pacemaker clusters. In its configure submenu, the shell allows us to load and import configuration snippets—or even complete configurations, as below:
primitive p_iscsi ocf:heartbeat:iscsi \
    params portal="192.168.122.100:3260" \
    target="iqn.2011-09.com.hastexo:virtcluster" \
    op monitor interval="10"
primitive p_xray ocf:heartbeat:VirtualDomain \
    params config="/etc/libvirt/qemu/xray.xml" \
    op monitor interval="30" timeout="30" \
    op start interval="0" timeout="120" \
    op stop interval="0" timeout="120"
primitive p_yankee ocf:heartbeat:VirtualDomain \
    params config="/etc/libvirt/qemu/yankee.xml" \
    op monitor interval="30" timeout="30" \
    op start interval="0" timeout="120" \
    op stop interval="0" timeout="120"
primitive p_zulu ocf:heartbeat:VirtualDomain \
    params config="/etc/libvirt/qemu/zulu.xml" \
    op monitor interval="30" timeout="30" \
    op start interval="0" timeout="120" \
    op stop interval="0" timeout="120"
clone cl_iscsi p_iscsi
colocation c_xray_on_iscsi inf: p_xray cl_iscsi
colocation c_yankee_on_iscsi inf: p_yankee cl_iscsi
colocation c_zulu_on_iscsi inf: p_zulu cl_iscsi
order o_iscsi_before_xray inf: cl_iscsi p_xray
order o_iscsi_before_yankee inf: cl_iscsi p_yankee
order o_iscsi_before_zulu inf: cl_iscsi p_zulu
Besides defining our three virtual domains as resources under full
cluster management and monitoring (p_xray, p_yankee and p_zulu),
this configuration also ensures that all domains can find their
storage (the cl_iscsi clone), and that they wait until iSCSI storage
becomes available (the order and colocation constraints).
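The three virtual domain definitions differ only in the domain name, so the snippet lends itself to generation. The following is a sketch of one way to do that, using a template copied from the configuration above; the function name vm_config and its storage parameter are invented for this example and are not part of the crm shell.

```python
# Template copied from the VirtualDomain primitives above.
VM_TEMPLATE = """\
primitive p_{name} ocf:heartbeat:VirtualDomain \\
    params config="/etc/libvirt/qemu/{name}.xml" \\
    op monitor interval="30" timeout="30" \\
    op start interval="0" timeout="120" \\
    op stop interval="0" timeout="120"
"""


def vm_config(names, storage="cl_iscsi"):
    """Build primitives plus colocation and order constraints for each domain."""
    parts = [VM_TEMPLATE.format(name=n) for n in names]
    # Each domain must run where the storage clone runs...
    parts += [f"colocation c_{n}_on_iscsi inf: p_{n} {storage}\n" for n in names]
    # ...and must start only after the storage is available.
    parts += [f"order o_iscsi_before_{n} inf: {storage} p_{n}\n" for n in names]
    return "".join(parts)


print(vm_config(["xray", "yankee", "zulu"]))
```

The generated text can be saved to a file and loaded from the shell's configure submenu in the same way as a hand-written snippet.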
This being a single-instance storage cluster, it's imperative that we also employ safeguards against shredding our data. This is commonly known as node fencing, but Pacemaker uses the more endearing term STONITH (Shoot The Other Node In The Head) for the same concept. A ubiquitous means of node fencing is controlling nodes via their IPMI Baseboard Management Controllers, and Pacemaker supports this natively:
primitive p_ipmi_alice stonith:external/ipmi \
    params hostname="alice" ipaddr="192.168.15.1" \
    userid="admin" passwd="foobar" \
    op start interval="0" timeout="60" \
    op monitor interval="120" timeout="60"
primitive p_ipmi_bob stonith:external/ipmi \
    params hostname="bob" ipaddr="192.168.15.2" \
    userid="admin" passwd="foobar" \
    op start interval="0" timeout="60" \
    op monitor interval="120" timeout="60"
primitive p_ipmi_charlie stonith:external/ipmi \
    params hostname="charlie" ipaddr="192.168.15.3" \
    userid="admin" passwd="foobar" \
    op start interval="0" timeout="60" \
    op monitor interval="120" timeout="60"
location l_ipmi_alice p_ipmi_alice -inf: alice
location l_ipmi_bob p_ipmi_bob -inf: bob
location l_ipmi_charlie p_ipmi_charlie -inf: charlie
property stonith-enabled="true"
The three location constraints here ensure that no node has to shoot itself.
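It is easy to forget one of these constraints when adding a node. As a rough sketch, the check below scans crm shell configuration text and reports every fencing device that lacks a -inf location constraint for the node it fences. It assumes the hostname parameter names the fenced node, as in the configuration above, and it handles only the syntax used in this article; the function names are invented for the example.

```python
import re

# Two of the fencing devices from this article, with the op lines omitted.
SAMPLE = r'''
primitive p_ipmi_alice stonith:external/ipmi \
    params hostname="alice" ipaddr="192.168.15.1" \
    userid="admin" passwd="foobar"
primitive p_ipmi_bob stonith:external/ipmi \
    params hostname="bob" ipaddr="192.168.15.2" \
    userid="admin" passwd="foobar"
location l_ipmi_alice p_ipmi_alice -inf: alice
location l_ipmi_bob p_ipmi_bob -inf: bob
'''


def join_continuations(text):
    """Fold backslash-continued lines into single logical lines."""
    return re.sub(r"\\[ \t]*\n[ \t]*", " ", text)


def unprotected_devices(config):
    """Return fencing devices that are not banned from the node they fence."""
    devices = {}   # resource id -> node it fences
    banned = set() # (resource id, node) pairs with a -inf location score
    for line in join_continuations(config).splitlines():
        line = line.strip()
        m = re.match(r'primitive (\S+) stonith:\S+ .*?hostname="([^"]+)"', line)
        if m:
            devices[m.group(1)] = m.group(2)
            continue
        m = re.match(r"location \S+ (\S+) -inf: (\S+)", line)
        if m:
            banned.add((m.group(1), m.group(2)))
    return sorted(r for r, node in devices.items() if (r, node) not in banned)


print(unprotected_devices(SAMPLE))
```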
Once that configuration is active, Pacemaker fires up resources as
determined by the cluster configuration. Again, we can query the cluster
state with the crm_mon command, which now produces much more
interesting output than before:
============
Last updated: Fri Feb 3 19:46:29 2012
Stack: openais
Current DC: bob - partition with quorum
Version: 1.1.6-4.el6-89678d4947c5bd466e2f31acd58ea4e1edb854d5
3 Nodes configured, 3 expected votes
9 Resources configured.
============
Online: [ alice bob charlie ]
Clone Set: cl_iscsi [p_iscsi]
Started: [ alice bob charlie ]
p_ipmi_alice (stonith:external/ipmi): Started bob
p_ipmi_bob (stonith:external/ipmi): Started charlie
p_ipmi_charlie (stonith:external/ipmi): Started alice
p_xray (ocf::heartbeat:VirtualDomain): Started alice
p_yankee (ocf::heartbeat:VirtualDomain): Started bob
p_zulu (ocf::heartbeat:VirtualDomain): Started charlie
Note that by default, Pacemaker clusters are symmetric: any resource may run on any node unless a constraint says otherwise. The resource manager balances resources in a round-robin fashion among cluster nodes.
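The balancing is visible in the output above. As a small sketch, assuming the line layout shown in this article, the following tallies the "Started" resources per node; clone instances are listed in a different format and are deliberately not counted.

```python
import re
from collections import Counter

# Resource section of the crm_mon output shown in this article.
SAMPLE = """\
Online: [ alice bob charlie ]
Clone Set: cl_iscsi [p_iscsi]
Started: [ alice bob charlie ]
p_ipmi_alice (stonith:external/ipmi): Started bob
p_ipmi_bob (stonith:external/ipmi): Started charlie
p_ipmi_charlie (stonith:external/ipmi): Started alice
p_xray (ocf::heartbeat:VirtualDomain): Started alice
p_yankee (ocf::heartbeat:VirtualDomain): Started bob
p_zulu (ocf::heartbeat:VirtualDomain): Started charlie
"""

# Matches lines of the form "<resource> (<agent>): Started <node>".
RESOURCE_LINE = re.compile(r"^\s*(\S+)\s+\(\S+\):\s+Started\s+(\S+)\s*$")


def resources_per_node(text):
    """Count non-clone resources reported as started on each node."""
    counts = Counter()
    for line in text.splitlines():
        match = RESOURCE_LINE.match(line)
        if match:
            counts[match.group(2)] += 1
    return counts


for node, count in sorted(resources_per_node(SAMPLE).items()):
    print(node, count)
```

For scripted use, crm_mon -1 prints the status once and exits instead of refreshing continuously.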
Comments
Interesting
This one is quite an interesting read. Thanks for sharing.
HA Documentation
Where can we find more detailed, easy-to-understand documentation about Linux HA besides the link you mentioned above, especially about Corosync and Pacemaker?
The two documents "Clusters from Scratch" and "Pacemaker Explained" are too short and jump straight to the configuration without explaining how the two technologies work or walking through the configuration step by step.
If there were documentation like http://www.drbd.org/home/what-is-drbd/, which explains DRBD in an easy way, it would be a great help in understanding Pacemaker and Corosync.
It seems that very little good documentation about Linux HA is available anywhere.