Ahead of the Pack: the Pacemaker High-Availability Stack
Corosync
Corosync's configuration files live in /etc/corosync, and the central configuration is in /etc/corosync/corosync.conf. Here's an example of the contents of this file:
totem {
    # Configuration file format version (required; must be 2)
    version: 2
    # Enable node authentication & encryption
    secauth: on
    # Redundant ring protocol: none, active, passive.
    rrp_mode: active
    # Redundant communications interfaces
    interface {
        ringnumber: 0
        bindnetaddr: 192.168.0.0
        mcastaddr: 239.255.29.144
        mcastport: 5405
    }
    interface {
        ringnumber: 1
        bindnetaddr: 192.168.42.0
        mcastaddr: 239.255.42.0
        mcastport: 5405
    }
}
amf {
    mode: disabled
}
service {
    # Load Pacemaker
    ver: 1
    name: pacemaker
}
logging {
    fileline: off
    to_stderr: yes
    to_logfile: no
    to_syslog: yes
    syslog_facility: daemon
    debug: off
    timestamp: on
}
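The important bits here are the two interface declarations, which enable
redundant cluster communications, and the corresponding rrp_mode
declaration (active, meaning Corosync sends its traffic over both rings
at once). Mutual node authentication and encryption (secauth: on) is
good security practice. Finally, the service stanza loads the Pacemaker
cluster manager as a Corosync plugin; with ver: 1, the Pacemaker daemons
themselves are then started separately through the pacemaker service.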
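Each interface stanza's bindnetaddr is conventionally the network address of the subnet that ring should use (192.168.0.0 for a node at 192.168.0.1/24), and mcastaddr must be a multicast group. For redundancy to mean anything, the two rings should also sit on different networks. The Python sketch below checks those three properties; the /24 prefix lengths are an assumption, the node addresses 192.168.0.1 and 192.168.42.1 are taken from the ring status output shown later, and the function names are hypothetical.

```python
import ipaddress


def bindnetaddr_for(interface_cidr: str) -> str:
    """Network address to use as bindnetaddr for a local interface address
    given in CIDR notation, e.g. '192.168.0.1/24' -> '192.168.0.0'."""
    return str(ipaddress.ip_interface(interface_cidr).network.network_address)


def check_rings(rings: list[tuple[str, str, str]]) -> list[str]:
    """Check (interface_cidr, bindnetaddr, mcastaddr) triples, one per ring.

    Returns human-readable problems; an empty list means all checks passed.
    """
    problems = []
    networks = []
    for number, (cidr, bindnetaddr, mcastaddr) in enumerate(rings):
        expected = bindnetaddr_for(cidr)
        if bindnetaddr != expected:
            problems.append(f"ring {number}: bindnetaddr {bindnetaddr}, expected {expected}")
        if not ipaddress.ip_address(mcastaddr).is_multicast:
            problems.append(f"ring {number}: {mcastaddr} is not a multicast address")
        networks.append(ipaddress.ip_interface(cidr).network)
    # Redundant rings only help if they do not share a network.
    if len(set(networks)) != len(networks):
        problems.append("two or more rings share the same network")
    return problems


# Values from the example corosync.conf; the /24 prefixes are an assumption.
RINGS = [
    ("192.168.0.1/24", "192.168.0.0", "239.255.29.144"),
    ("192.168.42.1/24", "192.168.42.0", "239.255.42.0"),
]

if __name__ == "__main__":
    print(check_rings(RINGS) or "rings look consistent")
```

This only checks the configuration values against each other; whether the interfaces actually exist on a given node is still up to Corosync to report at startup.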
With secauth enabled, Corosync also requires a shared secret for
mutual node authentication. Corosync uses a simple 128-byte secret,
stored as /etc/corosync/authkey, which you can easily generate with
the corosync-keygen utility.
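Because the key is a shared secret, it should have exactly the size Corosync expects and be inaccessible to anyone but its owner (root). A minimal Python sketch of such a sanity check follows, for running on each node after the key has been copied around; the 128-byte size comes from the text (other Corosync versions may use a different key length), and the function names are hypothetical.

```python
import os
import stat

AUTHKEY_PATH = "/etc/corosync/authkey"
AUTHKEY_SIZE = 128  # bytes, as described in the text


def authkey_problems(size: int, mode: int) -> list[str]:
    """Return problems found from an authkey's size and st_mode bits."""
    problems = []
    if size != AUTHKEY_SIZE:
        problems.append(f"key is {size} bytes, expected {AUTHKEY_SIZE}")
    # Group and other users must have no access to the shared secret.
    if mode & (stat.S_IRWXG | stat.S_IRWXO):
        problems.append(f"permissions too open: {stat.filemode(mode)}")
    return problems


def check_authkey(path: str = AUTHKEY_PATH) -> list[str]:
    """Stat the key file and report problems; an empty list means it looks sane."""
    try:
        st = os.stat(path)
    except FileNotFoundError:
        return [f"{path} does not exist; run corosync-keygen first"]
    return authkey_problems(st.st_size, st.st_mode)
```

Calling check_authkey() on a node returns an empty list when the key has the expected size and restrictive permissions.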
Once corosync.conf and authkey are in shape, copy them over to all
nodes in your prospective cluster. Then, fire up Corosync cluster
communications—a simple service corosync start will do.
Once the service is running on all nodes, the command
corosync-cfgtool -s will display both rings as healthy, and the
cluster is ready to communicate:
Printing ring status.
Local node ID 303938909
RING ID 0
id = 192.168.0.1
status = ring 0 active with no faults
RING ID 1
id = 192.168.42.1
status = ring 1 active with no faults
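When ring health feeds a monitoring check, the same output can be evaluated from a script. The Python sketch below maps each "RING ID" block to its status line and reports the cluster healthy only if every ring says "active with no faults". It assumes the output layout shown above (leading whitespace is ignored), and ring_status and rings_healthy are hypothetical names.

```python
import re

RING_SAMPLE = """Printing ring status.
Local node ID 303938909
RING ID 0
id = 192.168.0.1
status = ring 0 active with no faults
RING ID 1
id = 192.168.42.1
status = ring 1 active with no faults
"""


def ring_status(cfgtool_output: str) -> dict[int, str]:
    """Map each ring number in `corosync-cfgtool -s` output to its status text."""
    statuses: dict[int, str] = {}
    current = None
    for raw in cfgtool_output.splitlines():
        line = raw.strip()
        header = re.fullmatch(r"RING ID (\d+)", line)
        if header:
            current = int(header.group(1))
        elif current is not None and line.startswith("status") and "=" in line:
            # Everything after the first '=' is the human-readable status.
            statuses[current] = line.split("=", 1)[1].strip()
    return statuses


def rings_healthy(cfgtool_output: str) -> bool:
    """True if at least one ring was found and every ring reports no faults."""
    statuses = ring_status(cfgtool_output)
    return bool(statuses) and all(
        s.endswith("active with no faults") for s in statuses.values()
    )
```

On a live node, the input could come from subprocess.run(["corosync-cfgtool", "-s"], capture_output=True, text=True).stdout.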
Pacemaker
Once Corosync runs, we can start Pacemaker with the service pacemaker
start command. After a few seconds, Pacemaker elects a Designated
Coordinator (DC) node among the three available nodes and commences
full cluster operations. The crm_mon utility, executable on any
cluster node, then produces output similar to this:
============
Last updated: Fri Feb 3 18:40:15 2012
Stack: openais
Current DC: bob - partition with quorum
Version: 1.1.6-4.el6-89678d4947c5bd466e2f31acd58ea4e1edb854d5
3 Nodes configured, 3 expected votes
0 Resources configured.
============
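Scripts (a monitoring check, say) sometimes need the same facts as this banner: which node is DC, whether the partition has quorum, and how many nodes and resources are configured. The Python sketch below extracts them with regular expressions. It assumes the Pacemaker 1.1.6 banner format shown here (other releases lay it out differently), and crm_mon_header is a hypothetical name. Running crm_mon -1 prints the status once and exits, which is the form a script would capture.

```python
import re

MON_SAMPLE = """============
Last updated: Fri Feb 3 18:40:15 2012
Stack: openais
Current DC: bob - partition with quorum
Version: 1.1.6-4.el6-89678d4947c5bd466e2f31acd58ea4e1edb854d5
3 Nodes configured, 3 expected votes
0 Resources configured.
============
"""


def crm_mon_header(text: str) -> dict:
    """Extract DC, quorum state and counts from a crm_mon banner."""
    info: dict = {}
    dc = re.search(r"^Current DC: (\S+) - partition (with|without) quorum",
                   text, re.MULTILINE | re.IGNORECASE)
    if dc:
        info["dc"] = dc.group(1)
        info["quorum"] = dc.group(2).lower() == "with"
    nodes = re.search(r"^(\d+) Nodes configured, (\d+) expected votes",
                      text, re.MULTILINE)
    if nodes:
        info["nodes"] = int(nodes.group(1))
        info["expected_votes"] = int(nodes.group(2))
    resources = re.search(r"^(\d+) Resources configured", text, re.MULTILINE)
    if resources:
        info["resources"] = int(resources.group(1))
    return info
```

Keys are simply absent when a line is missing, so a caller can tell "no quorum" apart from "banner not recognized".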
The output produced by crm_mon is a more user-friendly
representation of the internal cluster configuration and status stored
in a distributed XML database, the Cluster Information Base (CIB).
Those interested and brave enough to care about the internal
representation are welcome to use the cibadmin -Q command. But be
warned: before you do so, you may want to send the junior sysadmin
next to you off to get some coffee, because the avalanche of XML
gibberish that cibadmin produces can intimidate the uninitiated.
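If you do need to work with the CIB programmatically, the XML is less scary when you pull out only what you need. The Python sketch below uses the standard library's ElementTree to list node names and each primitive's resource agent, including primitives nested inside clones. The sample document is a hypothetical, heavily abridged CIB along the lines of the configuration built later in this article; real cibadmin -Q output carries many more attributes and sections, and summarize_cib is a hypothetical name.

```python
import xml.etree.ElementTree as ET

# Hypothetical, abridged CIB; real `cibadmin -Q` output is far larger.
CIB_SAMPLE = """<cib epoch="1" num_updates="0" admin_epoch="0">
  <configuration>
    <crm_config/>
    <nodes>
      <node id="1" uname="alice"/>
      <node id="2" uname="bob"/>
      <node id="3" uname="charlie"/>
    </nodes>
    <resources>
      <clone id="cl_iscsi">
        <primitive id="p_iscsi" class="ocf" provider="heartbeat" type="iscsi"/>
      </clone>
      <primitive id="p_ipmi_alice" class="stonith" type="external/ipmi"/>
    </resources>
    <constraints/>
  </configuration>
  <status/>
</cib>"""


def summarize_cib(cib_xml: str) -> dict:
    """Return node names and a primitive-id -> agent mapping from CIB XML."""
    root = ET.fromstring(cib_xml)
    nodes = [n.get("uname") for n in root.iterfind("./configuration/nodes/node")]
    primitives = {}
    # iter() also descends into clones and groups.
    for prim in root.iter("primitive"):
        parts = (prim.get("class"), prim.get("provider"), prim.get("type"))
        # stonith primitives have no provider, so skip empty parts.
        primitives[prim.get("id")] = ":".join(p for p in parts if p)
    return {"nodes": nodes, "primitives": primitives}
```

The agent strings come out in the crm shell's class:provider:type notation, for example ocf:heartbeat:iscsi.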
Much less intimidating is the standard configuration facility for Pacemaker, the crm shell. This self-documenting, hierarchical, scriptable subshell is the simplest and most universal way of manipulating Pacemaker clusters. In its configure submenu, the shell allows us to load and import configuration snippets—or even complete configurations, as below:
primitive p_iscsi ocf:heartbeat:iscsi \
    params portal="192.168.122.100:3260" \
        target="iqn.2011-09.com.hastexo:virtcluster" \
    op monitor interval="10"
primitive p_xray ocf:heartbeat:VirtualDomain \
    params config="/etc/libvirt/qemu/xray.xml" \
    op monitor interval="30" timeout="30" \
    op start interval="0" timeout="120" \
    op stop interval="0" timeout="120"
primitive p_yankee ocf:heartbeat:VirtualDomain \
    params config="/etc/libvirt/qemu/yankee.xml" \
    op monitor interval="30" timeout="30" \
    op start interval="0" timeout="120" \
    op stop interval="0" timeout="120"
primitive p_zulu ocf:heartbeat:VirtualDomain \
    params config="/etc/libvirt/qemu/zulu.xml" \
    op monitor interval="30" timeout="30" \
    op start interval="0" timeout="120" \
    op stop interval="0" timeout="120"
clone cl_iscsi p_iscsi
colocation c_xray_on_iscsi inf: p_xray cl_iscsi
colocation c_yankee_on_iscsi inf: p_yankee cl_iscsi
colocation c_zulu_on_iscsi inf: p_zulu cl_iscsi
order o_iscsi_before_xray inf: cl_iscsi p_xray
order o_iscsi_before_yankee inf: cl_iscsi p_yankee
order o_iscsi_before_zulu inf: cl_iscsi p_zulu
Besides defining our three virtual domains as resources under full
cluster management and monitoring (p_xray, p_yankee and p_zulu),
this configuration also ensures that every domain runs only on a node
where its storage is available (the cl_iscsi clone and the colocation
constraints), and that it starts only after iSCSI storage has come up
(the order constraints).
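The three domain definitions differ only in their names, so with more guests it becomes tempting to generate them. The Python sketch below emits a configuration in the same shape as the one above: the iSCSI primitive, one VirtualDomain primitive per guest, the clone, and then one colocation and one order constraint per guest. It assumes every domain's libvirt XML lives at /etc/libvirt/qemu/NAME.xml, as in the example, and virtual_domain_config is a hypothetical helper, not part of the crm shell.

```python
def virtual_domain_config(domains: list[str], portal: str, target: str) -> str:
    """Build crm shell configuration text for an iSCSI clone plus one
    VirtualDomain per domain, colocated with and ordered after the clone."""
    lines = [
        "primitive p_iscsi ocf:heartbeat:iscsi \\",
        f'    params portal="{portal}" \\',
        f'        target="{target}" \\',
        '    op monitor interval="10"',
    ]
    for name in domains:
        lines += [
            f"primitive p_{name} ocf:heartbeat:VirtualDomain \\",
            # Assumes libvirt keeps each domain's XML at this path.
            f'    params config="/etc/libvirt/qemu/{name}.xml" \\',
            '    op monitor interval="30" timeout="30" \\',
            '    op start interval="0" timeout="120" \\',
            '    op stop interval="0" timeout="120"',
        ]
    lines.append("clone cl_iscsi p_iscsi")
    lines += [f"colocation c_{d}_on_iscsi inf: p_{d} cl_iscsi" for d in domains]
    lines += [f"order o_iscsi_before_{d} inf: cl_iscsi p_{d}" for d in domains]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(virtual_domain_config(
        ["xray", "yankee", "zulu"],
        portal="192.168.122.100:3260",
        target="iqn.2011-09.com.hastexo:virtcluster",
    ), end="")
```

Saving the output to a file and loading it with crm configure load update FILE would merge it into the running configuration; reviewing it with crm configure show afterward is a sensible habit.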
This being a single-instance storage cluster, it's imperative that we also employ safeguards against shredding our data. This is commonly known as node fencing, but Pacemaker uses the more endearing term STONITH (Shoot The Other Node In The Head) for the same concept. A ubiquitous means of node fencing is controlling nodes via their IPMI Baseboard Management Controllers, and Pacemaker supports this natively:
primitive p_ipmi_alice stonith:external/ipmi \
    params hostname="alice" ipaddr="192.168.15.1" \
        userid="admin" passwd="foobar" \
    op start interval="0" timeout="60" \
    op monitor interval="120" timeout="60"
primitive p_ipmi_bob stonith:external/ipmi \
    params hostname="bob" ipaddr="192.168.15.2" \
        userid="admin" passwd="foobar" \
    op start interval="0" timeout="60" \
    op monitor interval="120" timeout="60"
primitive p_ipmi_charlie stonith:external/ipmi \
    params hostname="charlie" ipaddr="192.168.15.3" \
        userid="admin" passwd="foobar" \
    op start interval="0" timeout="60" \
    op monitor interval="120" timeout="60"
location l_ipmi_alice p_ipmi_alice -inf: alice
location l_ipmi_bob p_ipmi_bob -inf: bob
location l_ipmi_charlie p_ipmi_charlie -inf: charlie
property stonith-enabled="true"
The three location constraints here keep each fencing device off the node it is meant to fence, so that no node ever has to shoot itself.
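The fencing configuration follows the same pattern for every node: one external/ipmi primitive plus a -inf location constraint on the node it fences. The Python sketch below generates that text from a node-to-BMC-address mapping. The addresses and the admin/foobar credentials are the placeholders from the example, and ipmi_fencing_config is a hypothetical helper.

```python
def ipmi_fencing_config(bmcs: dict[str, str], userid: str, passwd: str) -> str:
    """Emit one external/ipmi STONITH primitive per node, a location
    constraint keeping each device off the node it fences, and the
    stonith-enabled property."""
    lines = []
    for node, ipaddr in bmcs.items():
        lines += [
            f"primitive p_ipmi_{node} stonith:external/ipmi \\",
            f'    params hostname="{node}" ipaddr="{ipaddr}" \\',
            f'        userid="{userid}" passwd="{passwd}" \\',
            '    op start interval="0" timeout="60" \\',
            '    op monitor interval="120" timeout="60"',
        ]
    # -inf: a device may never run on the node it is meant to shoot.
    lines += [f"location l_ipmi_{n} p_ipmi_{n} -inf: {n}" for n in bmcs]
    lines.append('property stonith-enabled="true"')
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    bmcs = {"alice": "192.168.15.1", "bob": "192.168.15.2", "charlie": "192.168.15.3"}
    print(ipmi_fencing_config(bmcs, userid="admin", passwd="foobar"), end="")
```

Generating the location constraints from the same mapping as the primitives makes it harder to add a node and forget its constraint.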
Once that configuration is active, Pacemaker fires up resources as
determined by the cluster configuration. Again, we can query the cluster
state with the crm_mon command, which now produces much more
interesting output than before:
============
Last updated: Fri Feb 3 19:46:29 2012
Stack: openais
Current DC: bob - partition with quorum
Version: 1.1.6-4.el6-89678d4947c5bd466e2f31acd58ea4e1edb854d5
3 Nodes configured, 3 expected votes
9 Resources configured.
============
Online: [ alice bob charlie ]
Clone Set: cl_iscsi [p_iscsi]
Started: [ alice bob charlie ]
p_ipmi_alice (stonith:external/ipmi): Started bob
p_ipmi_bob (stonith:external/ipmi): Started charlie
p_ipmi_charlie (stonith:external/ipmi): Started alice
p_xray (ocf::heartbeat:VirtualDomain): Started alice
p_yankee (ocf::heartbeat:VirtualDomain): Started bob
p_zulu (ocf::heartbeat:VirtualDomain): Started charlie
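Note that by default, Pacemaker clusters are symmetric: any resource may run on any node unless a constraint says otherwise. Absent other preferences, the resource manager spreads resources evenly among the cluster nodes, which is why each node in the output above runs exactly one virtual domain.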
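To confirm placement from a script, for example that resources really are spread out and that no fencing device has landed on the node it fences, the crm_mon output can be parsed line by line. The Python sketch below assumes the Pacemaker 1.1.6 output format shown above and the p_ipmi_NODE naming convention from the configuration; it only looks at single-instance "Started" lines, and all function names are hypothetical.

```python
import re
from collections import Counter

STATUS_SAMPLE = """Online: [ alice bob charlie ]
Clone Set: cl_iscsi [p_iscsi]
Started: [ alice bob charlie ]
p_ipmi_alice (stonith:external/ipmi): Started bob
p_ipmi_bob (stonith:external/ipmi): Started charlie
p_ipmi_charlie (stonith:external/ipmi): Started alice
p_xray (ocf::heartbeat:VirtualDomain): Started alice
p_yankee (ocf::heartbeat:VirtualDomain): Started bob
p_zulu (ocf::heartbeat:VirtualDomain): Started charlie
"""

# "<resource> (<agent>): Started <node>"
PLACEMENT = re.compile(r"^\s*(\S+)\s+\(([^)]+)\):\s+Started\s+(\S+)\s*$")


def placements(crm_mon_output: str) -> dict[str, str]:
    """Map each single-instance resource to the node it was started on."""
    found = {}
    for line in crm_mon_output.splitlines():
        match = PLACEMENT.match(line)
        if match:
            found[match.group(1)] = match.group(3)
    return found


def per_node(placed: dict[str, str]) -> dict[str, int]:
    """Count single-instance resources per node."""
    return dict(Counter(placed.values()))


def self_fencing(placed: dict[str, str], prefix: str = "p_ipmi_") -> list[str]:
    """Return fencing resources running on the very node they would fence."""
    return [res for res, node in placed.items()
            if res.startswith(prefix) and res[len(prefix):] == node]
```

For the output above, per_node reports two resources (one fencing device and one virtual domain) on each node, and self_fencing returns an empty list.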