DRBD in a Heartbeat

How to build a redundant, high-availability system with DRBD and Heartbeat.

Configuring DRBD

After you've done that, you have to set up DRBD before moving forward with Heartbeat. In my setup, the configuration file is /etc/drbd.conf, but its location can vary depending on your distribution and compile-time options, so find the file and open it now so you can follow along. If you can't find it, simply create one at /etc/drbd.conf.

Listing 1 is my configuration file. I go over it line by line and add explanations as comments that begin with the # character.
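
If you don't have Listing 1 in front of you, a minimal two-node resource looks roughly like the following. This is only a sketch, not Listing 1 itself; the hostnames, backing partition and IP addresses are placeholders you will need to replace with your own:

# /etc/drbd.conf -- minimal sketch with placeholder values
resource drbd0 {
    protocol C;                    # synchronous replication

    on server1 {                   # must match the output of `uname -n`
        device    /dev/drbd0;
        disk      /dev/sda3;       # placeholder backing partition
        address   192.168.1.1:7788;
        meta-disk internal;
    }

    on server2 {
        device    /dev/drbd0;
        disk      /dev/sda3;
        address   192.168.1.2:7788;
        meta-disk internal;
    }

    syncer {
        rate 10M;                  # cap resync bandwidth
    }
}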

Now, let's test it by starting the DRBD driver to see if everything works as it should. On the command line of both servers, type:

drbdadm create-md drbd0; /etc/init.d/drbd restart; cat /proc/drbd

If all goes well, the output of the last command should look something like this:

0: cs:Connected st:Secondary/Secondary ds:Inconsistent/Inconsistent r---
   ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0
       resync: used:0/7 hits:0 misses:0 starving:0 dirty:0 changed:0
       act_log: used:0/257 hits:0 misses:0 starving:0 dirty:0 changed:0

Note: you can always find information about the DRBD status by typing:

cat /proc/drbd

Now, type the following command on the master system:

drbdadm -- --overwrite-data-of-peer primary drbd0; cat /proc/drbd

The output should look something like this:

0: cs:SyncSource st:Primary/Secondary ds:UpToDate/Inconsistent r---
   ns:65216 nr:0 dw:0 dr:65408 al:0 bm:3 lo:0 pe:7 ua:6 ap:0
       [>...................] sync'ed:  2.3% (3083548/3148572)K
       finish: 0:04:43 speed: 10,836 (10,836) K/sec
       resync: used:1/7 hits:4072 misses:4 starving:0 dirty:0 changed:4
       act_log: used:0/257 hits:0 misses:0 starving:0 dirty:0 changed:0

This means DRBD is syncing your disks from the master computer, which is set as primary, to the slave computer, which is set as secondary.

Next, create the filesystem by typing the following on the master system:

mkfs.ext3 /dev/drbd0

Once that is done, on the master computer, go ahead and mount the drive /dev/drbd0 on the /replicated directory we created for it. We'll have to mount it manually for now until we set up Heartbeat.
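
For example, assuming the /replicated mount point already exists:

mount /dev/drbd0 /replicated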

Preparing Your Services

An important part of any redundant solution is properly preparing your services so that when the master machine fails, the slave machine can take over and run those services seamlessly. To do that, you have to move not only the data onto the replicated DRBD disk, but also the configuration files.

Let me show you how I've got Sendmail set up to handle the mail and store it on the replicated drives. I use Sendmail for this example because it is one step more complicated than the other services: even when the machine is running in slave mode, it may need to send e-mail notifications from internal applications, and if Sendmail can't access its configuration files, it won't be able to do so.

On the master machine, first make sure Sendmail is installed but stopped. Then create an etc directory on your /replicated drive. After that, copy your /etc/mail directory into /replicated/etc, and replace /etc/mail with a symlink pointing to /replicated/etc/mail.

Next, make a var directory on the /replicated drive, and copy /var/mail, /var/spool/mqueue and any other mail data folders into that directory. Then, of course, create the appropriate symlinks so that the new folders are accessible from their previous locations.
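
The copy-and-symlink steps might look roughly like the following (a sketch only; the exact set of mail directories depends on your distribution):

# On the master, with /dev/drbd0 mounted on /replicated and Sendmail stopped
mkdir -p /replicated/etc /replicated/var/spool
cp -a /etc/mail         /replicated/etc/
cp -a /var/mail         /replicated/var/
cp -a /var/spool/mqueue /replicated/var/spool/

# Replace the originals with symlinks to the replicated copies
mv /etc/mail /etc/mail.orig && ln -s /replicated/etc/mail /etc/mail
mv /var/mail /var/mail.orig && ln -s /replicated/var/mail /var/mail
mv /var/spool/mqueue /var/spool/mqueue.orig && ln -s /replicated/var/spool/mqueue /var/spool/mqueue
# ...and repeat for mqueue-client, /var/spool/mail and any other mail data directories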

Your /replicated directory structure should now look something like:

/replicated/etc/mail
/replicated/var/mail
/replicated/var/spool/mqueue
/replicated/var/spool/mqueue-client
/replicated/var/spool/mail

And, on your main drive, those folders should now be symlinks that look something like this:

/etc/mail -> /replicated/etc/mail
/var/mail -> /replicated/var/mail
/var/spool/mqueue -> /replicated/var/spool/mqueue
/var/spool/mqueue-client -> /replicated/var/spool/mqueue-client
/var/spool/mail -> /replicated/var/spool/mail

Now, start Sendmail again and give it a try. If all is working well, you've successfully finished the first part of the setup.

The next part is to make sure it runs, even on the slave. The trick we use is to copy the Sendmail binary onto the mounted /replicated drive and to put a symlink pointing to the ssmtp binary in the unmounted /replicated directory.

First, make sure you have ssmtp installed and configured on your system. Next, make a directory /replicated/usr/sbin, and copy /usr/sbin/sendmail to that directory. Then, replace /usr/sbin/sendmail with a symlink pointing to /replicated/usr/sbin/sendmail.

Once that's done, shut down Sendmail and unmount the /replicated drive. Then, on both the master and slave computers, create the folder /replicated/usr/sbin again (this time on the unmounted local disk) and, inside it, a symlink named sendmail pointing to /usr/sbin/ssmtp. That way, whenever the DRBD drive is not mounted, /usr/sbin/sendmail resolves to ssmtp instead.
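
Put together, the steps might look something like this (a sketch; it assumes ssmtp installs its binary as /usr/sbin/ssmtp):

# On the master, with /replicated mounted: move the real Sendmail binary onto the DRBD disk
mkdir -p /replicated/usr/sbin
cp -a /usr/sbin/sendmail /replicated/usr/sbin/sendmail
mv /usr/sbin/sendmail /usr/sbin/sendmail.orig
ln -s /replicated/usr/sbin/sendmail /usr/sbin/sendmail
# (presumably the slave needs the same /usr/sbin/sendmail symlink as well)

# Stop Sendmail and unmount the DRBD drive, then on BOTH machines create the
# fallback that resolves to ssmtp whenever the DRBD drive is not mounted
/etc/init.d/sendmail stop
umount /replicated
mkdir -p /replicated/usr/sbin
ln -s /usr/sbin/ssmtp /replicated/usr/sbin/sendmail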

After setting up Sendmail, setting up other services like Apache and PostgreSQL will seem like a breeze. Just remember to put all their data and configuration files on the /replicated drive and to create the appropriate symlinks.
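
For example, moving Apache's configuration and document root onto the replicated drive might look like this (paths are illustrative and vary by distribution):

# On the master, with /replicated mounted and Apache stopped
mkdir -p /replicated/etc /replicated/var/www
cp -a /etc/httpd    /replicated/etc/
cp -a /var/www/html /replicated/var/www/
mv /etc/httpd /etc/httpd.orig       && ln -s /replicated/etc/httpd    /etc/httpd
mv /var/www/html /var/www/html.orig && ln -s /replicated/var/www/html /var/www/html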

______________________

Comments


Thanks

Kokai:

Thank you! Check http://docs.homelinux.org for other tutorials about DRBD. They are also well explained, like your article.

DRBD.conf

Anonymous:

Hi! Shouldn't the IP of your server1 be 192.168.1.1 instead of 192.168.1.3?

Is 192.168.1.3 there by mistake, or is it something else? ;-}

drbd after failure

Daniel:

I have done the above and set up DRBD and Heartbeat. I am having an issue: when, say, node-a loses its network connection, the failover happens as expected; node-b mounts the disk and everything is there. But when node-a comes back up, DRBD is not started as a slave; it takes over as primary again, and I lose all the new data that node-b created.

How can I get DRBD to play nicer?

====DRBD.conf=====

global {
    minor-count 1;
}

resource mysql {

    # * for critical transactional data.
    protocol C;

    on server-1 {
        device    /dev/drbd0;
        disk      /dev/sdb1;
        address   192.168.0.128:7788;
        meta-disk internal;
    }

    on server-2 {
        device    /dev/drbd0;
        disk      /dev/sdb1;
        address   192.168.0.129:7788;
        meta-disk internal;
    }

    disk {
        on-io-error detach;
    }

    net {
        max-buffers 2048;
        ko-count 4;
    }

    syncer {
        rate 10M;
        al-extents 257;
    }

    startup {
        wfc-timeout 30;
        degr-wfc-timeout 120;
    }
}

=====END======

====ha.cf=====

logfacility local0
keepalive 500ms
deadtime 10
warntime 5
initdead 30
ucast eth0 192.168.0.129
#mcast eth0 225.0.0.1 694 2 0
auto_failback off
node server-1
node server-2
respawn hacluster /usr/lib/heartbeat/ipfail
use_logd yes
logfile /var/log/hb.log
debugfile /var/log/heartbeat-debug.log

====END=======

Is there an error in Listing 1 (drbd.conf)?

Laker Netman:

I'm in the midst of setting up my first HA cluster and want to be sure I didn't miss something. Shouldn't the address line in the on server section be 192.168.1.1 rather than .1.3? If not, what did I miss?

TIA,
Laker

mounting drbd drives with heartbeat

Chris:

I can get the DRBD drive to start up with Heartbeat, and I can fail over and have the primary and secondary change. The problem I am having is that I cannot get Heartbeat to mount the drives; I can mount them just fine with the mount command.

I am not sure from the article how the drives are being mounted.

Does anyone know how I would mount /dev/drbd0 on /mnt/drbd0 with Heartbeat?

mount drives with heartbeat

Joseph Chackungal:

I have the same issue!

Did you find a way out? My HA cluster with DRBD works great if I manually mount my replicated drive, but it refuses to do so automatically with Heartbeat.

Any help/leads will be appreciated.

Thanks

Try mounting with resource

Jan:

Try mounting with the Filesystem resource script. This works for me and is mentioned in some other how-to articles:

Filesystem::/dev/drbd0::/data::ext3

You can also try it on the command line without the :: separators.

NFS support is generally quite tricky

frankie:

This should at least be mentioned in the article. A Heartbeat-based service with a floating IP address can be extremely tricky with respect to locking when used with NFS servers. Also, I would avoid using mbox-based mail spool directories on DRBD partitions; maildirs are much safer. That's my 2 cents.

NFS support not that difficult

Alan Robertson:

NFS works well, including locking. Dozens to hundreds of sites have it working quite nicely. You do have to set things up correctly, but the Linux-HA web site and several other articles (pointed to by the PressRoom page) explain how to do that in detail. There is a hole where an extremely active application can possibly fail to get a lock during a failover, but it happens rarely.

However, you do have to set it up correctly.
