Database Replication with MySQL

Until recently, I've been a fan of the PostgreSQL database, but MySQL's replication implementation is very slick.

Not to disparage my favorite RDBMS too much, but PostgreSQL's replication mechanisms all seemed sort of kludgey. I'm not interested in creating update/delete triggers on all of my tables, for example. MySQL's mechanism is completely transparent to the client and doesn't need any table definition changes; it's all done on the server(s).

When I refer to “database replication,” I'm not referring to a bulk file transfer. Nor am I referring to a proxy mechanism that sends update/delete statements to multiple database servers. I'm referring to the ability to have a database update against one server reflected on multiple other servers without any additional intervention from the client. All this in what I would call “near real-time”; there is necessarily some propagation delay.

By implementing replication (and, in fact, migrating my entire application to MySQL to take advantage of it), I am trying to mitigate two potential threats against my application, and thus my business: a complete server or service failure, and degraded performance during routine backup cycles. In the event that my main database server dies, I can simply point my applications to the other replicated server and continue to do business. In my case, I simply have to move a DNS pointer to accomplish this task. On the other hand, I currently do database dumps late at night because of the load they place on my server. With replication, I can perform my daily backups during prime time, on a local read-only server. My main servers need never know that they've been backed up.

So before we talk about how to design and implement MySQL replication, let's talk briefly about how it works. Essentially, MySQL uses a master-slave model in which the master keeps a log of all of the database updates it has performed. One or more slaves then connect to the master, read each log entry, and perform the indicated update. The master server handles housekeeping issues such as log rotation and access control. Each slave server has to maintain a record of its current position within the master's transaction log. As new transactions occur, they get logged on the master server and downloaded by each slave. Once a transaction has been committed by a slave, the slave updates its position in the master's log and waits for the next transaction. This is all done asynchronously, which means that the master server doesn't have to wait for the slaves to “catch up.” It also means that if a slave is unable to connect to the master for a period of time, it can simply download all of the pending transactions when connectivity is re-established. So far, I've found it to be pretty robust.
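You can watch this bookkeeping directly from the SQL prompt. A sketch of what that looks like on a MySQL 5.0 slave (the field names below are the standard ones that command reports):

```sql
-- On a slave, inspect replication progress:
SHOW SLAVE STATUS\G

-- The fields worth watching:
--   Slave_IO_Running / Slave_SQL_Running  -- both should say 'Yes'
--   Read_Master_Log_Pos / Exec_Master_Log_Pos
--                                         -- the slave's position in the
--                                         -- master's binary log
--   Seconds_Behind_Master                 -- the propagation delay mentioned
--                                         -- above, usually 0 or close to it
```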

Even though we're talking about a master-slave model, we can actually build quite a few different server topologies to fit different needs.

The base case is where we have two servers and simply slave one to the other, which results in a master-slave configuration. In this case, we could think of the slave server as a hot standby server in case the master fails, or a server on which to run time-consuming reports without affecting the master server. By adding slaves, we can implement a star topology.

If we only have two servers, we can simply configure each to be a slave to the other, and we end up with a master-master configuration that works pretty well. Transactions that occur on one server are reflected on the other and vice versa. In this case, each server is completely equivalent to the other, so you could use this configuration as a load-balancing mechanism.

Variations on these themes give us the ability to build chains and even rings of replicated database servers, but this may not be as beneficial as you might think. In a chain topology, for example, each additional server and link adds to the data propagation time, as well as adding points of failure. So while you can build some truly amazing topologies, in general, you want to keep things as simple as possible. For my particular case, I chose a master-master configuration with an additional server slaved to one of the masters. This configuration gave me a hot standby with automatic recovery from failure. I also gained the ability to run backups on my slave, or run my entire business from the slave in the event that the data center housing the other two servers loses connectivity. This is a lot to ask from such a simple topology.

When I first set out to configure replication on my database servers, I really expected it to be a complex procedure; it wasn't. My approach was to completely read all of the documentation at the MySQL website, ask a few questions on the MySQL replication mailing list, and begin the work. It took longer to read the documentation than it did to actually do the configuration, as it should. The two resources that I found most helpful were at:

dev.mysql.com/doc/refman/5.0/en/replication.html

and

www.howtoforge.com/mysql_database_replication

The first link is a fairly extensive document detailing all of the configuration issues and options. I recommend you read it completely before finalizing your design. The second link leads to a document that bills itself as a "copy & paste" HowTo.

Because I'm more interested in discussing some of the gotchas, I'm only going to discuss the configuration process briefly. Fortunately, all we need to do is modify the [mysqld] section of /etc/mysql/my.cnf, issue a couple of SQL commands at the CLI, and restart MySQL. This is what I added to one of my masters:

#skip-networking                   # networking must stay enabled for replication
bind-address=0.0.0.0               # listen on all interfaces so slaves can connect
log-bin=mysql-bin                  # enable the binary log; required on any master
server-id=11                       # must be unique across all replicating servers
innodb_flush_log_at_trx_commit=1   # flush the InnoDB log at every commit
sync_binlog=1                      # sync the binary log at every commit
auto_increment_offset=2            # this server's offset into auto-increment space
auto_increment_increment=5         # step size between auto-increment values
master-host=master2.example.com    # the master this server slaves to
master-user=slave1                 # replication account on that master
master-password=password1

All of the items above are well documented at the resources mentioned earlier, so I won't go into too much detail here. However, the auto_increment_offset and auto_increment_increment items are kind of interesting.

In a master-master configuration, it is entirely possible that an insert on the same table can occur on both servers. What happens if one of the fields in that table is an auto_increment field? The problem is that we want such a field to have a value that is unique across servers, but we don't really have a mechanism for the servers to perform this bookkeeping chore. So, we use a slick work-around; instead of adding one each time we perform an insert on a table with an auto_increment field, we add a larger number, say 5, plus an offset. Each master is then assigned a unique offset that is no larger than the increment value. For example, we could use an increment of 5 and offsets of 1 and 2 for our two masters. Then, when we inserted into an auto_increment field, one of our servers would assign the values 1, 6, 11, 16, 21... while the other server would use the values 2, 7, 12, 17, 22... Neither server would ever assign a value that the other one might assign. In this case, we'd skip the values 3, 4, 5, 8, 9, 10, and so on, but we'd be guaranteed that each of our values would be unique. If we then added more master servers, we could assign them different offsets, as long as we had no more than five servers in total.
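The arithmetic is easy to check. Here's a tiny Python sketch, purely illustrative (it is not MySQL internals), of the keys two masters would assign with an increment of 5 and offsets of 1 and 2:

```python
# Illustrative sketch of auto_increment_increment / auto_increment_offset.
# Each server assigns keys of the form offset + n * increment, so the key
# ranges never overlap as long as each offset is unique and <= the increment.

def auto_increment_values(offset, increment, count):
    """Return the first `count` keys a server with these settings assigns."""
    return [offset + n * increment for n in range(count)]

server_a = auto_increment_values(1, 5, 5)  # [1, 6, 11, 16, 21]
server_b = auto_increment_values(2, 5, 5)  # [2, 7, 12, 17, 22]

# No key can ever be handed out by both masters.
assert set(server_a).isdisjoint(server_b)
```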

The resources mentioned earlier detail the rest of the required steps, so I'll just list the steps briefly.

The next step in setting up our replication is to grant the slaves permission to connect to the master in order to download the transaction log. Then we have to find the current position in the transaction log on the master and inform each slave of that position, so it can pick up at the current transaction. Finally, we restart MySQL; to be safe, I restarted ALL of my servers. That was all I had to do, and it took me less than 30 minutes.
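In MySQL 5.0 syntax, those steps look roughly like the following. The user name, password, and host match the my.cnf snippet shown earlier; the log file name and position are placeholders, and you should substitute whatever SHOW MASTER STATUS actually reports:

```sql
-- On the master: create the replication account the slave will use.
GRANT REPLICATION SLAVE ON *.* TO 'slave1'@'%' IDENTIFIED BY 'password1';
FLUSH PRIVILEGES;

-- Still on the master: note the current binary log file and position.
SHOW MASTER STATUS;

-- On each slave: point it at that position, then start replicating.
CHANGE MASTER TO
    MASTER_HOST='master2.example.com',
    MASTER_USER='slave1',
    MASTER_PASSWORD='password1',
    MASTER_LOG_FILE='mysql-bin.000001',   -- value from SHOW MASTER STATUS
    MASTER_LOG_POS=98;                    -- value from SHOW MASTER STATUS
START SLAVE;
```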

Once you have replication working, there are a few things you need to be aware of. The biggest issue is how to get data into your newly replicated database. Obviously, it is MUCH easier to start with an empty database and just start adding data. Unfortunately, my databases already had data in them. The resources above detail a couple of methods, depending on how much downtime you can afford. I simply dumped and dropped all of my databases, configured the replication, and restored the data on one of the masters. This worked for me, but there are better methods.
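One of those better methods, for the record, is to let mysqldump record the binary log position for you. This is a command sketch rather than a recipe, and it assumes you can afford to lock the tables while the dump runs:

```sh
# On the master: dump everything, embedding the binary log coordinates
# in the dump file itself via --master-data.
mysqldump -u root -p --all-databases --master-data --lock-all-tables > dump.sql

# On the slave: load the dump. Because --master-data wrote a CHANGE MASTER
# statement into dump.sql, the slave picks up at the right position.
mysql -u root -p < dump.sql
```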

As I said earlier, PostgreSQL is my favorite RDBMS, but replication is a very compelling reason to migrate to MySQL. I've found replication to be very easy to configure and, so far, it's been very stable.

______________________

Mike Diehl is a freelance Computer Nerd specializing in Linux administration, programming, and VoIP. Mike lives in Albuquerque, NM, with his wife and 3 sons. He can be reached at mdiehl@diehlnet.com.

Comments


phpMyAdmin

Rory P

Very helpful write-up describing the benefits. How long on average will it take for a full replication? Any performance lags? I just recently noticed this is configurable in phpMyAdmin, which might offer an easier solution for people trying to set up MySQL replication.

-Rory

Are you sure plsql doesnt

Anonymous

Are you sure plsql doesn't have such an option?

Postgres replication is just as easy to setup

nimret

see:
http://www.postgresql.org/docs/8.4/static/high-availability.html

I set up 'Warm Standby Using Point-In-Time Recovery' on some psql servers a couple of companies ago, and it was easy to set up and worked like a charm. Note that I am currently using mysql and am pretty impressed with it too. Both are excellent choices imo!

Nimret

Good article...

smotsie

I did this about three years ago, but missed a vital step - READ all the documentation before you start. (Deadlines, no one who knows how it should work, etc.) So I would agree whole-heartedly that it is a vital step in all projects.

My system didn't have the auto_increment_increment set and it took a lot of unpicking to fix it when it went wrong.

As far as I can tell, MySQL rivals the most expensive paid-for databases in its ability to meet this particular need.

--
Smotsie
Dad.husband.linux-loving-geek.radio-presenter.eco-geek

Postgres 9 supports a similar mechanism...

Josh

Though you can't do master-master with P9, it will support streaming replication and hot standby. Mike, maybe you should take a look and switch back? :)

Very useful feature

David Lane

I remember we had this running in version 4, where the MySQL clusters managed the SIP Enterprise Router tables that connected the end user with the proper call manager. We had them tiered with two slaves to one master, and then the masters tiered together. It made restarting the system a bit tricky - you had to bring things up in the right order - but once they were up, it was solid, quick, and responsive. Everything should be this easy to implement!

David Lane, KG4GIY is a member of Linux Journal's Editorial Advisory Panel and the Control Op for Linux Journal's Virtual Ham Shack
