Category Image Monitoring and Resetting MySQL Replication


OK, so you have a nice replication setup, but how do you know it is actually working, and what do you do when it stops? This short article shows how to check and quickly fix replication that has stopped. This procedure takes 2 minutes and can be done remotely on the command line.

To check if replication is working, log into the slave and execute:
> SHOW SLAVE STATUS;

The result is something like this:
(root@slmini.local) (none)> show slave status\G
*************************** 1. row ***************************
Slave_IO_State:
Master_Host: master.domain.net
Master_User: repl
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: binary-log.000033
Read_Master_Log_Pos: 189628335
Relay_Log_File: localhost-relay-bin.000002
Relay_Log_Pos: 4
Relay_Master_Log_File: binary-log.000033
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 0
Last_Error:
Skip_Counter: 0
Exec_Master_Log_Pos: 189628335
Relay_Log_Space: 4
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: Yes
Master_SSL_CA_File: /etc/mysql/ssl/ca_self.crt
Master_SSL_CA_Path:
Master_SSL_Cert: /etc/mysql/ssl/client-cert.pem
Master_SSL_Cipher:
Master_SSL_Key: /etc/mysql/ssl/client-key.pem
Seconds_Behind_Master: NULL

If either of these is NO, like this, then replication is stopped:
Slave_IO_Running: No
Slave_SQL_Running: No

The Last_Errno and Last_Error might give you a clue as to what went wrong.

If all seems OK, you can also confirm further that everything is working by logging into the master and executing SHOW MASTER STATUS and comparing the binary log and exec position.

If a recovery is required, often, you can do a quick recovery by seeing the point at which the slave stopped and then simply resetting and restarting the slave at that point in the master binary logs. If this quick procedure fails, then you will have to perform the more time-consuming full copy from the master and restart replication like you did when you initially set it up.

Quick Reset Procedure

1) First, issue a STOP SLAVE
> STOP SLAVE;

2) Important: Next, issue a SHOW SLAVE STATUS and get the stopping point information
> SHOW SLAVE STATUS;

At this stage you must make note of the result of the SHOW SLAVE STATUS. If you don't have this info on hand, you will not be able to complete the procedure. Usually I am using a terminal program and remotely accessing the server, so I always copy the result from the screen and paste it into a text editor on my machine.

The information we need from that result is as follows:
Master_Host: master.domain.net
Master_User: repl
Master_Port: 3306
Master_Log_File: binary-log.000033
Exec_Master_Log_Pos: 189628335

... and these are optional if using SSL .....
Master_SSL_Allowed: Yes
Master_SSL_CA_File: /etc/mysql/ssl/ca_self.crt
Master_SSL_Cert: /etc/mysql/ssl/client-cert.pem
Master_SSL_Key: /etc/mysql/ssl/client-key.pem

3) Next, issue a RESET SLAVE:
> RESET SLAVE;

4) Now we issue a CHANGE MASTER command, for example (substituting your own values of course):
> change master to
master_host='master.domain.net',
master_user='repl',
master_password='thereplpassword',
master_port=3306,
master_log_file='binary-log.000033',
master_log_pos=189628335,
master_ssl=1,
master_ssl_ca='/etc/mysql/ssl/ca_self.crt',
master_ssl_cert='/etc/mysql/ssl/client-cert.pem',
master_ssl_key='/etc/mysql/ssl/client-key.pem';

The last 4 master_ssl parameters are not required if not replicating over SSL.

5) Finally, start the slave:
> START SLAVE;

And check again with SHOW SLAVE STATUS to make sure we are replicating again.

Note: If you are getting repeated situations where replication is getting errors and stopping, then you need to reassess your setup. It is VERY important to have BOTH master and slave on Uninterruptable Power Supplies if that is not obvious! If you have recovered and still get errors, then a full recovery by getting a full dump from master and a scratch slave setup is necessary.

Automated Monitoring
You can of course write some scripts to perform the slave running check every 5 minutes and email you if it has had an error and stopped replicating. Jeremy Zawodny in his book discusses ways to automate slave replication checking and alert you when replication has stopped or fallen too far behind.


Posted: Tuesday - July 17, 2007 at 10:51 AM        


Published by