Monitoring and Resetting MySQL Replication
OK, so you have a nice replication setup, but how
do you know it is actually working, and what do you do when it stops? This short
article shows how to check and quickly fix replication that has stopped. This
procedure takes 2 minutes and can be done remotely on the command
line.
To check if replication is working, log into the
slave and execute:> SHOW SLAVE
STATUS;The result is something like
this:(root@slmini.local) (none)>
show slave
status\G*************************** 1.
row ***************************
Slave_IO_State:
Master_Host: master.domain.net
Master_User: repl
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: binary-log.000033
Read_Master_Log_Pos: 189628335
Relay_Log_File:
localhost-relay-bin.000002
Relay_Log_Pos: 4
Relay_Master_Log_File:
binary-log.000033
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno:
0 Last_Error:
Skip_Counter:
0 Exec_Master_Log_Pos:
189628335 Relay_Log_Space:
4 Until_Condition:
None Until_Log_File:
Until_Log_Pos:
0 Master_SSL_Allowed:
Yes Master_SSL_CA_File:
/etc/mysql/ssl/ca_self.crt
Master_SSL_CA_Path:
Master_SSL_Cert:
/etc/mysql/ssl/client-cert.pem
Master_SSL_Cipher:
Master_SSL_Key:
/etc/mysql/ssl/client-key.pem
Seconds_Behind_Master: NULLIf
either of these is NO, like this, then replication is
stopped: Slave_IO_Running:
No Slave_SQL_Running:
NoThe Last_Errno and Last_Error
might give you a clue as to what went
wrong.If all seems OK, you can also
confirm further that everything is working by logging into the master and
executing SHOW MASTER STATUS and comparing the binary log and exec
position.If a recovery is required,
often, you can do a quick recovery by seeing the point at which the slave
stopped and then simply resetting and restarting the slave at that point in the
master binary logs. If this quick procedure fails, then you will have to perform
the more time-consuming full copy from the master and restart replication like
you did when you initially set it
up.Quick Reset
Procedure1) First, issue a STOP
SLAVE> STOP
SLAVE;2) Important: Next, issue a
SHOW SLAVE STATUS and get the stopping point
information> SHOW SLAVE
STATUS;At this stage you must
make note of the result of the SHOW SLAVE STATUS. If you don't have this info on
hand, you will not be able to complete the procedure. Usually I am using a
terminal program and remotely accessing the server, so I always copy the result
from the screen and paste it into a text editor on my
machine.The information we need from
that result is as follows:
Master_Host: master.domain.net
Master_User: repl
Master_Port: 3306
Master_Log_File: binary-log.000033
Exec_Master_Log_Pos: 189628335...
and these are optional if using SSL .....
Master_SSL_Allowed: Yes
Master_SSL_CA_File:
/etc/mysql/ssl/ca_self.crt
Master_SSL_Cert:
/etc/mysql/ssl/client-cert.pem
Master_SSL_Key:
/etc/mysql/ssl/client-key.pem3)
Next, issue a RESET SLAVE:> RESET
SLAVE;4) Now we issue a CHANGE
MASTER command, for example (substituting your own values of
course):> change master
tomaster_host='master.domain.net',master_user='repl',master_password='thereplpassword',master_port=3306,master_log_file='binary-log.000033',master_log_pos=189628335,master_ssl=1,master_ssl_ca='/etc/mysql/ssl/ca_self.crt',master_ssl_cert='/etc/mysql/ssl/client-cert.pem',master_ssl_key='/etc/mysql/ssl/client-key.pem';The
last 4 master_ssl parameters are not required if not replicating over
SSL.5) Finally, start the
slave:> START
SLAVE;And check again with SHOW
SLAVE STATUS to make sure we are replicating
again.Note: If you are
getting repeated situations where replication is getting errors and stopping,
then you need to reassess your setup. It is VERY important to have BOTH master
and slave on Uninterruptable Power Supplies if that is not obvious! If you have
recovered and still get errors, then a full recovery by getting a full dump from
master and a scratch slave setup is
necessary.Automated
MonitoringYou can of course write some
scripts to perform the slave running check every 5 minutes and email you if it
has had an error and stopped replicating. Jeremy
Zawodny in his book discusses ways to automate slave replication
checking and alert you when replication has stopped or fallen too far
behind.
Posted: Tuesday - July 17, 2007 at 10:51 AM