A broken MySQL replication can be caused due to various factors such a network issue and db performance issues (IO and storage), This article will explain how to detect and fix a broken replication caused by an event.
Pre-requisites:
- Admin/Full privileged access to MySQL Server
Step 1: To get started login to the MySQL server
mysql -u <username> -h <host url> -p
or sudo -i mysql
(For Hosted MySQL)
Step 2: Check MySQL replication status
show slave status \G
Search for the term Slave_IO_Running to see its status
If the value of Slave_SQL_Running
is No
then it confirms that the replication is broken. The ideal value must beYES
.
Step 3: Recover MySQL Slave status
The reason for Slave_SQL_Running
is set toNo
can be due to a broken important/unimportant transaction. For this to recover, you will require to stop the slave and set the global sql_slave_skip_counter
so that it can skip the ongoing error-prone transaction to recover its state.
Note: For the below command to work you need to ensure that the Slave thread is not working.
stop slave;
set global sql_slave_skip_counter = 1;
start slave;
This way executing the above command sequence will skip a given number of event/transaction from the master.
You need to repeat the above-given sequence until the error-prone statement/event is skipped after which the replication status will be active one again.
Note: It is advised that the number of events to skip should be set to1
so that you don’t miss out of an important event.
You can monitor the status of the Slave_SQL_Running_State
after each run to see which event it would skip next. Once the problematic event is skipped, the replication state should resume normally. You can check the parameters like Seconds_Behind_Master
and Slave_SQL_Running
to confirm is the replication is working and how much time it will take to sync up with the Master. You can find more details about global sql_slave_skip_counter
here.
To speed up syncing of Slave you can try tunning InnoDB flush log:
set global innodb_flush_log_at_trx_commit=2;
Until slave is synced then set it back to 1
Using this approach you an easily recover Slave status quickly. Let us know your views regarding this article by commenting below.