How to fix broken MySQL replication

A broken MySQL replication can be caused due to various factors such a network issue and db performance issues (IO and storage), This article will explain how to detect and fix a broken replication caused by an event.

Pre-requisites:

Admin/Full privileged access to MySQL Server

Step 1: To get started login to the MySQL server

mysql -u <username> -h <host url> -p or sudo -i mysql(For Hosted MySQL)

Step 2: Check MySQL replication status

show slave status \G

Search for the term Slave_IO_Running to see its status

If the value of Slave_SQL_Running is No then it confirms that the replication is broken. The ideal value must beYES.

Step 3: Recover MySQL Slave status

The reason for Slave_SQL_Running is set toNocan be due to a broken important/unimportant transaction. For this to recover, you will require to stop the slave and set the global sql_slave_skip_counter so that it can skip the ongoing error-prone transaction to recover its state.

Note: For the below command to work you need to ensure that the Slave thread is not working.

stop slave;
set global sql_slave_skip_counter = 1;
start slave;

This way executing the above command sequence will skip a given number of event/transaction from the master.

You need to repeat the above-given sequence until the error-prone statement/event is skipped after which the replication status will be active one again.

Note: It is advised that the number of events to skip should be set to1so that you don’t miss out of an important event.

You can monitor the status of the Slave_SQL_Running_Stateafter each run to see which event it would skip next. Once the problematic event is skipped, the replication state should resume normally. You can check the parameters like Seconds_Behind_Masterand Slave_SQL_Running to confirm is the replication is working and how much time it will take to sync up with the Master. You can find more details about global sql_slave_skip_counterhere.

To speed up syncing of Slave you can try tunning InnoDB flush log:

set global innodb_flush_log_at_trx_commit=2; Until slave is synced then set it back to 1

Using this approach you an easily recover Slave status quickly. Let us know your views regarding this article by commenting below.

Categories

Most Popular

Most Viewed

Manage IAM User Access for a Multi-Account Architecture

Allowing Non-Root RDS MySQL Users to View and Kill Queries

Securely Deploy from Gitlab to AWS using IAM Assume Roles

Troubleshoot MySQL RDS performance issues

How to fix broken MySQL replication

How to Setup Python3 Virtual Environment