MySQL High Availability

MySQL High Availability relies on Binlog for data synchronization. Binlog must therefore stay enabled: turning it off causes high availability to fail.
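As a rough illustration, the Binlog requirement can be verified from the MySQL `log_bin` system variable, which reports whether binary logging is enabled. The sketch below assumes the rows returned by `SHOW VARIABLES LIKE 'log_bin'` have already been fetched and converted to a dictionary; the helper name `binlog_enabled` is hypothetical and is not part of the product.

```python
def binlog_enabled(variables):
    """Return True if the log_bin system variable reports that Binlog is on.

    `variables` is assumed to be a mapping built from the rows returned by
    SHOW VARIABLES LIKE 'log_bin', for example {"log_bin": "ON"}.
    """
    value = str(variables.get("log_bin", "")).strip().upper()
    # MySQL reports ON/OFF; "1" is accepted here as a defensive assumption.
    return value in ("ON", "1")


if __name__ == "__main__":
    print(binlog_enabled({"log_bin": "ON"}))
    print(binlog_enabled({"log_bin": "OFF"}))
```

A missing `log_bin` entry is treated as disabled, so the check fails closed rather than assuming that Binlog is on.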

Self-recovery of High Availability

High-availability instances expose the HAHealthState monitoring metric. Under normal conditions its value is 0. If an issue is detected that may prevent the cluster from performing disaster recovery normally, the value becomes non-zero, indicating that high availability is in an abnormal state.
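The rule above (0 is healthy, anything else is abnormal) can be sketched as a small alerting check. This is only an illustration: the function names and the `min_consecutive` threshold are assumptions, chosen so that a single transient non-zero sample, which may recover automatically, does not trigger an alert.

```python
def ha_is_healthy(ha_health_state):
    """HAHealthState is 0 when high availability is normal."""
    return ha_health_state == 0


def should_alert(samples, min_consecutive=2):
    """Return True when the most recent samples are all abnormal.

    `samples` is a chronological list of HAHealthState values.
    `min_consecutive` is a hypothetical tuning knob, not a product setting.
    """
    recent = samples[-min_consecutive:]
    if len(recent) < min_consecutive:
        return False  # not enough data to decide
    return all(not ha_is_healthy(s) for s in recent)


if __name__ == "__main__":
    print(should_alert([0, 3, 3]))  # last two samples are both non-zero
    print(should_alert([3, 0]))     # latest sample is healthy again
```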

If a high availability anomaly occurs and does not recover automatically, try the following steps to repair the instance's high availability:

Check whether your business is currently running normally. If it is not, contact technical support.
If the business is running normally, check the disk usage of the high-availability standby database; the corresponding monitoring metric is HAStandyDBDiskUsage. If disk usage has reached 100%, clean up some Binlog files on the Binlog management page to free space urgently. Once space is freed, high availability generally recovers automatically. After it returns to normal, upgrade the disk configuration as your business requires.
After freeing the space, wait 2 minutes. If high availability is still abnormal and HAHealthState is 3 or 4, try rebuilding (redoing) the standby database. A rebuild restores the standby's data from an export of the master database. Before the rebuild starts, a pre-check of the rebuild conditions is performed. If the pre-check finds no exceptions, click OK to start the rebuild directly. You must manually select the force option to start the rebuild in any of the following cases: non-transactional engine tables exceed 1 GB in size (which lengthens the table lock held on the master database during synchronization); long-running transactions exist (they prevent the rebuild from acquiring the lock, so it keeps retrying); or the replication relationship is incorrect (the incorrect replication relationship will be erased).
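The repair steps above can be sketched as a small decision function. This is an illustrative model only, not the product's implementation: the `InstanceStatus` fields, the action names, and the helper functions are all hypothetical, "1 GB" is assumed to mean 2**30 bytes, and the source does not say what to do for HAHealthState values other than 3 or 4, so the sketch reports those as not covered.

```python
from dataclasses import dataclass
from typing import Optional

ONE_GB = 1024 ** 3        # assumption: the 1 GB limit means 2**30 bytes
WAIT_AFTER_CLEANUP = 120  # seconds; the "wait 2 minutes" from the steps


@dataclass
class InstanceStatus:
    business_ok: bool
    standby_disk_usage_pct: float  # HAStandyDBDiskUsage
    ha_health_state: int           # HAHealthState
    # Seconds since the last Binlog cleanup, or None if none was done.
    seconds_since_cleanup: Optional[float] = None


def next_action(status):
    """Map the documented repair steps onto a single suggested action."""
    if status.ha_health_state == 0:
        return "none"              # high availability is normal
    if not status.business_ok:
        return "contact_support"
    if status.standby_disk_usage_pct >= 100:
        return "clean_binlog"      # then upgrade the disk once recovered
    cleaned = status.seconds_since_cleanup
    if cleaned is not None and cleaned < WAIT_AFTER_CLEANUP:
        return "wait"
    if status.ha_health_state in (3, 4):
        return "rebuild_standby"
    return "not_covered"           # the steps do not address other states


def rebuild_needs_force(non_tx_bytes, has_long_transactions, replication_ok):
    """Return True if the pre-check would require the force option."""
    return (non_tx_bytes > ONE_GB
            or has_long_transactions
            or not replication_ok)


if __name__ == "__main__":
    status = InstanceStatus(True, 100.0, 3)
    print(next_action(status))
    print(rebuild_needs_force(2 * ONE_GB, False, True))
```

Keeping the force decision in a separate function mirrors the steps, where the pre-check runs only after a standby redo (rebuild) has been chosen.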
