Testing the Disk Write Time of MySQL Master-Slave Logs

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: MySQL主从日志落盘时间测试

| username: xiaoqiao

[TiDB Usage Environment] Production Environment / Testing / PoC
[TiDB Version]
[Reproduction Path] What operations were performed when the issue occurred
[Encountered Issue: Issue Phenomenon and Impact]
[Resource Configuration] Go to TiDB Dashboard - Cluster Info - Hosts and take a screenshot of this page
[Attachments: Screenshots / Logs / Monitoring]
Does anyone have experience with testing the time difference when MySQL primary and secondary databases write to disk? I want to see how much time this part takes, especially for remote disaster recovery RPO time.

| username: tidb菜鸟一只 | Original post link

It definitely depends on your actual environment, it’s not fixed.

| username: DBAER | Original post link

Parse binlog and look at these two timestamps: the difference between original_commit_timestamp and immediate_commit_timestamp can be found here:

| username: zhaokede | Original post link

:+1:

| username: zhang_2023 | Original post link

There are too many variables, such as host performance, network bandwidth…

| username: TiDBer_嘎嘣脆 | Original post link

Analysis of binlog implementation

| username: xiaoqiao | Original post link

Your solution actually tests the time it takes for the data to be written to the slave database.

We now want to test the time it takes for the Relay log to be written to disk. Generally, as long as the data reaches the relay log, the local MySQL thread will definitely continue to write to its own database, regardless of any anomalies in the master database.

| username: xiaoqiao | Original post link

There is a testing plan, and specific scenarios will be handled accordingly. A standard time is not required.

| username: xiaoqiao | Original post link

To test the relay log time, you can write to the binlog to verify that the data has already been transferred to the remote location.

| username: TiDBer_QYr0vohO | Original post link

Mark it.

| username: DBAER | Original post link

Parsing relay-log is the time when the relay is written to disk.
Parsing slave binlog is the time when the log is written to disk after being applied.

| username: 考试没答案 | Original post link

The time test varies depending on the architecture: for example, delayed master-slave, asynchronous master-slave, and real-time master-slave. They are all different. Test according to your requirements.

| username: 呢莫不爱吃鱼 | Original post link

Mark it, and study it later.

| username: xiaoqiao | Original post link

I just don’t know how to test it. It’s not about standard data, but is there any testing methodology for the relay log’s disk write time (excluding playback)?

| username: 源de爸 | Original post link

Good idea, integrating it into the self-built dashboard. Leaders tend to like these kinds of metrics.

| username: Kamner | Original post link

In fact, for remote disaster recovery, the network bandwidth has a greater impact on the RPO. If you only consider disk writes, then it depends on disk performance. Looking at it in isolation is meaningless.

| username: Kamner | Original post link

This RPO is also related to your replication type:

Asynchronous Replication

Semi-Synchronous Replication

MGR

You also need to consider the impact on the primary database. For performance comparisons of various modes, you can refer to:

https://dev.mysql.com/blog-archive/semi-synchronous-replication-performance-in-mysql-5-7/
https://dev.mysql.com/blog-archive/an-overview-of-the-group-replication-performance/

| username: TiDBer_HErMeXDz | Original post link

Parsing binlog, you can look at the difference between these two timestamps: original_commit_timestamp and immediate_commit_timestamp.

| username: 呢莫不爱吃鱼 | Original post link

Here to learn.

| username: zhaokede | Original post link

:+1: