Testing the Disk Write Time of MySQL Master-Slave Logs

translator_bot · June 20, 2024, 9:40pm

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: MySQL主从日志落盘时间测试

| username: xiaoqiao

[TiDB Usage Environment] Production Environment / Testing / PoC
[TiDB Version]
[Reproduction Path] What operations were performed when the issue occurred
[Encountered Issue: Issue Phenomenon and Impact]
[Resource Configuration] Go to TiDB Dashboard - Cluster Info - Hosts and take a screenshot of this page
[Attachments: Screenshots / Logs / Monitoring]
Does anyone have experience with testing the time difference when MySQL primary and secondary databases write to disk? I want to see how much time this part takes, especially for remote disaster recovery RPO time.

translator_bot · June 20, 2024, 9:40pm

| username: tidb菜鸟一只 | Original post link

It definitely depends on your actual environment, it’s not fixed.

translator_bot · June 20, 2024, 9:40pm

| username: DBAER | Original post link

Parse binlog and look at these two timestamps: the difference between original_commit_timestamp and immediate_commit_timestamp can be found here:

translator_bot · June 20, 2024, 9:40pm

| username: zhaokede | Original post link

translator_bot · June 20, 2024, 9:40pm

| username: zhang_2023 | Original post link

There are too many variables, such as host performance, network bandwidth…

translator_bot · June 20, 2024, 9:41pm

| username: TiDBer_嘎嘣脆 | Original post link

Analysis of binlog implementation

translator_bot · June 20, 2024, 9:41pm

| username: xiaoqiao | Original post link

Your solution actually tests the time it takes for the data to be written to the slave database.

We now want to test the time it takes for the Relay log to be written to disk. Generally, as long as the data reaches the relay log, the local MySQL thread will definitely continue to write to its own database, regardless of any anomalies in the master database.

translator_bot · June 20, 2024, 9:41pm

| username: xiaoqiao | Original post link

There is a testing plan, and specific scenarios will be handled accordingly. A standard time is not required.

translator_bot · June 20, 2024, 9:41pm

| username: xiaoqiao | Original post link

To test the relay log time, you can write to the binlog to verify that the data has already been transferred to the remote location.

translator_bot · June 20, 2024, 9:41pm

| username: TiDBer_QYr0vohO | Original post link

Mark it.

translator_bot · June 20, 2024, 9:41pm

| username: DBAER | Original post link

Parsing relay-log is the time when the relay is written to disk.
Parsing slave binlog is the time when the log is written to disk after being applied.

translator_bot · June 20, 2024, 9:41pm

| username: 考试没答案 | Original post link

The time test varies depending on the architecture: for example, delayed master-slave, asynchronous master-slave, and real-time master-slave. They are all different. Test according to your requirements.

translator_bot · June 20, 2024, 9:41pm

| username: 呢莫不爱吃鱼 | Original post link

Mark it, and study it later.

translator_bot · June 20, 2024, 9:41pm

| username: xiaoqiao | Original post link

I just don’t know how to test it. It’s not about standard data, but is there any testing methodology for the relay log’s disk write time (excluding playback)?

translator_bot · June 20, 2024, 9:41pm

| username: 源de爸 | Original post link

Good idea, integrating it into the self-built dashboard. Leaders tend to like these kinds of metrics.

translator_bot · June 20, 2024, 9:41pm

| username: Kamner | Original post link

In fact, for remote disaster recovery, the network bandwidth has a greater impact on the RPO. If you only consider disk writes, then it depends on disk performance. Looking at it in isolation is meaningless.