Upstream MySQL synchronization to downstream TiDB is exceptionally slow

This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: 上游mysql 同步到下游的tidb,异常慢

| username: 初来乍到

[TiDB Usage Environment] Production Environment
[TiDB Version] v7.0.0
[Reproduction Path]
Machine configuration: MySQL 8c32g, DM 4c8g, TiDB 32c64g

  1. The upstream database is MySQL 5.7.5, and DM is used to synchronize its data to the downstream TiDB. DM and MySQL are on the same intranet; DM and TiDB are not.
  2. Both DM and TiDB are using default configurations.
  3. During the DM synchronization process, the following error messages were reported on the MySQL side:

  4. DM worker information is as follows:

After comparison, it appears that the DM program is synchronizing, but the connection between DM and MySQL is frequently interrupted. At the same time, DM is writing data into the TiDB environment, but the insertion rate is so slow that it may take a day to catch up on a single MySQL binlog. I hope experts can provide guidance.
DM program is as follows:

[Encountered Issues: Problem Phenomenon and Impact]
[Resource Configuration] Enter TiDB Dashboard - Cluster Info - Hosts and take a screenshot of this page
[Attachments: Screenshots/Logs/Monitoring]

| username: xfworld | Original post link

  1. Provide all the configuration information first.
  2. DM and TiDB are not on the same intranet (Why are they not on the same subnet? Could there be bandwidth issues with the network?)
  3. You can use Grafana metrics to determine whether the bottleneck is between MySQL → DM or DM → TiDB.
  4. It is recommended to provide more information.
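As a supplement to point 3: while Grafana is unavailable, the `masterBinlog` and `syncerBinlog` positions reported by dmctl's `query-status` can localize the bottleneck. If `syncerBinlog` keeps falling behind while DM stays connected to MySQL, the slow side is DM → TiDB. A minimal sketch for quantifying the gap (the `(file, offset)` position format follows DM's query-status output; the helper names here are ours, not part of DM):

```python
import re

def parse_pos(pos):
    """Parse a binlog position string such as '(mysql-bin.000123, 4567)'."""
    m = re.match(r"\((.+?)\.(\d+),\s*(\d+)\)", pos)
    if not m:
        raise ValueError(f"unrecognized binlog position: {pos}")
    return m.group(1), int(m.group(2)), int(m.group(3))

def lag(master_pos, syncer_pos):
    """Return (binlog files behind, byte-offset difference)."""
    _, mfile, moff = parse_pos(master_pos)
    _, sfile, soff = parse_pos(syncer_pos)
    return mfile - sfile, moff - soff

# Example: positions copied from two fields of a query-status response.
print(lag("(mysql-bin.000120, 900000)", "(mysql-bin.000098, 1500)"))
# → (22, 898500): 22 binlog files behind
```

Re-running this against successive `query-status` snapshots shows whether the gap is shrinking, stable, or growing.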

| username: 初来乍到 | Original post link

DM machine configuration information

DM cluster information

The DM machine runs two dm-workers, and there are two TiDB clusters in total. Syncing from the same MySQL to TiDB cluster 1 on port 8261 is normal and real-time, but syncing to TiDB cluster 2 on port 28261 shows the slow synchronization described above. Because the MySQL binlog is large but only a small amount of upstream data needs to be synchronized to TiDB cluster 2, DM was placed on the same intranet as the upstream MySQL. It has been confirmed that bandwidth is not an issue and the network is stable (synchronization to TiDB cluster 1 is normal): the link is 200 Mbit/s, of which about 10 Mbit/s is currently in use.
Are there any diagnostic methods while Grafana is inaccessible? It is currently a holiday and the operations team has not opened a public access port for Grafana. What other information should I supplement? Any guidance would be appreciated. Thank you.

| username: Billmay表妹 | Original post link

Try putting DM, MySQL, and TiDB on the same intranet.

| username: 初来乍到 | Original post link

Due to business needs, they are not on the same intranet. MySQL and TiDB are on two different public clouds.

| username: xiaoqiao | Original post link

The number of fields and whether there are large fields have a significant impact.

| username: 初来乍到 | Original post link

After digging into the “exec too slow” messages, they seem to be related to inserts of about 2000 rows each. Personally, I suspect this is caused by the MySQL side interrupting the connection. I searched online but could not figure out what causes the connection interruptions shown in the upstream MySQL logs. When syncing to TiDB cluster 1 there were no connection interruptions. I also noticed that in the dm_meta database, this task generated only two tables:

While other normal logs have three tables:

I don’t know what causes this issue.

| username: rebelsre | Original post link

  1. What data is synchronized to tidb1 and tidb2, and what are the differences between the two clusters?
  2. It would be best to provide the DM data source configuration and the task configuration, along with the current synchronization status from query-status.
  3. Whether the MySQL binlog production speed or DM’s binlog fetch speed is a bottleneck.
  4. The performance of tidb2, and whether a simple test can be run to check that writes are normal.
  5. Since DM and TiDB are not in the same network environment, it is also recommended to check whether network proxy forwarding is working properly.

| username: 初来乍到 | Original post link

  1. The relationship between tidb1 and tidb2: tidb1 is on Tencent Cloud, and tidb2 is on Huawei Cloud. The data to be synchronized is the same, and the goal is to deprecate tidb1 and use tidb2 to provide services for the business once tidb2 is fully synchronized.

  2. DM data source configuration is as follows:
    DM configuration information

    Query-status information

  3. MySQL binlog generation speed is about 4-5 binlogs per hour, each approximately 1.1G.

  4. Using an INSERT … SELECT test, writing 3000 rows of data takes only 0.16 seconds.
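For reference, the binlog production rate in point 3 implies a fairly modest sustained network load. A quick back-of-the-envelope check (numbers taken from this thread; 1 GB treated as 1024 MB):

```python
binlogs_per_hour = 4.5   # midpoint of "4-5 binlogs per hour"
gb_per_binlog = 1.1      # "each approximately 1.1G"
gb_per_hour = binlogs_per_hour * gb_per_binlog
mbit_per_sec = gb_per_hour * 1024 * 8 / 3600  # GB/h -> Mbit/s
print(f"{gb_per_hour:.1f} GB/h ≈ {mbit_per_sec:.1f} Mbit/s sustained")
```

That works out to roughly 11 Mbit/s, close to the ~10 Mbit/s the link is reported to be using and well under its 200 Mbit/s capacity, so raw bandwidth alone is unlikely to be the bottleneck if those figures are accurate.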

| username: TIDB-Learner | Original post link

Cross-cloud synchronization needs plenty of bandwidth. If the synchronization frequency is high, it can noticeably affect network speed.

| username: tidb狂热爱好者 | Original post link

You have two issues. First, the clusters are not on the same intranet and bandwidth is throttled, so transmission over the public internet performs poorly. Second, DM’s write speed: a single node is only seeing about 10 MB. DM is suitable for data sizes within 100 GB; for anything larger, DM is not appropriate.

| username: 初来乍到 | Original post link

Are you referring to the network bandwidth available to the DM program, or to the write rate of the DM program itself?

| username: DBAER | Original post link

Check the network speed: is network I/O reaching its limit, especially on the outbound side?

| username: TiDBer_JUi6UvZm | Original post link

Let’s break it down into two issues. First, regarding the errors between MySQL and DM, you can refer to this for troubleshooting: 【MySQL技术专题】「问题实战系列」MySQL报错Got an error reading communication packets问题分析指南-CSDN博客
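For the MySQL-side error, “Got an error reading communication packets” is commonly associated with packet-size limits and network timeouts dropping the connection. The variables below are standard MySQL server options; the values are only illustrative starting points, not recommendations tuned to this workload:

```ini
# my.cnf -- illustrative values for troubleshooting dropped connections
[mysqld]
max_allowed_packet = 64M     # large row events can exceed small limits
net_read_timeout   = 120     # seconds; server default is 30
net_write_timeout  = 240     # seconds; server default is 60
wait_timeout       = 3600    # idle connections are dropped after this
```

Also check that any proxies or load balancers between DM and MySQL have idle timeouts longer than these values, or they will cut the connection first.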

For the issues between DM and TiDB, you can start with a few simple tests:

  1. First, ping to check for any packet loss.
  2. Use scp to transfer a large file and test the network speed.
  3. Confirm the network bandwidth of both environments. Don’t assume it’s 100M; it might actually be smaller.
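If scp between the two clouds is not convenient, a rough throughput probe can also be done with a few lines of stdlib-only Python. This is a sketch: the host and port are placeholders, and in practice you would run the server half on one machine and the client half on the other (the demo below uses loopback):

```python
import socket
import threading
import time

CHUNK = 64 * 1024          # 64 KiB per send
TOTAL = 16 * 1024 * 1024   # 16 MiB probe payload

def serve(host, port, ready):
    """Accept one connection and drain TOTAL bytes from it."""
    srv = socket.socket()
    srv.bind((host, port))
    srv.listen(1)
    ready.set()
    conn, _ = srv.accept()
    received = 0
    while received < TOTAL:
        data = conn.recv(CHUNK)
        if not data:
            break
        received += len(data)
    conn.close()
    srv.close()

def probe(host, port):
    """Send TOTAL bytes and return the observed throughput in MiB/s."""
    cli = socket.create_connection((host, port))
    payload = b"\x00" * CHUNK
    start = time.monotonic()
    sent = 0
    while sent < TOTAL:
        cli.sendall(payload)
        sent += CHUNK
    cli.close()
    return (sent / (1024 * 1024)) / (time.monotonic() - start)

if __name__ == "__main__":
    # Loopback demo; replace "127.0.0.1" with the remote host in practice.
    ready = threading.Event()
    t = threading.Thread(target=serve, args=("127.0.0.1", 54321, ready))
    t.start()
    ready.wait()
    print(f"{probe('127.0.0.1', 54321):.1f} MiB/s")
    t.join()
```

Loopback numbers are only an upper bound; the interesting measurement is between the DM host and the TiDB cluster 2 host across the two clouds.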

| username: dba远航 | Original post link

First, ping the machine to check if there is any abnormal latency.

| username: Lystorm | Original post link

The network transmission time is too long, isn’t it?

| username: Soysauce520 | Original post link

You can check the write pressure on TiDB cluster 2. If writing 2000 rows with no large fields still takes 20 seconds, the cluster’s write performance is probably the issue.
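Putting the two data points from this thread side by side makes the gap concrete (2000 rows per ~20 s is the rough estimate above; 3000 rows in 0.16 s is the earlier INSERT … SELECT test inside the cluster):

```python
local_rows, local_secs = 3000, 0.16   # direct insert test inside tidb2
dm_rows, dm_secs = 2000, 20.0         # estimated "exec too slow" batch via DM
local_rps = local_rows / local_secs
dm_rps = dm_rows / dm_secs
print(f"local: {local_rps:.0f} rows/s, via DM: {dm_rps:.0f} rows/s "
      f"(~{local_rps / dm_rps:.0f}x difference)")
```

If the local test is representative, a gap of that size would point at the replication path (network round-trips, batch size, connection churn) rather than raw TiDB write capacity, but checking the cluster's write pressure as suggested would confirm it.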

| username: TiDBer_21wZg5fm | Original post link

Test the network speed and stability for each segment.

| username: virvle | Original post link

Check the network speed between the clouds. We previously connected the internal networks of Alibaba Cloud and Tencent Cloud, and after the connection, the network speed was not significantly affected.

| username: TiDBer_0p0BD6le | Original post link

If you’re unsure whether the two issues are related, start by investigating why DM reports an error when connecting to MySQL; that looks like a network problem. The TiDB-side error is a separate issue: it is logged whenever a transaction takes more than 1 second to execute. Troubleshoot the first issue on its own before looking into the second.