dm-worker retry mechanism

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: dm-work重试机制

| username: liujia

[TiDB Usage Environment] Production Environment
[TiDB Version] DM Version: v2.0.6
[Reproduction Path] What operations were performed when the issue occurred
[Encountered Issue: Issue Phenomenon and Impact]
After a network interruption on the source side, dm-worker loses connection and cannot pull binlog, causing the sync task to exit. Sync resumes after restarting the task.
[Resource Configuration]
[Attachments: Screenshots/Logs/Monitoring]
03:15:59 err read tcp 10.11.45.31:59352->10.9.42.85:3306: i/o timeout: connection was bad

03:16:57 unit process error. err read tcp 10.11.45.31:59352->10.9.42.85:3306: i/o timeout: connection was bad

The official documentation says that DM has a retry mechanism, but it gives no details. From the logs, DM appears to have stopped retrying after about one minute. Is there any documentation that describes the retry mechanism in detail?

| username: Billmay表妹

What version of TiDB are you using?

DM has a retry mechanism to handle network interruptions and other anomalies during data synchronization. In the DM configuration file, you can control the number of retries and the time interval by setting syncer.max-retry-count and syncer.retry-unit. By default, syncer.max-retry-count is 10 and syncer.retry-unit is 1 second, meaning that after a connection failure, it will retry after 1 second, up to a maximum of 10 retries.

If the connection is restored within the retry count, DM will automatically resume the synchronization task. If the retry count is exhausted, the synchronization task will exit and need to be manually restarted.
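The behavior described above (retry at a fixed interval, give up once the retry count is exhausted) can be sketched roughly as follows. This is only an illustration of the general pattern, not DM's actual code: `retry_fixed_interval`, `operation`, and the `sleep` parameter are hypothetical names, and the defaults of 10 retries and 1 second are simply the values quoted in this reply, which have not been checked against v2.0.6.

```python
import time


def retry_fixed_interval(operation, max_retry_count=10, retry_interval=1.0,
                         sleep=time.sleep):
    """Run `operation`, retrying on ConnectionError at a fixed interval.

    Hypothetical helper for illustration only; it does not reflect how
    dm-worker is implemented. The defaults mirror the values quoted in
    the reply above (10 retries, 1 second apart).
    """
    retries_used = 0
    while True:
        try:
            # Success on any attempt ends the loop and returns the result.
            return operation()
        except ConnectionError:
            if retries_used >= max_retry_count:
                # Retry budget exhausted: surface the error to the caller,
                # which corresponds to the task exiting and needing a
                # manual restart.
                raise
            retries_used += 1
            sleep(retry_interval)
```

With these defaults, a connection that stays down is given up on after roughly 10 seconds of waiting (plus the time each attempt takes to time out), so an outage that lasts longer than the retry window still ends with the task stopped.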

You can refer to the Synchronization Task Configuration section in the official DM documentation for more details on the retry mechanism.

| username: Hacker007

Your version is too old. Newer versions retry automatically, but the old version does not, so the task has to be restarted.
Failure and Handling Methods | PingCAP Documentation Center

| username: Anna

Please see this
