DM Synchronization Error: Context Deadline Exceeded

translator_bot · June 23, 2024, 3:30am

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: dm同步报错context deadline exceeded

| username: fly4310862

【TiDB Usage Environment】Production\Test Environment\POC
【TiDB Version】DM v2.0.7
【Encountered Problem】
【Reproduction Path】What operations were performed to encounter the problem
【Problem Phenomenon and Impact】In a DM synchronization task, one of the subtasks reported the following error:
【Attachment】

    "result": true,
    "msg": "",
    "sources": [
        {
            "result": true,
            "msg": "",
            "sourceStatus": {
                "source": "source_0",
                "worker": "dm-127.0.0.1-123",
                "result": {
                    "isCanceled": false,
                    "errors": [
                        {
                            "ErrCode": 50000,
                            "ErrClass": "not-set",
                            "ErrScope": "not-set",
                            "ErrLevel": "high",
                            "Message": "context deadline exceeded",
                            "RawCause": "",
                            "Workaround": ""
                        }
                    ],
                    "detail": null
                },
                "relayStatus": null
            },
            "subTaskStatus": [
                {
                    "name": "task_new",
                    "stage": "Running",
                    "unit": "Sync",
                    "result": null,
                    "unresolvedDDLLockID": "",
                    "sync": {
                        "totalEvents": "18477",
                        "totalTps": "1",
                        "recentTps": "0",
                        "masterBinlog": "(binlog.000002, 123123123)",
                        "masterBinlogGtid": "",
                        "syncerBinlog": "(binlog.000002, 123123123)",
                        "syncerBinlogGtid": "",
                        "blockingDDLs": [
                        ],
                        "unresolvedGroups": [
                        ],
                        "synced": false,
                        "binlogType": "remote",
                        "secondsBehindMaster": "0"
                    }
                }
            ]
        },
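In the output above, the error sits under sourceStatus.result.errors while the subtask itself still reports stage "Running". A minimal Python sketch for pulling those two pieces apart is shown here. The sample entry is a trimmed stand-in that reuses field names from the post, and the helper name summarize_source is hypothetical.

```python
import json

# Trimmed stand-in for one entry of "sources"; field names come from the
# post, everything else is left out for brevity.
SAMPLE_SOURCE = json.loads("""
{
  "sourceStatus": {
    "source": "source_0",
    "worker": "dm-127.0.0.1-123",
    "result": {
      "isCanceled": false,
      "errors": [
        {"ErrCode": 50000, "ErrLevel": "high",
         "Message": "context deadline exceeded"}
      ]
    }
  },
  "subTaskStatus": [
    {"name": "task_new", "stage": "Running", "unit": "Sync"}
  ]
}
""")


def summarize_source(entry):
    """Return (source-level error messages, {subtask name: stage})."""
    result = (entry.get("sourceStatus") or {}).get("result") or {}
    errors = [e.get("Message", "") for e in result.get("errors") or []]
    stages = {s["name"]: s["stage"] for s in entry.get("subTaskStatus") or []}
    return errors, stages


errors, stages = summarize_source(SAMPLE_SOURCE)
print("source-level errors:", errors)
print("subtask stages:", stages)
```

Reading the status this way makes it easier to see whether an error belongs to the source/worker level or to a specific subtask.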

| username: Meditator

Refer to this.

| username: Meditator

Please provide the logs of the corresponding worker in the DM cluster for review.
What operations were performed before and after the error appeared?
How is the network connectivity between the DM cluster, upstream MySQL, and TiDB cluster?

| username: fly4310862

The upstream binlog exists, and the binlog position of this subtask keeps changing, so it appears to be synchronizing continuously.

| username: fly4310862

There are no obvious errors in the logs.
The DM architecture consists of 3 masters and n workers, deployed on three hosts: a, b, and c. After the memory on hosts a and b was fully utilized and then recovered, one worker on host c reported this error.
There are no network issues.

| username: Meditator

It looks like the DM masters on hosts a and b both ran into trouble when memory on those hosts was exhausted, which disrupted the DM master cluster. The worker on host c then timed out when communicating with the master. I think a restart might solve the issue.
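The reasoning here can be checked with a little arithmetic. This sketch assumes one DM master per host and a majority-based (etcd-style) master cluster. Both are assumptions, not something confirmed in this thread, and the function name has_quorum is hypothetical.

```python
def has_quorum(total_masters, healthy_masters):
    """True if the healthy members form a strict majority of the cluster."""
    return healthy_masters >= total_masters // 2 + 1


# Assumed layout: 3 masters on hosts a, b, c; a and b ran out of memory,
# so only the master on c is healthy.
print(has_quorum(3, 1))  # False
# With only one of the three masters down, a majority remains.
print(has_quorum(3, 2))  # True
```

Under those assumptions, losing two of three masters at once would leave the remaining one unable to serve requests, which would fit a timeout seen by the worker on host c.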

| username: fly4310862

Restart the DM master? Or the worker? Or both?

| username: Meditator

Restart the worker.

| username: Hacker007

First, run resume-task to restart the task and see whether the error clears.
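A small sketch of what that could look like, written as a Python helper that only builds the dmctl command lines without executing anything. It assumes dmctl is launched through tiup. The master address is a placeholder, the task and source names are taken from the status output in the first post, and the helper name dmctl_command is hypothetical.

```python
import shlex


def dmctl_command(master_addr, subcommand, task_name, source=None):
    """Build the argv for a dmctl call; nothing is executed here."""
    argv = ["tiup", "dmctl", "--master-addr", master_addr, subcommand]
    if source:
        # -s limits the operation to a single upstream source.
        argv += ["-s", source]
    argv.append(task_name)
    return argv


MASTER_ADDR = "127.0.0.1:8261"  # placeholder, replace with a real DM-master address

# Resume the task for the affected source, then check its status again.
for sub in ("resume-task", "query-status"):
    print(shlex.join(dmctl_command(MASTER_ADDR, sub, "task_new", source="source_0")))
```

The printed lines can be reviewed and then run by hand. If the error is still present in query-status afterwards, restarting the worker as suggested earlier would be the next thing to try.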