Both dm-master and dm-worker crash simultaneously, causing synchronization sub-tasks to report errors

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: dm-master和dm-worker 同时宕机, 同步的子任务 同步报错

| username: fly4310862

【TiDB Usage Environment】Production\Test Environment\POC
【TiDB Version】DM v2.0.7
【Encountered Problem】Both dm-master and dm-worker crashed at the same time. After the machines restarted, 2 of the 10 sub-tasks reported synchronization errors. How can these sub-tasks be fixed?
【Reproduction Path】What operations were performed to cause the issue
【Problem Phenomenon and Impact】
```json
{
  "result": true,
  "msg": "",
  "sourceStatus": {
    "source": "source_tally1",
    "worker": "dm-127.0.0.1-8263",
    "result": null,
    "relayStatus": null
  },
  "subTaskStatus": [
    {
      "name": "task_new",
      "stage": "Paused",
      "unit": "Sync",
      "result": {
        "isCanceled": false,
        "errors": [
          {
            "ErrCode": 10006,
            "ErrClass": "database",
            "ErrScope": "not-set",
            "ErrLevel": "high",
            "Message": "startLocation: [position: (, 0), gtid-set: ], endLocation: [position: (binlog.xxxx, xxxx), gtid-set: ]: execute statement failed: commit",
            "RawCause": "Error 1062: Duplicate entry '3459239597' for key 'uniq_guid'",
            "Workaround": ""
          }
        ],
        "detail": null
      },
      "unresolvedDDLLockID": "",
      "sync": {
        "totalEvents": "20674",
        "totalTps": "344",
        "recentTps": "0",
        "masterBinlog": "(binlog.xxxx, xxxx)",
        "masterBinlogGtid": "",
        "syncerBinlog": "(binlog.xxxx, xxxx)",
        "syncerBinlogGtid": "",
        "blockingDDLs": [],
        "unresolvedGroups": [],
        "synced": false,
        "binlogType": "remote",
        "secondsBehindMaster": "0"
      }
    }
  ]
},
{
  "result": true,
  "msg": "",
  "sourceStatus": {
    "source": "source_7",
    "worker": "dm-127.0.0.1-8266",
    "result": {
      "isCanceled": false,
      "errors": [
        {
          "ErrCode": 40071,
          "ErrClass": "dm-worker",
          "ErrScope": "internal",
          "ErrLevel": "high",
          "Message": "mysql source worker dm-127.0.0.1-8266 has already started with source source_abc, but get a request with source source_7",
          "RawCause": "",
          "Workaround": "Please try restart this DM-worker"
        }
      ],
      "detail": null
    },
    "relayStatus": null
  },
  "subTaskStatus": [
    {
      "name": "task_new",
      "stage": "InvalidStage",
      "unit": "InvalidUnit",
      "result": null,
      "unresolvedDDLLockID": "",
      "msg": "no sub task with name task_new has started"
    }
  ]
},
```
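The two entries above fail for different reasons: on `source_tally1` the Sync unit paused because a statement failed at commit with error 1062, while on `source_7` the worker `dm-127.0.0.1-8266` is still bound to `source_abc`, so `task_new` never started there (the `Workaround` field suggests restarting that DM-worker). With ten sub-tasks, pulling every error out of the status JSON makes this easier to see. A minimal sketch, assuming the per-source entries have been copied into a JSON list (in real `query-status` output they sit inside a larger object); `SAMPLE` and `collect_errors` are hypothetical names, and the sample is a trimmed copy of the entries above.

```python
import json

# Trimmed copy of the two per-source entries above (assumed to be wrapped in a list).
SAMPLE = """
[
  {"sourceStatus": {"source": "source_tally1", "worker": "dm-127.0.0.1-8263", "result": null},
   "subTaskStatus": [{"name": "task_new", "stage": "Paused", "unit": "Sync",
                      "result": {"errors": [{"ErrCode": 10006,
                                             "RawCause": "Error 1062: Duplicate entry '3459239597' for key 'uniq_guid'"}]}}]},
  {"sourceStatus": {"source": "source_7", "worker": "dm-127.0.0.1-8266",
                    "result": {"errors": [{"ErrCode": 40071, "RawCause": "",
                                           "Message": "mysql source worker dm-127.0.0.1-8266 has already started with source source_abc, but get a request with source source_7"}]}},
   "subTaskStatus": [{"name": "task_new", "stage": "InvalidStage", "unit": "InvalidUnit", "result": null}]}
]
"""


def collect_errors(entries):
    """Return (source, where, ErrCode, text) for every error found in the entries."""
    found = []
    for entry in entries:
        src = entry["sourceStatus"]
        name = src["source"]
        # Source-level errors, e.g. a worker bound to a different source.
        for err in (src.get("result") or {}).get("errors", []):
            text = err.get("RawCause") or err.get("Message", "")
            found.append((name, "source", err["ErrCode"], text))
        # Sub-task errors, e.g. the Sync unit paused on a failed statement.
        for sub in entry.get("subTaskStatus", []):
            for err in (sub.get("result") or {}).get("errors", []):
                text = err.get("RawCause") or err.get("Message", "")
                found.append((name, f"{sub['name']}/{sub['stage']}", err["ErrCode"], text))
    return found


for error in collect_errors(json.loads(SAMPLE)):
    print(error)
```

Falling back from `RawCause` to `Message` matters here because the 40071 error leaves `RawCause` empty and carries its explanation in `Message`.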



| username: buchuitoudegou

Could you describe what operations were performed, for example how the cluster was deployed and how it was started?

Error 1062: Duplicate entry '3459239597' for key 'uniq_guid'

The output shows a duplicate-key error, which is why the Sync unit is Paused. Did the crash you mentioned happen on its own after the task had already been paused?
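One plausible way to hit error 1062 right after an unclean shutdown (an assumption; the thread does not confirm the root cause): DM saves its replication position as a checkpoint only periodically, so after a crash it resumes from the last saved checkpoint and re-executes events that had already been written downstream. A re-executed plain INSERT then collides with the existing row on `uniq_guid`. DM's safe mode targets this case by rewriting INSERT as REPLACE (and UPDATE as DELETE plus REPLACE) so that replays are idempotent. The toy model below is a hypothetical simulation of that idea, not DM code; `simulate`, `apply_insert`, and the key `3459239596` are made up for illustration.

```python
class DuplicateEntry(Exception):
    """Stands in for MySQL/TiDB error 1062 in this toy model."""


def apply_insert(downstream, key, safe_mode):
    # A plain INSERT fails when the unique key already exists downstream.
    # With safe_mode=True the row is written REPLACE-style, so re-applying is harmless.
    if key in downstream and not safe_mode:
        raise DuplicateEntry(f"Duplicate entry '{key}' for key 'uniq_guid'")
    downstream.add(key)


def simulate(keys, checkpoint, safe_mode):
    """Hypothetical model: every row reaches downstream, the process 'crashes',
    and replication resumes from the last saved checkpoint index."""
    downstream = set()
    for key in keys:
        apply_insert(downstream, key, safe_mode=False)
    # Rows after the checkpoint were applied but not recorded, so they are replayed.
    for key in keys[checkpoint:]:
        apply_insert(downstream, key, safe_mode)
    return sorted(downstream)


keys = [3459239596, 3459239597]
print(simulate(keys, checkpoint=1, safe_mode=True))  # replay tolerated
try:
    simulate(keys, checkpoint=1, safe_mode=False)
except DuplicateEntry as err:
    print("paused:", err)  # the replayed row hits the unique key
```

In this model the error depends only on the gap between what was applied and what the checkpoint recorded, which is why a clean stop rarely shows it while a simultaneous crash of the whole DM deployment can.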

| username: fly4310862

The issue has been resolved. It was caused by both dm-master and dm-worker crashing simultaneously. Thank you.

| username: buchuitoudegou

A DM cluster can be deployed with multiple dm-master and dm-worker nodes for high availability. Thank you for your feedback!
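The arithmetic behind that advice, as a rough sketch under stated assumptions: dm-master runs an embedded etcd cluster, so it stays available only while a majority of masters are up, and a source bound to a failed dm-worker can only fail over to an idle (unbound) worker. `master_failures_tolerated` and `worker_failures_tolerated` are hypothetical helpers, and the node counts in the usage loop are illustrative, not taken from this thread. Redundancy only helps against a machine crash like the one described here if the nodes run on different machines.

```python
def master_failures_tolerated(masters: int) -> int:
    # Assumption: dm-master uses embedded etcd (Raft), so a majority must stay up.
    return (masters - 1) // 2


def worker_failures_tolerated(workers: int, sources: int) -> int:
    # Assumption: each upstream source is bound to one dm-worker, and a failed
    # worker's source can only move to a worker that is not bound to anything.
    return max(workers - sources, 0)


# Illustrative layouts: (masters, workers, sources)
for masters, workers, sources in [(1, 10, 10), (3, 12, 10)]:
    print(f"{masters} master(s), {workers} workers, {sources} sources: "
          f"survives {master_failures_tolerated(masters)} master and "
          f"{worker_failures_tolerated(workers, sources)} worker failure(s)")
```

A single dm-master tolerates no failures at all, and with exactly one worker per source there is nowhere for a source to fail over to, so both roles need spare capacity.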