DM synchronization suddenly interrupted

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: DM同步突然中断

| username: Sword

DM Version: v2.0.7
TiDB: v4.0.11
MySQL: 5.6.32
DM-Worker error:

Attempted recovery:

  1. Ran pause-task to pause the task.
  2. Ran resume-task to resume the task; the error still occurs.

The synchronization has been running for a long time and suddenly encountered an error.
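For reference, this is roughly how these dmctl commands are issued in DM v2.0; the master address below is a placeholder and the task name is the one that appears in the worker log:

```shell
# Placeholder master address; use your DM-master's address and port.
# Check the task status first, then pause and resume it.
dmctl --master-addr 127.0.0.1:8261 query-status vgift_account
dmctl --master-addr 127.0.0.1:8261 pause-task vgift_account
dmctl --master-addr 127.0.0.1:8261 resume-task vgift_account
```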

| username: weixiaobing | Original post link

You can check the DM-worker logs.

| username: Sword | Original post link

The worker logs keep reporting the following errors:

[2022/12/20 15:01:15.429 +08:00] [WARN] [task_checker.go:393] [“backoff skip auto resume task”] [component=“task checker”] [task=vgift_account] [latestResumeTime=2022/12/20 14:58:45.429 +08:00] [duration=5m0s]
[2022/12/20 15:01:20.429 +08:00] [WARN] [task_checker.go:393] [“backoff skip auto resume task”] [component=“task checker”] [task=vgift_account] [latestResumeTime=2022/12/20 14:58:45.429 +08:00] [duration=5m0s]
[2022/12/20 15:01:20.480 +08:00] [INFO] [syncer.go:2912] [“binlog replication status”] [task=exchange_order] [unit=“binlog replication”] [total_events=488748531] [total_tps=30] [tps=0] [master_position=“(mysql-bin.000229, 792724909)”] [master_gtid=3d063b5d-ac92-11ea-8d7f-ecf4bbdee7d4:1-186586092,48230336-ac92-11ea-8d80-e4434b036590:1-933946527,538fe0da-ac92-11ea-8d80-246e966090ac:1,9efc0181-e451-11eb-be59-b4055d643320:1-1023] [checkpoint=“position: (mysql-bin.000016, 449314866), gtid-set: 3d063b5d-ac92-11ea-8d7f-ecf4bbdee7d4:1-186586092,48230336-ac92-11ea-8d80-e4434b036590:1-714173783,538fe0da-ac92-11ea-8d80-246e966090ac:1(flushed position: (mysql-bin.000016, 449314866), gtid-set: 3d063b5d-ac92-11ea-8d7f-ecf4bbdee7d4:1-186586092,48230336-ac92-11ea-8d80-e4434b036590:1-714173783,538fe0da-ac92-11ea-8d80-246e966090ac:1)”]

| username: db_user | Original post link

Check if the upstream and downstream are normal, and post the TiDB logs.

| username: Sword | Original post link

TiDB is functioning normally. This is not the only worker connected to TiDB; many other workers' synchronization tasks are also running normally.

| username: Billmay表妹 | Original post link

You can refer to these two issues~

I remember TiDB is compatible with MySQL 5.7; you could try upgrading the MySQL version~

| username: Sword | Original post link

It should have nothing to do with compatibility. The task has been running for a long time, and the problematic statement is an ordinary DML DELETE.

| username: Billmay表妹 | Original post link

Did you see this question?

| username: Sword | Original post link

Similar to this issue, I adjusted the GC time and then restarted the synchronization, but the error still occurred.
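For context, on TiDB v4.0 the GC lifetime is typically adjusted through the mysql.tidb table; a minimal sketch, with 24h as an example value only:

```sql
-- Check the current GC lifetime.
SELECT VARIABLE_VALUE FROM mysql.tidb WHERE VARIABLE_NAME = 'tikv_gc_life_time';

-- Extend it, for example to 24 hours (example value only).
UPDATE mysql.tidb SET VARIABLE_VALUE = '24h' WHERE VARIABLE_NAME = 'tikv_gc_life_time';
```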

| username: yilong | Original post link

  1. Does it always get stuck on this SQL?
  2. Are there any warnings or errors in tidb.log?

| username: lance6716 | Original post link

If the downstream table does not have a primary key or unique index, finding the row during deletion will be very slow, leading to operation timeout. You can try reducing the batch-size in the task configuration.
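A sketch of where this parameter sits in a DM task configuration (the group name global, the source-id, and the values shown are illustrative; adjust them to your own task file):

```yaml
# Fragment of a DM task configuration file; values are illustrative.
syncers:
  global:
    worker-count: 16   # concurrent sync threads
    batch: 20          # smaller batch so each downstream transaction touches fewer rows

mysql-instances:
  - source-id: "mysql-replica-01"     # placeholder source name
    syncer-config-name: "global"      # references the syncer group above
```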

| username: Sword | Original post link

Does batch-size refer to this parameter, batch? I see that it cannot be updated dynamically after the task is created. My task has a large amount of data and synchronization takes a long time, and the table indeed has no primary key or unique index. If I add a unique index now, will that solve the problem?

| username: Sword | Original post link

  1. Yes, every time it’s the delete operation that gets stuck.
  2. tidb.log entries:
    [2022/12/19 20:34:50.276 +08:00] [WARN] [backoff.go:329] [“tikvRPC backoffer.maxSleep 40000ms is exceeded, errors:\nsend tikv request error: wait recvLoop: context deadline exceeded, ctx: region ID: 211364173, meta: id:211364173 start_key:"t\200\000\000\000\000\000\000\025_i\200\000\000\000\000\000\000\002\003\200\000\000\000\000\000\000\021" end_key:"t\200\000\000\000\000\000\000\025_i\200\000\000\000\000\000\000\002\003\200\000\000\000\000\000\000\035" region_epoch:<conf_ver:234011 version:120 > peers:<id:317502942 store_id:55300573 > peers:<id:317505991 store_id:25638868 > peers:<id:317516391 store_id:314339911 > , peer: id:317505991 store_id:25638868 , addr: 10.92.193.92:10822, idx: 1, reqStoreType: TiKvOnly, runStoreType: tikv, try next peer later at 2022-12-19T20:34:07.481488118+08:00\nsend tikv request error: wait recvLoop: context deadline exceeded, ctx: region ID: 211364173, meta: id:211364173 start_key:"t\200\000\000\000\000\000\000\025_i\200\000\000\000\000\000\000\002\003\200\000\000\000\000\000\000\021" end_key:"t\200\000\000\000\000\000\000\025_i\200\000\000\000\000\000\000\002\003\200\000\000\000\000\000\000\035" region_epoch:<conf_ver:234011 version:120 > peers:<id:317502942 store_id:55300573 > peers:<id:317505991 store_id:25638868 > peers:<id:317516391 store_id:314339911 > , peer: id:317505991 store_id:25638868 , addr: 10.92.193.92:10822, idx: 1, reqStoreType: TiKvOnly, runStoreType: tikv, try next peer later at 2022-12-19T20:34:29.244181584+08:00\nsend tikv request error: wait recvLoop: context deadline exceeded, ctx: region ID: 211364173, meta: id:211364173 start_key:"t\200\000\000\000\000\000\000\025_i\200\000\000\000\000\000\000\002\003\200\000\000\000\000\000\000\021" end_key:"t\200\000\000\000\000\000\000\025_i\200\000\000\000\000\000\000\002\003\200\000\000\000\000\000\000\035" region_epoch:<conf_ver:234011 version:120 > peers:<id:317502942 store_id:55300573 > peers:<id:317505991 store_id:25638868 > peers:<id:317516391 store_id:314339911 > , peer: id:317505991 store_id:25638868 , addr: 10.92.193.92:10822, idx: 1, reqStoreType: TiKvOnly, runStoreType: tikv, try next peer later at 2022-12-19T20:34:50.276827373+08:00”]
    [2022/12/23 18:48:18.376 +08:00] [INFO] [region_cache.go:620] [“switch region peer to next due to send request fail”] [current=“region ID: 211364173, meta: id:211364173 start_key:"t\200\000\000\000\000\000\000\025_i\200\000\000\000\000\000\000\002\003\200\000\000\000\000\000\000\021" end_key:"t\200\000\000\000\000\000\000\025_i\200\000\000\000\000\000\000\002\003\200\000\000\000\000\000\000\035" region_epoch:<conf_ver:234320 version:120 > peers:<id:318852880 store_id:55300575 > peers:<id:318852951 store_id:25638870 > peers:<id:319003158 store_id:314339911 > , peer: id:318852951 store_id:25638870 , addr: 10.92.182.75:10826, idx: 1, reqStoreType: TiKvOnly, runStoreType: tikv”] [needReload=false] [error=“wait recvLoop: context deadline exceeded”]
    [2022/12/24 05:24:34.790 +08:00] [WARN] [syncer.go:150] [“[ddl] etcd-cli put kv failed”] [key=/topology/tidb/10.92.167.167:40006/ttl] [value=1671830672790451428] [error=“context deadline exceeded”] [retryCnt=0]

| username: okenJiang | Original post link

Yes, adding a unique index, together with this batch setting, should improve the issue.
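A hedged example of adding such an index on the downstream TiDB table (table and column names are hypothetical; choose a column set that is actually unique in your data):

```sql
-- Hypothetical table and column names; replace with your own.
ALTER TABLE vgift_account ADD UNIQUE INDEX uk_order_id (order_id);
```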

| username: Hacker007 | Original post link

Was another full synchronization task in progress when this error occurred? If so, wait for the full synchronization to complete and then check whether this task returns to normal.

| username: system | Original post link

This topic was automatically closed 60 days after the last reply. New replies are no longer allowed.