DM not syncing, but no errors reported

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: dm不同步,但是也不报错。 (DM is not syncing, but no errors are reported.)

| username: tuyi锅子

[TiDB Usage Environment] Test environment
[TiDB Version]
TiDB: 6.1
DM: 6.1
[Encountered Issue]


We found that DM was not syncing in the test environment, so we checked the dm-worker logs.

Monitoring shows that a TiKV node restarted (OOM) around the same time.

[Actions Taken]
After running stop-task and then start-task again, the error shown in the screenshot occurred.
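
For reference, a minimal sketch of the commands involved, assuming dmctl is run through TiUP, the dm-master listens on 172.16.0.1:8261, and the task is named test-task (all of these are placeholders):

```shell
# Check the current task status (stages, binlog positions, any errors)
tiup dmctl --master-addr 172.16.0.1:8261 query-status test-task

# Stop the task, then start it again from its task configuration file
tiup dmctl --master-addr 172.16.0.1:8261 stop-task test-task
tiup dmctl --master-addr 172.16.0.1:8261 start-task /path/to/test-task.yaml
```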

| username: 猴子的救兵 | Original post link

The last image should have detailed error information.

| username: tuyi锅子 | Original post link

Is there any other place to check the error information besides the error message shown in the picture?

| username: tuyi锅子 | Original post link

The same error is present in the dm-master logs.

| username: db_user | Original post link

Check whether the upstream and downstream databases configured for DM can be connected normally.
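
A quick way to verify this is to connect with a MySQL client using the exact hosts, ports, and credentials from the DM source and task configuration; the addresses and user below are placeholders:

```shell
# Upstream MySQL (from the DM source configuration)
mysql -h 172.16.0.10 -P 3306 -u dm_user -p -e "SELECT 1"

# Downstream TiDB (from the task's target-database section)
mysql -h 172.16.0.20 -P 4000 -u root -p -e "SELECT tidb_version()"
```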

| username: tuyi锅子 | Original post link

The upstream and downstream connections are normal.

| username: db_user | Original post link

  1. Did this error occur after synchronization had been running normally for a period of time?
  2. Please provide the logs for DM’s master and worker.
  3. Is there any transaction or binlog file larger than 4 GB upstream? Is relay log enabled in DM?
  4. Check whether the syncer binlog position at the error location is the last position of the current binlog file (a sketch of this check follows the list).
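
For point 4, a rough way to compare the two positions, using the same placeholder master address, task name, and upstream address as in the earlier sketches: query-status reports syncerBinlog and masterBinlog, while the upstream lists its binlog files and current write position.

```shell
# DM side: check syncerBinlog / masterBinlog in the sync unit status
tiup dmctl --master-addr 172.16.0.1:8261 query-status test-task

# Upstream side: list binlog files and the current write position,
# then compare with the position DM reports
mysql -h 172.16.0.10 -P 3306 -u dm_user -p -e "SHOW BINARY LOGS; SHOW MASTER STATUS;"
```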

| username: tuyi锅子 | Original post link

  1. Yes, synchronization had been running normally for several weeks. There were actually no errors at first; we simply noticed it was not syncing. After stopping the task and starting it again, the errors appeared.

  2. Worker logs: after the error there are a lot of “(flushed {{{mysql-bin.008471 231863793} 0} })” entries. The master log has no entries on the 19th.

  3. The upstream has no transactions exceeding 4 GB, and each binlog file is 250 MB. DM is using the default configuration, so if relay log has to be enabled explicitly, it is not enabled (a way to confirm this is sketched after this list).

  4. It is not the last position of the binlog file.
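
If it helps, one way to confirm whether relay log is enabled is to inspect the stored source configuration and the relay fields in query-status; the source ID below (mysql-replica-01) is a placeholder:

```shell
# Show the stored upstream source configuration; look for enable-relay
tiup dmctl --master-addr 172.16.0.1:8261 get-config source mysql-replica-01

# query-status scoped to the source also reports relay status when relay is on
tiup dmctl --master-addr 172.16.0.1:8261 query-status -s mysql-replica-01
```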

| username: tuyi锅子 | Original post link

Could you please tell me how to troubleshoot this issue?

| username: db_user | Original post link

Then you can check what the expert above mentioned: verify that the ports are reachable (telnet the dm-master port and the dm-worker port) and see whether the components can reach each other.

Then try restarting the dm-worker itself, not the task, to see whether that helps.

If that still doesn’t work, try enabling the relay log and pulling the problematic binlog into the relay-log directory to see whether syncing succeeds.
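
A rough sketch of those three steps, assuming the default ports (8261 for dm-master, 8262 for dm-worker), a TiUP DM cluster named dm-test, and placeholder hosts, source ID, and worker name:

```shell
# 1. Check that the DM ports are reachable between nodes
telnet 172.16.0.1 8261   # dm-master
telnet 172.16.0.2 8262   # dm-worker

# 2. Restart only the dm-worker component, not the task
tiup dm restart dm-test -R dm-worker

# 3. Enable relay log for the source and bind it to a worker
tiup dmctl --master-addr 172.16.0.1:8261 start-relay -s mysql-replica-01 dm-worker-1
```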

| username: xiaohetao | Original post link

Check all communication ports related to DM to see if there are any anomalies.

| username: tuyi锅子 | Original post link

There is no problem with the DM ports.

| username: tuyi锅子 | Original post link

There is no issue with the DM ports; I’ll try restarting the worker.

| username: 猴子的救兵 | Original post link

It’s probably an issue with the service itself.

| username: tuyi锅子 | Original post link

After restarting DM, there are indeed no errors, but it is still not syncing.

The syncerBinlog position has not changed, and DM’s threads in the upstream database are in the Sleep state.
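
One way to confirm it is genuinely stuck, rather than just slow, is to compare DM’s reported position against the upstream’s current position a few minutes apart, and to look at what DM’s connections are doing upstream (same placeholder addresses as before):

```shell
# Repeat a few minutes apart: syncerBinlog should advance toward masterBinlog
tiup dmctl --master-addr 172.16.0.1:8261 query-status test-task | grep -i binlog

# Upstream: check whether DM's connections are only sleeping or waiting on something
mysql -h 172.16.0.10 -P 3306 -u dm_user -p -e "SHOW PROCESSLIST;"
```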

| username: tuyi锅子 | Original post link

After restarting DM, there are no error messages, but it just doesn’t synchronize.

| username: db_user | Original post link

Does the upstream still have the binlog that DM stopped at? Check with SHOW BINARY LOGS. Then restart the dm-worker and try stopping and starting the task again. If that doesn’t work, try the relay-log approach.
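
A sketch of that check, taking mysql-bin.008471 from the log line quoted earlier; if that file no longer shows up in SHOW BINARY LOGS, it has been purged upstream and DM cannot resume from it (addresses, cluster name, and task name are placeholders as before):

```shell
# Upstream: is the binlog file DM points at still listed?
mysql -h 172.16.0.10 -P 3306 -u dm_user -p -e "SHOW BINARY LOGS;" | grep 008471

# Then restart the dm-worker and stop/start the task again
tiup dm restart dm-test -R dm-worker
tiup dmctl --master-addr 172.16.0.1:8261 stop-task test-task
tiup dmctl --master-addr 172.16.0.1:8261 start-task /path/to/test-task.yaml
```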

| username: tuyi锅子 | Original post link

It seems that there is an issue with DM itself. The stop-task command hangs for a long time and then reports the error shown in the screenshot.

When querying the status again, the problem described above appears again.