DM not syncing, but no errors reported

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: dm不同步,但是也不报错。 (DM is not syncing, but no errors are reported.)

| username: tuyi锅子

[TiDB Usage Environment] Test environment
[TiDB Version]
TiDB: 6.1
DM: 6.1
[Encountered Issue]


We found that DM was not syncing in the test environment, so we checked the dm-worker logs.

Monitoring shows that a TiKV node restarted (OOM) around the same time.

[Actions Taken]
After running stop-task and then start-task again, the error shown in the screenshot occurred.
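
For reference, a minimal sketch of the commands involved, assuming dmctl is run through TiUP, the dm-master listens on 172.16.0.1:8261, and the task is named test-task (all of these are placeholders):

```shell
# Check the current task status (stages, binlog positions, any errors)
tiup dmctl --master-addr 172.16.0.1:8261 query-status test-task

# Stop the task, then start it again from its task configuration file
tiup dmctl --master-addr 172.16.0.1:8261 stop-task test-task
tiup dmctl --master-addr 172.16.0.1:8261 start-task /path/to/test-task.yaml
```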

| username: 猴子的救兵 | Original post link

The last image should have detailed error information.

| username: tuyi锅子 | Original post link

Is there any other place to check the error information besides the error message shown in the picture?

| username: tuyi锅子 | Original post link

The same error is present in the dm-master logs.

| username: db_user | Original post link

Check whether the upstream and downstream databases configured for DM can be connected normally.
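
A quick way to verify this is to connect with a MySQL client using the exact hosts, ports, and credentials from the DM source and task configuration; the addresses and user below are placeholders:

```shell
# Upstream MySQL (from the DM source configuration)
mysql -h 172.16.0.10 -P 3306 -u dm_user -p -e "SELECT 1"

# Downstream TiDB (from the task's target-database section)
mysql -h 172.16.0.20 -P 4000 -u root -p -e "SELECT tidb_version()"
```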

| username: tuyi锅子 | Original post link

The upstream and downstream connections are normal.

| username: db_user | Original post link

  1. Did this error occur after synchronization had been running normally for a period of time?
  2. Please provide the logs for DM’s master and worker.
  3. Is there any transaction or binlog file larger than 4 GB upstream? Is relay log enabled in DM?
  4. Check whether the syncer binlog position at the error location is the last position of the current binlog file (a sketch of this check follows the list).
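
For point 4, a rough way to compare the two positions, using the same placeholder master address, task name, and upstream address as in the earlier sketches: query-status reports syncerBinlog and masterBinlog, while the upstream lists its binlog files and current write position.

```shell
# DM side: check syncerBinlog / masterBinlog in the sync unit status
tiup dmctl --master-addr 172.16.0.1:8261 query-status test-task

# Upstream side: list binlog files and the current write position,
# then compare with the position DM reports
mysql -h 172.16.0.10 -P 3306 -u dm_user -p -e "SHOW BINARY LOGS; SHOW MASTER STATUS;"
```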

| username: tuyi锅子 | Original post link

  1. Yes, synchronization had been running normally for several weeks. There were actually no errors at first; we simply noticed it was not syncing. After stopping the task and starting it again, the errors appeared.

  2. Worker logs: after the error there are a lot of “(flushed {{{mysql-bin.008471 231863793} 0} })” entries. The master log has no entries on the 19th.

  3. The upstream has no transactions exceeding 4 GB, and each binlog file is 250 MB. DM is using the default configuration, so if relay log has to be enabled explicitly, it is not enabled (a way to confirm this is sketched after this list).

  4. It is not the last position of the binlog file.
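
If it helps, one way to confirm whether relay log is enabled is to inspect the stored source configuration and the relay fields in query-status; the source ID below (mysql-replica-01) is a placeholder:

```shell
# Show the stored upstream source configuration; look for enable-relay
tiup dmctl --master-addr 172.16.0.1:8261 get-config source mysql-replica-01

# query-status scoped to the source also reports relay status when relay is on
tiup dmctl --master-addr 172.16.0.1:8261 query-status -s mysql-replica-01
```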

| username: tuyi锅子 | Original post link

Could you please tell me how to troubleshoot this issue?

| username: db_user | Original post link

Then you can check what the expert above mentioned: verify that the ports are reachable (telnet the dm-master port and the dm-worker port) and see whether the components can reach each other.

Then try restarting the dm-worker itself, not the task, to see whether that helps.

If that still doesn’t work, try enabling the relay log and pulling the problematic binlog into the relay-log directory to see whether syncing succeeds.
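
A rough sketch of those three steps, assuming the default ports (8261 for dm-master, 8262 for dm-worker), a TiUP DM cluster named dm-test, and placeholder hosts, source ID, and worker name:

```shell
# 1. Check that the DM ports are reachable between nodes
telnet 172.16.0.1 8261   # dm-master
telnet 172.16.0.2 8262   # dm-worker

# 2. Restart only the dm-worker component, not the task
tiup dm restart dm-test -R dm-worker

# 3. Enable relay log for the source and bind it to a worker
tiup dmctl --master-addr 172.16.0.1:8261 start-relay -s mysql-replica-01 dm-worker-1
```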

| username: xiaohetao | Original post link

Check all communication ports related to DM to see if there are any anomalies.

| username: tuyi锅子 | Original post link

There is no problem with the DM ports.

| username: tuyi锅子 | Original post link

There is no issue with the DM ports; I’ll try restarting the worker.

| username: 猴子的救兵 | Original post link

It’s probably an issue with the service itself.

| username: tuyi锅子 | Original post link

After restarting DM, there are indeed no errors, but it is still not syncing.

The syncerBinlog position has not changed, and DM’s threads in the upstream database are in the Sleep state.
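
One way to confirm it is genuinely stuck, rather than just slow, is to compare DM’s reported position against the upstream’s current position a few minutes apart, and to look at what DM’s connections are doing upstream (same placeholder addresses as before):

```shell
# Repeat a few minutes apart: syncerBinlog should advance toward masterBinlog
tiup dmctl --master-addr 172.16.0.1:8261 query-status test-task | grep -i binlog

# Upstream: check whether DM's connections are only sleeping or waiting on something
mysql -h 172.16.0.10 -P 3306 -u dm_user -p -e "SHOW PROCESSLIST;"
```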

| username: tuyi锅子 | Original post link

After restarting DM, there are no error messages, but it just doesn’t synchronize.

| username: db_user | Original post link

Does the upstream still have the binlog that DM stopped at? Check with SHOW BINARY LOGS. Then restart the dm-worker and try stopping and starting the task again. If that doesn’t work, try the relay-log approach.
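
A sketch of that check, taking mysql-bin.008471 from the log line quoted earlier; if that file no longer shows up in SHOW BINARY LOGS, it has been purged upstream and DM cannot resume from it (addresses, cluster name, and task name are placeholders as before):

```shell
# Upstream: is the binlog file DM points at still listed?
mysql -h 172.16.0.10 -P 3306 -u dm_user -p -e "SHOW BINARY LOGS;" | grep 008471

# Then restart the dm-worker and stop/start the task again
tiup dm restart dm-test -R dm-worker
tiup dmctl --master-addr 172.16.0.1:8261 stop-task test-task
tiup dmctl --master-addr 172.16.0.1:8261 start-task /path/to/test-task.yaml
```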

| username: tuyi锅子 | Original post link

It seems that there is an issue with DM itself. The stop-task command hangs for a long time and then reports the error shown in the screenshot.

When querying the status again, the problem described above appears again.