DM did not automatically enable safe mode when the connection was interrupted and the task was restarted, causing the task to fail

translator_bot · June 21, 2024, 5:59pm

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: DM在连接中断，重启任务时候没有自动开启safe_mode模式，导致任务失败

| username: dba-kit

Symptom: After DM hits a driver: bad connection error, the restarted task fails with a Duplicate entry error. (PS: The bad connection error occurred because TiDB was set with wait_timeout=600 and the upstream MySQL had a very low update frequency at the time, with no data written for a long time. The task’s connection was therefore idle for 600 seconds and was killed by TiDB, which is expected.)
The error log is:

According to the documentation, when DM automatically retries a task it should enable safe mode for a period of time (60s by default), but safe mode does not appear to have taken effect here.

translator_bot · June 21, 2024, 5:59pm

| username: Fly-bird | Original post link

Take a look at the configuration.

translator_bot · June 21, 2024, 5:59pm

| username: 路在何chu | Original post link

There shouldn’t be any issue with the TSO specification, right?

translator_bot · June 21, 2024, 5:59pm

| username: 像风一样的男子 | Original post link

This can be considered a flaw of DM. It doesn’t record the position of the last interruption, so it can’t resume from exactly where it stopped.

translator_bot · June 21, 2024, 5:59pm

| username: dba-kit | Original post link

Actually, the position is recorded, but it is flushed to the downstream TiDB periodically rather than in real time. Events applied after the last flush can therefore be replayed on restart, which is why errors sometimes occur.
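
To make the replay window concrete, here is a minimal sketch in Python. It is a toy model, not DM's implementation: the dict standing in for the downstream table, the `replicate` helper, and the flush-every-N-events rule are all assumptions made for illustration (DM flushes checkpoints on a time interval).

```python
class DuplicateEntry(Exception):
    """Stands in for MySQL/TiDB error 1062 (Duplicate entry)."""


def apply_insert(table, key, value, safe_mode):
    # In safe mode the INSERT behaves like REPLACE and overwrites an existing key.
    if key in table and not safe_mode:
        raise DuplicateEntry(f"Duplicate entry '{key}' for key 'PRIMARY'")
    table[key] = value


def replicate(events, table, start, flush_every, crash_at=None, safe_mode=False):
    """Apply events[start:], flushing the checkpoint every `flush_every` events.

    Returns the last flushed checkpoint (index of the next event to apply).
    If `crash_at` is set, stop before applying that event without flushing.
    """
    checkpoint = start
    for pos in range(start, len(events)):
        if crash_at is not None and pos == crash_at:
            return checkpoint  # connection lost; unflushed progress is forgotten
        key, value = events[pos]
        apply_insert(table, key, value, safe_mode)
        if (pos + 1 - start) % flush_every == 0:
            checkpoint = pos + 1  # periodic flush
    return len(events)  # clean finish: everything is flushed


events = [(i, f"row-{i}") for i in range(1, 6)]  # five upstream INSERTs

# First run: rows 1-4 are applied, but the checkpoint was last flushed after row 3.
table = {}
checkpoint = replicate(events, table, start=0, flush_every=3, crash_at=4)

# Restart without safe mode: row 4 is replayed and collides with itself.
try:
    replicate(events, dict(table), start=checkpoint, flush_every=3)
    retry_failed = False
except DuplicateEntry:
    retry_failed = True

# Restart with safe mode: the replayed row is overwritten and the task continues.
recovered = dict(table)
final_checkpoint = replicate(
    events, recovered, start=checkpoint, flush_every=3, safe_mode=True
)
```

In this model the restart without safe mode fails on the row that was applied after the last flush, which matches the Duplicate entry symptom described in the first post.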

translator_bot · June 21, 2024, 5:59pm

| username: okenJiang | Original post link

How did you discover that safe mode was not enabled? Do you have more logs to help with troubleshooting?

translator_bot · June 21, 2024, 5:59pm

| username: dba-kit | Original post link

The log reported a duplicate key. After checking, the record had indeed just been inserted, so the only explanation is that DM inserted it twice.

translator_bot · June 21, 2024, 5:59pm

| username: okenJiang | Original post link

Is the safe-mode-duration parameter set?

translator_bot · June 21, 2024, 5:59pm

| username: 路在何chu | Original post link

Delete the previous data and re-import it. It seems like there is data inconsistency. Duplicate entries were inserted.

translator_bot · June 21, 2024, 5:59pm

| username: lichunzhu-PingCAP | Original post link

DM does not write the safe-mode exit point in real time, but if the exit point cannot be obtained, safe mode is enabled for a period of time by default.
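
The rule described above could be sketched roughly as follows. This is a hypothetical illustration, not DM's code: `ResumeState`, `safe_mode_active`, and the integer positions are made-up names, and the 60-second default is simply the value quoted earlier in this thread.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ResumeState:
    started_at: float                 # seconds, when the task was (re)started
    exit_point: Optional[int] = None  # hypothetical recorded safe-mode exit position


def safe_mode_active(state, now, current_pos, duration=60.0):
    """Decide whether replayed statements should still be rewritten."""
    if state.exit_point is not None:
        # Exit point known: stay in safe mode until replication has passed it.
        return current_pos < state.exit_point
    # Exit point unknown: fall back to a fixed time window after the restart.
    return now - state.started_at < duration


known = ResumeState(started_at=0.0, exit_point=100)
unknown = ResumeState(started_at=0.0)
```

With these assumptions, `safe_mode_active(unknown, now=30.0, current_pos=0)` is true and `safe_mode_active(unknown, now=90.0, current_pos=0)` is false, while the `known` state ignores the clock and only looks at the position.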

translator_bot · June 21, 2024, 5:59pm

| username: andone | Original post link

safe-mode-duration

translator_bot · June 21, 2024, 5:59pm

| username: TiDBer_小阿飞 | Original post link

Safe mode relies on primary keys or unique indexes to detect conflicts. If the corresponding downstream table has no primary key or unique index, REPLACE has nothing to conflict with and behaves like a plain INSERT. In that case, even with safe mode enabled, DM rewrites the INSERT to REPLACE and executes it, but duplicate records are still inserted downstream.

So safe mode still depends on the downstream table having a primary key or unique index.
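
The dependency on a key is easy to reproduce. The sketch below uses Python's built-in sqlite3 module purely as a stand-in for the downstream database (an assumption for the sake of a self-contained example; SQLite's REPLACE is similar to, but not identical to, MySQL/TiDB's). The table `t` and the `replay_twice` helper are made up for illustration.

```python
import sqlite3


def replay_twice(create_table_sql):
    """Apply the same row twice with REPLACE and return the resulting row count."""
    conn = sqlite3.connect(":memory:")
    conn.execute(create_table_sql)
    for _ in range(2):  # the second iteration simulates a replayed binlog event
        conn.execute("REPLACE INTO t (id, name) VALUES (?, ?)", (1, "alice"))
    (count,) = conn.execute("SELECT COUNT(*) FROM t").fetchone()
    conn.close()
    return count


with_pk = replay_twice("CREATE TABLE t (id INTEGER PRIMARY KEY, name TEXT)")
without_key = replay_twice("CREATE TABLE t (id INTEGER, name TEXT)")
print(with_pk, without_key)  # prints "1 2"
```

With a primary key the replay is idempotent (one row); without any key the second REPLACE simply adds another row.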