During DM synchronization there is no way to set SetConnMaxLifetime for the downstream connection, so downstream connection interruptions occasionally cause false alarms

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: DM同步时,没办法对下游设置SetConnMaxLifetime,偶尔会因为下游连接中断,出现误报

| username: dba-kit

Phenomenon

The online DM task occasionally triggers the DM_sync_process_exists_with_error alert. Upon investigation, it was found that there are basically two types of errors:
Message: "database driver", RawCause: "driver: bad connection"
Message: "execute statement failed: begin", RawCause: "invalid connection"

Cause

The downstream TiDB has wait_timeout=600. If the upstream MySQL is updated very infrequently, for example only once every half hour, DM's syncer connection is killed by the downstream TiDB after sitting in the Sleep state for too long, and the next update then triggers this error.
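
For reference, here is a minimal Go sketch of the failure mode described above (not DM code; the DSN and timings are illustrative assumptions): a connection that stays idle past the downstream wait_timeout is killed server-side, and the next statement on it fails.

```go
package main

import (
	"database/sql"
	"fmt"
	"time"

	_ "github.com/go-sql-driver/mysql"
)

func main() {
	// Hypothetical downstream TiDB DSN; not a DM configuration value.
	db, err := sql.Open("mysql", "root:@tcp(127.0.0.1:4000)/test")
	if err != nil {
		panic(err)
	}
	defer db.Close()

	// Pin the pool to a single connection so the same one is reused below.
	db.SetMaxOpenConns(1)
	db.SetMaxIdleConns(1)

	if _, err := db.Exec("SELECT 1"); err != nil {
		panic(err)
	}

	// Stay idle longer than the downstream wait_timeout (600 s in this report).
	time.Sleep(11 * time.Minute)

	// The server has killed the idle connection; depending on the driver version
	// this surfaces as "driver: bad connection" or "invalid connection".
	if _, err := db.Exec("SELECT 1"); err != nil {
		fmt.Println("error after long idle:", err)
	}
}
```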

Suggested Fix

Add parameters to the DM task configuration so that users can set Go's connection-pool parameters themselves (exposing only ConnMaxLifetime would be enough), or simply retry internally instead of returning an error. The main pain point is that the occasional DM_sync_process_exists_with_error alert is really annoying.
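
As a rough illustration of what such a knob would do, here is a minimal sketch using Go's database/sql pool directly; the DSN and the 500-second value are assumptions for the example, not existing DM configuration options.

```go
package main

import (
	"database/sql"
	"time"

	_ "github.com/go-sql-driver/mysql"
)

func main() {
	// Hypothetical downstream DSN.
	db, err := sql.Open("mysql", "dm_user:pass@tcp(tidb-host:4000)/")
	if err != nil {
		panic(err)
	}
	defer db.Close()

	// Recycle connections before the downstream wait_timeout (600 s here), so an
	// idle connection is closed by the client rather than killed by TiDB.
	db.SetConnMaxLifetime(500 * time.Second)

	// Go 1.15+ also has SetConnMaxIdleTime, which targets idle time specifically
	// and matches this scenario even more closely.
	db.SetConnMaxIdleTime(500 * time.Second)

	// ... hand db to the writing code ...
	_ = db
}
```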

| username: dba-kit | Original post link

Additionally, the name DM_sync_process_exists_with_error is itself a typo; it should be exits_with_error.

| username: Hacker007 | Original post link

Will this cause any data consistency issues? It is acceptable as long as data synchronization is not affected after the warning is raised.

| username: lance6716 | Original post link

Thank you for the feedback. We will track the progress of the fix at SQL connection should tolerate being killed by idle too long · Issue #7376 · pingcap/tiflow · GitHub.

Setting a fixed value cannot handle idle periods of uncertain duration, so we will address this issue from another perspective.
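
For readers following along, the direction the linked issue points toward is roughly "tolerate a connection that was killed while idle and retry". Below is a simplified sketch of that idea in plain database/sql terms; execWithRetry and the error matching are assumptions for illustration, not the actual implementation in pingcap/tiflow. Note that blindly retrying non-idempotent writes can be unsafe, which is part of why a real fix is more involved than this.

```go
package dmretry

import (
	"database/sql"
	"database/sql/driver"
	"errors"
	"strings"
)

// execWithRetry retries a statement once when the error indicates the underlying
// connection was already dead (for example, killed by the server after a long idle).
func execWithRetry(db *sql.DB, query string, args ...interface{}) (sql.Result, error) {
	res, err := db.Exec(query, args...)
	if err != nil && isStaleConnErr(err) {
		// database/sql hands the retry a fresh connection from the pool.
		res, err = db.Exec(query, args...)
	}
	return res, err
}

func isStaleConnErr(err error) bool {
	// "invalid connection" is the go-sql-driver/mysql message seen in the alert above.
	return errors.Is(err, driver.ErrBadConn) ||
		strings.Contains(err.Error(), "invalid connection")
}
```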

| username: dba-kit | Original post link

During the dump phase, if the backup contains large tables, some connections may go unused for a long time, and a driver: bad connection error can occur and cause the task to fail.

| username: lance6716 | Original post link

The new version should have fixed the issue during the dump phase. Which version of DM are you using?

| username: dba-kit | Original post link

6.1.1

| username: lance6716 | Original post link

That indeed shouldn’t happen. Please upload the logs.

| username: dba-kit | Original post link

Similar issues occurred periodically and caused task failures. Later, after modifying the MySQL wait_timeout parameter, the export succeeded.

[2022/10/17 14:56:59.918 +08:00] [INFO] [conn.go:70] ["cannot execute query"] [task=clear_task] [unit=dump] [retryTime=1] [sql="SHOW COLUMNS FROM `ying99_fundtxn`.`campain_discount`"] [args=null] [error="driver: bad connection"]
[2022/10/17 14:57:00.603 +08:00] [ERROR] [dumpling.go:152] ["dump data exits with error"] [task=clear_task] [unit=dump] ["cost time"=16m55.166129053s] [error="ErrCode:32001 ErrClass:\"dump-unit\" ErrScope:\"internal\" ErrLevel:\"high\" Message:\"mydumper/dumpling runs with error, with output (may empty): \" RawCause:\"sql: SHOW COLUMNS FROM `ying99_fundtxn`.`campain_discount`: driver: bad connection\" "]

| username: lance6716 | Original post link

Please provide the complete log.

| username: system | Original post link

This topic was automatically closed 60 days after the last reply. New replies are no longer allowed.