TiDB DM Synchronization Error

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: tidb dm 同步报错

| username: love-cat

[TiDB Usage Environment] Production Environment / Testing / PoC
[TiDB Version] 5.2.2
[Reproduction Path] Operations performed that led to the issue
[Encountered Issue: Problem Phenomenon and Impact] Error reported during data synchronization via DM
[Resource Configuration]
[Attachments: Screenshots / Logs / Monitoring]
[2023/06/02 09:50:08.422 +00:00] [ERROR] [subtask.go:311] ["unit process error"] [subtask=thc_1084_hazxy721] [unit=Dump] ["error information"="{\"ErrCode\":32001,\"ErrClass\":\"dump-unit\",\"ErrScope\":\"internal\",\"ErrLevel\":\"high\",\"Message\":\"mydumper/dumpling runs with error, with output (may empty): \",\"RawCause\":\"invalid connection\"}"]


| username: xfworld | Original post link

It looks like a connection issue. Why is DM reporting the error below?

mydumper/dumpling runs with error, with output (may empty)

| username: love-cat | Original post link

I searched online and the results also point to connection issues. However, whether DM connects to the upstream database or to the TiDB (replica), the connection errors persist on the DM node. The logs show nothing else besides the errors above.

| username: xfworld | Original post link

I can’t tell without knowing more about your scenario…

Could you provide more details?

| username: Hacker007 | Original post link

Did this exception occur after running for a while? If so, try increasing the max-allowed-packet parameter.
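
For reference, a minimal way to inspect and raise this limit on the source MySQL (the 128 MB value is illustrative, not a tuned recommendation):

-- Check the current limit on the source MySQL
SHOW GLOBAL VARIABLES LIKE 'max_allowed_packet';

-- Raise it (example: 128 MB); only connections opened afterwards pick it up
SET GLOBAL max_allowed_packet = 134217728;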

| username: dba-kit | Original post link

You need to increase the wait_timeout parameter in MySQL.
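
If the server is closing dump connections that sit idle or write for a long time, checking the relevant timeouts might look like this (values are illustrative):

-- Timeouts that can kill long-running dump connections
SHOW GLOBAL VARIABLES WHERE Variable_name IN
    ('wait_timeout', 'net_read_timeout', 'net_write_timeout');

-- Example: loosen them for the duration of the dump
SET GLOBAL wait_timeout = 86400;
SET GLOBAL net_write_timeout = 600;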

| username: redgame | Original post link

Timed out.

| username: love-cat | Original post link

tail -100 dm-worker_stderr.log

[mysql] 2023/06/08 18:22:24 packets.go:73: unexpected EOF
[mysql] 2023/06/08 18:22:24 packets.go:428: busy buffer

| username: love-cat | Original post link

What could be causing this? Could a large table be involved?

| username: love-cat | Original post link

Adjusted it to 128 MB (set global max_allowed_packet = 134217728), but it still fails.
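
Worth ruling out: SET GLOBAL only affects connections opened after the change, so the dump connection must be re-established to see the new value. A quick check:

-- The global value applies only to NEW connections
SHOW GLOBAL VARIABLES LIKE 'max_allowed_packet';

-- After reconnecting, confirm the session actually picked it up
SHOW SESSION VARIABLES LIKE 'max_allowed_packet';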

| username: love-cat | Original post link

[2023/06/08 11:21:46.988 +00:00] [INFO] [collector.go:194] ["backup failed summary"] [task=thc_1084_hazxy721] [unit=dump] [total-ranges=1] [ranges-succeed=0] [ranges-failed=1] [unit-name="dump table data"] [error="invalid connection"] [errorVerbose="invalid connection
github.com/pingcap/errors.AddStack
    /nfs/cache/mod/github.com/pingcap/errors@v0.11.5-0.20210513014640-40f9a1999b3b/errors.go:174
github.com/pingcap/errors.Trace
    /nfs/cache/mod/github.com/pingcap/errors@v0.11.5-0.20210513014640-40f9a1999b3b/juju_adaptor.go:15
github.com/pingcap/dumpling/v4/export.(*rowIter).Error
    /nfs/cache/mod/github.com/pingcap/dumpling@v0.0.0-20210914144241-99aca9186bc8/v4/export/ir_impl.go:42
github.com/pingcap/dumpling/v4/export.WriteInsert
    /nfs/cache/mod/github.com/pingcap/dumpling@v0.0.0-20210914144241-99aca9186bc8/v4/export/writer_util.go:271
github.com/pingcap/dumpling/v4/export.FileFormat.WriteInsert
    /nfs/cache/mod/github.com/pingcap/dumpling@v0.0.0-20210914144241-99aca9186bc8/v4/export/writer_util.go:623
github.com/pingcap/dumpling/v4/export.(*Writer).tryToWriteTableData
    /nfs/cache/mod/github.com/pingcap/dumpling@v0.0.0-20210914144241-99aca9186bc8/v4/export/writer.go:204
github.com/pingcap/dumpling/v4/export.(*Writer).WriteTableData.func1
    /nfs/cache/mod/github.com/pingcap/dumpling@v0.0.0-20210914144241-99aca9186bc8/v4/export/writer.go:189
github.com/pingcap/tidb/br/pkg/utils.WithRetry
    /nfs/cache/mod/github.com/pingcap/tidb@v1.1.0-beta.0.20210914112841-6ebfe8aa4257/br/pkg/utils/retry.go:47
github.com/pingcap/dumpling/v4/export.(*Writer).WriteTableData
    /nfs/cache/mod/github.com/pingcap/dumpling@v0.0.0-20210914144241-99aca9186bc8/v4/export/writer.go:160
github.com/pingcap/dumpling/v4/export.(*Writer).handleTask
    /nfs/cache/mod/github.com/pingcap/dumpling@v0.0.0-20210914144241-99aca9186bc8/v4/export/writer.go:103
github.com/pingcap/dumpling/v4/export.(*Writer).run
    /nfs/cache/mod/github.com/pingcap/dumpling@v0.0.0-20210914144241-99aca9186bc8/v4/export/writer.go:85
github.com/pingcap/dumpling/v4/export.(*Dumper).startWriters.func4
    /nfs/cache/mod/github.com/pingcap/dumpling@v0.0.0-20210914144241-99aca9186bc8/v4/export/dump.go:281
golang.org/x/sync/errgroup.(*Group).Go.func1
    /nfs/cache/mod/golang.org/x/sync@v0.0.0-20210220032951-036812b2e83c/errgroup/errgroup.go:57
runtime.goexit
    /usr/local/go/src/runtime/asm_amd64.s:1371"]
[2023/06/08 11:21:46.988 +00:00] [ERROR] [dumpling.go:142] ["dump data exits with error"] [task=thc_1084_hazxy721] [unit=dump] ["cost time"=1m40.122657856s] [error="ErrCode:32001 ErrClass:\"dump-unit\" ErrScope:\"internal\" ErrLevel:\"high\" Message:\"mydumper/dumpling runs with error, with output (may empty): \" RawCause:\"invalid connection\" "]
[2023/06/08 11:21:46.988 +00:00] [INFO] [subtask.go:292] ["unit process returned"] [subtask=thc_1084_hazxy721] [unit=Dump] [stage=Paused] [status={}]
[2023/06/08 11:21:46.988 +00:00] [ERROR] [subtask.go:311] ["unit process error"] [subtask=thc_1084_hazxy721] [unit=Dump] ["error information"="{\"ErrCode\":32001,\"ErrClass\":\"dump-unit\",\"ErrScope\":\"internal\",\"ErrLevel\":\"high\",\"Message\":\"mydumper/dumpling runs with error, with output (may empty): \",\"RawCause\":\"invalid connection\"}"]

| username: Hacker007 | Original post link

This parameter is configured in the DM task configuration file.
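
For context, older DM task files exposed this under target-database (the option was deprecated in later DM releases); a minimal sketch, with host and credentials as placeholders:

# Fragment of a DM task file (older versions); applies to the downstream connection
target-database:
  host: "127.0.0.1"                # placeholder
  port: 4000
  user: "root"
  password: ""
  max-allowed-packet: 134217728    # bytes; example value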

| username: love-cat | Original post link

Okay, thanks, I’ll add it.

| username: love-cat | Original post link

Got it, thanks a lot.

| username: love-cat | Original post link

The max-allowed-packet for dm-worker has been increased substantially, but some environments still report errors. The logs are hard to interpret. Could any expert help analyze this?

[writer_util.go:181] ["fail to dumping table(chunk), will revert some metrics and start a retry if possible"] [task=thc_1093_testzmyl1a] [unit=dump] [database=thc_1093_testzmyl1a] [table=cpoe_advice_fees] ["finished rows"=16093] ["finished size"=11310126] [error="invalid connection"]
[2023/06/09 12:43:53.979 +08:00] [WARN] [writer_util.go:181] ["fail to dumping table(chunk), will revert some metrics and start a retry if possible"] [task=thc_1093_testzmyl1a] [unit=dump] [database=thc_1093_testzmyl1a] [table=cpoe_advice_fees_history] ["finished rows"=37251] ["finished size"=27536266] [error="context canceled"]
[2023/06/09 12:43:53.979 +08:00] [WARN] [writer_util.go:181] ["fail to dumping table(chunk), will revert some metrics and start a retry if possible"] [task=thc_1093_testzmyl1a] [unit=dump] [database=thc_1093_testzmyl1a] [table=cpoe_medical_technology_record] ["finished rows"=39871] ["finished size"=13062228] [error="context canceled"]
[2023/06/09 12:43:53.979 +08:00] [INFO] [collector.go:194] ["backup failed summary"] [task=thc_1093_testzmyl1a] [unit=dump] [total-ranges=1] [ranges-succeed=0] [ranges-failed=1] [unit-name="dump table data"] [error="invalid connection"] [errorVerbose="invalid connection
github.com/pingcap/errors.AddStack
    /nfs/cache/mod/github.com/pingcap/errors@v0.11.5-0.20210513014640-40f9a1999b3b/errors.go:174
github.com/pingcap/errors.Trace
    /nfs/cache/mod/github.com/pingcap/errors@v0.11.5-0.20210513014640-40f9a1999b3b/juju_adaptor.go:15
github.com/pingcap/dumpling/v4/export.(*rowIter).Error
    /nfs/cache/mod/github.com/pingcap/dumpling@v0.0.0-20210914144241-99aca9186bc8/v4/export/ir_impl.go:42
github.com/pingcap/dumpling/v4/export.WriteInsert
    /nfs/cache/mod/github.com/pingcap/dumpling@v0.0.0-20210914144241-99aca9186bc8/v4/export/writer_util.go:271
github.com/pingcap/dumpling/v4/export.FileFormat.WriteInsert
    /nfs/cache/mod/github.com/pingcap/dumpling@v0.0.0-20210914144241-99aca9186bc8/v4/export/writer_util.go:623
github.com/pingcap/dumpling/v4/export.(*Writer).tryToWriteTableData
    /nfs/cache/mod/github.com/pingcap/dumpling@v0.0.0-20210914144241-99aca9186bc8/v4/export/writer.go:204
github.com/pingcap/dumpling/v4/export.(*Writer).WriteTableData.func1
    /nfs/cache/mod/github.com/pingcap/dumpling@v0.0.0-20210914144241-99aca9186bc8/v4/export/writer.go:189
github.com/pingcap/tidb/br/pkg/utils.WithRetry
    /nfs/cache/mod/github.com/pingcap/tidb@v1.1.0-beta.0.20210914112841-6ebfe8aa4257/br/pkg/utils/retry.go:47
github.com/pingcap/dumpling/v4/export.(*Writer).WriteTableData
    /nfs/cache/mod/github.com/pingcap/dumpling@v0.0.0-20210914144241-99aca9186bc8/v4/export/writer.go:160
github.com/pingcap/dumpling/v4/export.(*Writer).handleTask
    /nfs/cache/mod/github.com/pingcap/dumpling@v0.0.0-20210914144241-99aca9186bc8/v4/export/writer.go:103
github.com/pingcap/dumpling/v4/export.(*Writer).run
    /nfs/cache/mod/github.com/pingcap/dumpling@v0.0.0-20210914144241-99aca9186bc8/v4/export/writer.go:85
github.com/pingcap/dumpling/v4/export.(*Dumper).startWriters.func4
    /nfs/cache/mod/github.com/pingcap/dumpling@v0.0.0-20210914144241-99aca9186bc8/v4/export/dump.go:281
golang.org/x/sync/errgroup.(*Group).Go.func1
    /nfs/cache/mod/golang.org/x/sync@v0.0.0-20210220032951-036812b2e83c/errgroup/errgroup.go:57
runtime.goexit
    /usr/local/go/src/runtime/asm_amd64.s:1371"]
[2023/06/09 12:43:53.979 +08:00] [ERROR] [dumpling.go:142] ["dump data exits with error"] [task=thc_1093_testzmyl1a] [unit=dump] ["cost time"=1m51.408481533s] [error="ErrCode:32001 ErrClass:\"dump-unit\" ErrScope:\"internal\" ErrLevel:\"high\" Message:\"mydumper/dumpling runs with error, with output (may empty): \" RawCause:\"invalid connection\" "]
[2023/06/09 12:43:53.979 +08:00] [INFO] [subtask.go:292] ["unit process returned"] [subtask=thc_1093_testzmyl1a] [unit=Dump] [stage=Paused] [status={}]
[2023/06/09 12:43:53.979 +08:00] [ERROR] [subtask.go:311] ["unit process error"] [subtask=thc_1093_testzmyl1a] [unit=Dump] ["error information"="{\"ErrCode\":32001,\"ErrClass\":\"dump-unit\",\"ErrScope\":\"internal\",\"ErrLevel\":\"high\",\"Message\":\"mydumper/dumpling runs with error, with output (may empty): \",\"RawCause\":\"invalid connection\"}"]

| username: Hacker007 | Original post link

If the data volume is indeed very large, see if you can reduce the number of databases synchronized by a single task.

| username: kkpeter | Original post link

The error in the logs occurred during the dump phase. Check the source database, and adjust the dump concurrency and the rows-per-chunk setting.

| username: love-cat | Original post link

The topology is: source MySQL 5.7 --(DM)--> TiDB.

dm-worker still reports this error.

| username: love-cat | Original post link

I changed the dumpling parameters but it still doesn't work. I'm not sure how to modify the extra-args: "--consistency none" parameter. Should I add -r directly on that line?

Do I need to modify the MySQL parameters as well?
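
For what it's worth, a minimal sketch of the mydumpers section of a DM task file that lowers dump pressure; the numeric values are illustrative, and the rows field assumes a DM version that supports it (otherwise -r can be appended to extra-args):

# Fragment of a DM task file: dump-unit tuning
mydumpers:
  global:
    threads: 4                  # dump concurrency; lower if the source struggles
    chunk-filesize: 64          # split output files at ~64 MB
    rows: 200000                # rows per chunk; enables concurrent dump within a table
    extra-args: "--consistency none"

With rows set here, there is normally no need to pass -r through extra-args.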

| username: 有猫万事足 | Original post link

I think you can control the number of databases/tables synchronized for each task, which will help identify which database/table encountered an error during the dump.

@Hacker007's suggestion above is a good one.

Directly configure the whitelist in the task configuration, similar to the following:

block-allow-list:
  balist-01:
    do-dbs:
      - "db1"
    do-tables:
      - db-name: "db1"
        tbl-name: "tab1"

Moreover, structured this way, later DM maintenance is easier: if a specific table hits a synchronization issue, it won't block synchronization of the entire database. By splitting synchronization across multiple tasks, a problem table only blocks the subset of tables in its own task.

Additionally, note that if multiple tasks are synchronizing from a single data source, to avoid impacting the performance of the source database, you might need to enable relay-log on the data source. You can refer to the documentation for details.
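
For newer DM versions, enabling relay is done per source via dmctl (a sketch; mysql-replica-01 and worker-1 are placeholder names, and some older versions toggle enable-relay in the source configuration instead):

# Bind relay log for a source to a specific dm-worker
tiup dmctl --master-addr 127.0.0.1:8261 start-relay -s mysql-replica-01 worker-1

# Verify relay status
tiup dmctl --master-addr 127.0.0.1:8261 query-status -s mysql-replica-01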