When migrating to TiDB, DM can only synchronize up to 100,000 tables due to the excessive number of data tables

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: 数据表量过多,dm迁移至TiDB时仅能同步10万张表

| username: TiDBer_ShFvkFPA

[TiDB Usage Environment] PoC
[Encountered Issue] Using the DM tool to synchronize data from MySQL to TiDB. The upstream database has too many tables (about 300,000), so the write speed into TiDB drops progressively and eventually falls to zero; synchronization almost stalls once roughly 100,000 tables have been migrated.

| username: D3Hunter-pingcap | Original post link

Please provide the configuration and logs.

Are the 300,000 tables sharded?

| username: TiDBer_ShFvkFPA | Original post link

Approximately 300,000 tables spread across 3,000 databases.
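
For reference, a table distribution like this can be confirmed on the upstream MySQL with a standard information_schema query (illustrative only, not part of the original post):

```sql
-- Count user tables per schema on the upstream MySQL,
-- excluding the system schemas that DM does not migrate
SELECT table_schema, COUNT(*) AS table_count
FROM information_schema.tables
WHERE table_type = 'BASE TABLE'
  AND table_schema NOT IN ('mysql', 'sys', 'information_schema', 'performance_schema')
GROUP BY table_schema
ORDER BY table_count DESC;
```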

| username: TiDBer_ShFvkFPA | Original post link

Task configuration:

name: wps
task-mode: all

target-database:
  host: "10.0.0.0"
  port: 000000
  user: "Ksc_TiDB"
  password: "sdfsafas"          # If the password is not empty, it is recommended to use ciphertext encrypted with dmctl

mydumpers:                      # Configuration parameters for the dump processing unit
  global:                       # Configuration name
    threads: 4                  # Number of threads the dump unit uses to export data from the upstream database instance, default 4
    chunk-filesize: 64          # Size of the data files generated by the dump unit, default 64 MB
    extra-args: "--consistency none"  # Other parameters for the dump unit; table-list does not need to be configured in extra-args, DM generates it automatically

loaders:                        # Configuration parameters for the load processing unit
  global:
    pool-size: 16
    import-mode: "sql"
    on-duplicate: "replace"
    dir: /data/wps/data

syncers:                        # Configuration parameters for the sync processing unit
  global:                       # Configuration name
    worker-count: 16            # Number of concurrent threads applying the binlog transferred to the local node, default 16. Adjusting this does not affect the concurrency of upstream log fetching, but it puts significant pressure on the downstream.
    batch: 100                  # Number of SQL statements in one transaction batch that sync migrates to the downstream database, default 100; it is generally recommended not to exceed 500.
    enable-ansi-quotes: true    # If sql-mode: "ANSI_QUOTES" is set in the session, this option must be enabled

    # If set to true, INSERT from upstream is rewritten as REPLACE, and UPDATE is rewritten as DELETE plus REPLACE, so that DML can be imported repeatedly during migration as long as the table has a primary key or unique index.
    safe-mode: true
    # If set to true, DM compresses multiple operations on the same row from upstream into one operation as much as possible, without increasing latency.
    # For example, INSERT INTO tb(a,b) VALUES(1,1); UPDATE tb SET b=11 WHERE a=1; is compressed into INSERT INTO tb(a,b) VALUES(1,11); where a is the primary key
    # For example, UPDATE tb SET b=1 WHERE a=1; UPDATE tb SET b=2 WHERE a=1; is compressed into UPDATE tb SET b=2 WHERE a=1; where a is the primary key
    # For example, DELETE FROM tb WHERE a=1; INSERT INTO tb(a,b) VALUES(1,1); is compressed into REPLACE INTO tb(a,b) VALUES(1,1); where a is the primary key
    compact: false
    # If set to true, DM merges multiple statements of the same type into one statement as much as possible, generating one SQL statement with multiple rows of data.
    # For example, INSERT INTO tb(a,b) VALUES(1,1); INSERT INTO tb(a,b) VALUES(2,2); becomes INSERT INTO tb(a,b) VALUES(1,1),(2,2);
    # For example, UPDATE tb SET b=11 WHERE a=1; UPDATE tb SET b=22 WHERE a=2; becomes INSERT INTO tb(a,b) VALUES(1,11),(2,22) ON DUPLICATE KEY UPDATE a=VALUES(a), b=VALUES(b); where a is the primary key
    # For example, DELETE FROM tb WHERE a=1; DELETE FROM tb WHERE a=2; becomes DELETE FROM tb WHERE (a) IN (1),(2); where a is the primary key
    multiple-rows: false

mysql-instances:
  - source-id: "source01"
    block-allow-list: "ba-rule1"
    loader-thread: 64
    loader-config-name: "global"
    syncer-config-name: "global"
    mydumper-config-name: "global"

block-allow-list:
  ba-rule1:
    do-dbs: ["cwbaseshow*"]
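
For completeness (this was not in the original post), a task file like this would normally be validated and started with dmctl; a minimal sketch, assuming the file is saved as task.yaml and DM-master listens on 10.0.0.0:8261 (placeholder address):

```shell
# Check the task configuration against the upstream and downstream first
tiup dmctl --master-addr 10.0.0.0:8261 check-task task.yaml

# Start the migration task defined above
tiup dmctl --master-addr 10.0.0.0:8261 start-task task.yaml
```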

| username: lance6716 | Original post link

What does `dmctl query-status` currently report for this task?
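
For example (the master address below is a placeholder), the per-unit progress of the task can be inspected with:

```shell
# Overall status of all tasks managed by this DM cluster
tiup dmctl --master-addr 10.0.0.0:8261 query-status

# Detailed status of the wps task, including the current unit (Dump / Load / Sync) and its progress
tiup dmctl --master-addr 10.0.0.0:8261 query-status wps
```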

| username: TiDBer_ShFvkFPA | Original post link

I have encountered this problem before; it was because the TiDB version was too low. After upgrading to version 4.0.10, the problem was resolved.

| username: lance6716 | Original post link

From this perspective, it seems that your task is stuck at the full import stage. Could you upload the DM worker logs to see where it is stuck?

| username: wisdom | Original post link

Please share the DM worker logs to see if there is any information in the logs.

| username: TiDBer_ShFvkFPA | Original post link

The progress has been stuck at 0% since around 2023/02/03 22:23:47.002 +08:00.
[2023/02/03 22:23:43.479 +08:00] [INFO] [collector.go:255] [“backup success summary”] [task=wps] [unit=dump] [total-ranges=567000] [ranges-succeed=567000] [ranges-failed=0] [total-take=2h51m2.991960503s] [total-kv-size=1.426GB] [average-speed=138.9kB/s] [total-rows=9368995]
[2023/02/03 22:23:43.479 +08:00] [INFO] [dumpling.go:202] [“dump data finished”] [task=wps] [unit=dump] [“cost time”=2h52m8.82914405s]
[2023/02/03 22:23:43.479 +08:00] [INFO] [dumpling.go:289] [“progress status of dumpling”] [task=wps] [unit=dump] [total_tables=282000] [finished_tables=282000] [estimated_total_rows=9095077] [finished_rows=9368995] [estimated_progress=100.00%] [“new progress”=“100.00 %”] [bps=88084]
[2023/02/03 22:23:43.479 +08:00] [INFO] [subtask.go:348] [“unit process returned”] [subtask=wps] [unit=Dump] [stage=Finished] [status=“{"totalTables":282000,"completedTables":282000,"finishedBytes":1425501087,"finishedRows":9368995,"estimateTotalRows":9095077,"bps":88084,"progress":"100.00 %"}”]
[2023/02/03 22:23:43.479 +08:00] [INFO] [subtask.go:359] [“switching to next unit”] [subtask=wps] [unit=Dump]
[2023/02/03 22:23:43.479 +08:00] [INFO] [subtask.go:248] [“start to run”] [subtask=wps] [unit=Load]
[2023/02/03 22:23:43.479 +08:00] [INFO] [lightning.go:383] [“lightning load start”] [task=wps] [unit=lightning-load]
[2023/02/03 22:23:43.480 +08:00] [INFO] [util.go:200] [“put load worker in etcd”] [task=wps] [source=source01] [worker=worker01]
[2023/02/03 22:23:43.552 +08:00] [INFO] [checkpoint.go:575] [“initial checkpoint record”] [task=wps] [unit=lightning-load] [component=“lightning checkpoint database list”] [task=wps] [source=source01]
[2023/02/03 22:23:43.648 +08:00] [INFO] [checkpoint.go:595] [“update lightning loader status”] [task=wps] [unit=lightning-load] [component=“lightning checkpoint database list”] [task=wps] [source=source01] [status=running]
[2023/02/03 22:23:43.663 +08:00] [INFO] [info.go:49] [“Welcome to TiDB-Lightning”] [release-version=v6.0.0-master] [git-hash=None] [git-branch=None] [go-version=go1.19.3] [utc-build-time=None] [race-enabled=false]
[2023/02/03 22:23:43.663 +08:00] [INFO] [lightning.go:383] [cfg] [task=wps] [unit=lightning-load] [cfg=“{"id":1675434223663581706,"lightning":{"table-concurrency":64,"index-concurrency":64,"region-concurrency":64,"io-concurrency":5,"check-requirements":true,"meta-schema-name":"","max-error":{"type":0},"task-info-schema-name":"lightning_task_info"},"tidb":{"host":"10.0.5.6","port":31378,"user":"Ksc_TiDB","status-port":10080,"pd-addr":"","sql-mode":"NO_ENGINE_SUBSTITUTION,IGNORE_SPACE,NO_AUTO_VALUE_ON_ZERO,ALLOW_INVALID_DATES","tls":"false","security":{"ca-path":"","cert-path":"","key-path":"","redact-info-log":false},"max-allowed-packet":67108864,"distsql-scan-concurrency":15,"build-stats-concurrency":20,"index-serial-scan-concurrency":20,"checksum-table-concurrency":2,"vars":{"foreign_key_checks":"0","tidb_txn_mode":"optimistic","time_zone":"+08:00"}},"checkpoint":{"schema":"tidb_lightning_checkpoint","driver":"file","enable":true,"keep-after-success":"origin"},"mydumper":{"read-block-size":65536,"batch-size":0,"batch-import-ratio":0,"source-id":"source01","data-source-dir":"file:///data/wps/data.wps","character-set":"auto","csv":{"separator":",","delimiter":"\"","terminator":"","null":"\\N","header":true,"trim-last-separator":false,"not-null":false,"backslash-escape":true},"max-region-size":268435456,"filter":[".","!mysql.","!sys.","!INFORMATION_SCHEMA.","!PERFORMANCE_SCHEMA.","!METRICS_SCHEMA.","!INSPECTION_SCHEMA."],"files":null,"no-schema":false,"case-sensitive":false,"strict-format":false,"default-file-rules":true,"ignore-data-columns":null,"data-character-set":"binary","data-invalid-char-replace":"�"},"tikv-importer":{"addr":"","backend":"tidb","on-duplicate":"replace","max-kv-pairs":4096,"send-kv-pairs":32768,"region-split-size":0,"region-split-keys":0,"sorted-kv-dir":"","disk-quota":9223372036854775807,"range-concurrency":0,"duplicate-resolution":"none","incremental-import":false,"engine-mem-cache-size":536870912,"local-writer-mem-cache-size":134217728,"store-write-bwlimit":0},"post-restore":{"checksum":"off","analyze":"off","level-1-compact":false,"post-process-at-last":true,"compact":false},"cron":{"switch-mode":"5m0s","log-progress":"5m0s","check-disk-quota":"1m0s"},"routes":,"security":{"ca-path":"","cert-path":"","key-path":"","redact-info-log":false},"black-white-list":{"do-tables":null,"do-dbs":null,"ignore-tables":null,"ignore-dbs":null}}”]
[2023/02/03 22:23:44.124 +08:00] [INFO] [lightning.go:495] [“load data source start”] [task=wps] [unit=lightning-load]
[2023/02/03 22:23:47.002 +08:00] [INFO] [lightning.go:494] [“progress status of lightning”] [task=wps] [unit=lightning-load] [finished_bytes=0] [total_bytes=0] [progress=“0.00 %”] [“current speed (bytes / seconds)”=0]
[2023/02/03 22:23:47.202 +08:00] [INFO] [relay.go:639] [“ignore event by transformer”] [component=“relay log”] [header=“{"Timestamp":0,"EventType":27,"ServerID":259202690,"EventSize":36,"LogPos":368,"Flags":0}”] [reason=“heartbeat event”]
[2023/02/03 22:23:49.422 +08:00] [INFO] [loader.go:450] [“[loader] file is filtered by file router”] [task=wps] [unit=lightning-load] [path=metadata]
[2023/02/03 22:23:50.055 +08:00] [INFO] [lightning.go:498] [“load data source completed”] [task=wps] [unit=lightning-load] [takeTime=5.930994944s]
[2023/02/03 22:23:50.055 +08:00] [INFO] [checkpoints.go:1014] [“open checkpoint file failed, going to create a new one”] [task=wps] [unit=lightning-load] [path=/data/wps/data.wps/tidb_lightning_checkpoint.pb]
[2023/02/03 22:23:50.100 +08:00] [INFO] [restore.go:461] [“the whole procedure start”] [task=wps] [unit=lightning-load]
[2023/02/03 22:23:50.100 +08:00] [INFO] [restore.go:769] [“restore all schema start”] [task=wps] [unit=lightning-load]
[2023/02/03 22:24:17.000 +08:00] [INFO] [lightning.go:494] [“progress status of lightning”] [task=wps] [unit=lightning-load] [finished_bytes=0] [total_bytes=0] [progress=“0.00 %”] [“current speed (bytes / seconds)”=0]
[2023/02/03 22:24:17.202 +08:00] [INFO] [relay.go:639] [“ignore event by transformer”] [component=“relay log”] [header=“{"Timestamp":0,"EventType":27,"ServerID":259202690,"EventSize":36,"LogPos":368,"Flags":0}”] [reason=“heartbeat event”]
[2023/02/03 22:24:47.001 +08:00] [INFO] [lightning.go:494] [“progress status of lightning”] [task=wps] [unit=lightning-load] [finished_bytes=0] [total_bytes=0] [progress=“0.00 %”] [“current speed (bytes / seconds)”=0]
[2023/02/03 22:24:47.201 +08:00] [INFO] [relay.go:639] [“ignore event by transformer”] [component=“relay log”] [header=“{"Timestamp":0,"EventType":27,"ServerID":259202690,"EventSize":36,"LogPos":368,"Flags":0}”] [reason=“heartbeat event”]
[2023/02/03 22:25:17.000 +08:00] [INFO] [lightning.go:494] [“progress status of lightning”] [task=wps] [unit=lightning-load] [finished_bytes=0] [total_bytes=0] [progress=“0.00 %”] [“current speed (bytes / seconds)”=0]
[2023/02/03 22:25:17.205 +08:00] [INFO] [relay.go:639] [“ignore event by transformer”] [component=“relay log”] [header=“{"Timestamp":0,"EventType":27,"ServerID":259202690,"EventSize":36,"LogPos":368,"Flags":0}”] [reason=“heartbeat event”]
[2023/02/03 22:25:47.002 +08:00] [INFO] [lightning.go:494] [“progress status of lightning”] [task=wps] [unit=lightning-load] [finished_bytes=0] [total_bytes=0] [progress=“0.00 %”] [“current speed (bytes / seconds)”=0]
[2023/02/03 22:25:47.205 +08:00] [INFO] [relay.go:639] [“ignore event by transformer”] [component=“relay log”] [header=“{"Timestamp":0,"EventType":27,"ServerID":259202690,"EventSize":36,"LogPos":368,"Flags":0}”] [reason=“heartbeat event”]
[2023/02/03 22:26:17.006 +08:00] [INFO] [lightning.go:494] [“progress status of lightning”] [task=wps] [unit=lightning-load] [finished_bytes=0] [total_bytes=0] [progress=“0.00 %”] [“current speed (bytes / seconds)”=0]
[2023/02/03 22:26:17.204 +08:00] [INFO] [relay.go:639] [“ignore event by transformer”] [component=“relay log”] [header=“{"Timestamp":0,"EventType":27,"ServerID":259202690,"EventSize":36,"LogPos":368,"Flags":0}”] [reason=“heartbeat event”]
[2023/02/03 22:26:47.000 +08:00] [INFO] [lightning.go:494] [“progress status of lightning”] [task=wps] [unit=lightning-load] [finished_bytes=0] [total_bytes=0] [progress=“0.00 %”] [“current speed (bytes / seconds)”=0]
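
The last meaningful step in the log is "restore all schema start", so the load unit appears to be busy creating the roughly 282,000 schemas and tables on the downstream. One way to confirm whether TiDB is still working through that DDL (illustrative queries, not from the original thread):

```sql
-- Show the DDL jobs TiDB is currently running or has queued,
-- e.g. the CREATE DATABASE / CREATE TABLE statements issued by the load unit
ADMIN SHOW DDL JOBS;

-- Count how many of the migrated tables already exist on the downstream
SELECT COUNT(*) AS created_tables
FROM information_schema.tables
WHERE table_schema LIKE 'cwbaseshow%';
```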

| username: wisdom | Original post link

There’s nothing visible here. Please post the TiDB logs. There should be some errors at the TiDB layer.

| username: TiDBer_ShFvkFPA | Original post link

Currently there are no errors at the TiDB layer. The DM task has now made it past 140,000 tables, and the count is still increasing. Does DM itself impose any limit on synchronization tasks?

| username: wisdom | Original post link

What you said is too general. Currently, there are restrictions on upstream and downstream versions, syntax, and character sets.
Maximum number of synchronization tasks: 600
Number of tables processed by each Task: Unlimited
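
If the work is split across several DM tasks (as a later reply also suggests), each task could restrict its scope with its own block-allow-list pattern. A hypothetical sketch, mirroring the wildcard already used in the task file; check the exact matching syntax against the DM block-allow-list documentation:

```yaml
# task-1.yaml (hypothetical): migrate only one slice of the upstream schemas
block-allow-list:
  ba-rule1:
    do-dbs: ["cwbaseshow0*", "cwbaseshow1*"]

# task-2.yaml (hypothetical): a separate, non-overlapping slice
# block-allow-list:
#   ba-rule1:
#     do-dbs: ["cwbaseshow2*", "cwbaseshow3*"]
```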

| username: liuis | Original post link

In theory, there are no limitations during the TiDB synchronization process. Previously, we have synchronized tables with up to 700,000 rows by running multiple DM tasks. Could it be that insufficient DM resources are causing the issue?