Does the number of regions keep dropping as tidb-lightning imports more data?

This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: tidb-lightning导数据越导region越少?

| username: starCrush

[TiDB Version] TiDB 5.4.3, tidb-lightning 6.5.1
I am using tidb-lightning in local mode to import approximately 10 TB of data from MySQL into TiDB. The import has been running for about 10 days. According to the Lightning logs, many checksums are failing. The PD leader logs continuously show "change leader" and "drop leader", which is causing the number of regions in the entire TiDB cluster to decrease.

lightning logs:

pd logs:

TiDB logs:

Since the 23rd, the import has been stuck at this progress, with the checksum failures shown above:

The cluster overview shows that no TiKV nodes are down, and everything appears normal.

| username: tidb菜鸟一只

For 10 TB of data, you might want to try DM. Logical replication is not recommended for more than 1 TB.

| username: xingzhenxiang

This is normal. At the start, many tables are imported in parallel, so the import appears fast. Towards the end, only one or two large tables remain, so the import appears to slow down. Check whether the number of remaining tables is lower than the current import concurrency setting.

| username: starCrush

Isn’t DM just a SQL-based logical import method? For large data volumes, isn’t Lightning the recommended tool? About 60 TB of databases are still waiting to be migrated, and the current 10 TB import hasn’t finished yet.

| username: huhaifeng

Regarding the drop in the number of regions, a possible cause is this: at the start, Lightning pre-creates many empty regions and marks them as not mergeable for one hour. Because the import is taking so long, some of those empty regions get merged. Check the PD monitoring; the merge operations should be visible there.

What do you mean by "since the 23rd, the progress has been stuck with checksum failures"? The logs show no problems with the checksum. Also, can you confirm that the relevant schemas and tables in the TiDB cluster were empty before the import?

Lastly, it is recommended to import in several small batches: for example, 1 TB at a time, starting the next batch only after the previous one has finished. Multiple Lightning instances can also run simultaneously.
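
As a rough illustration of the batching idea, here is a minimal Python sketch that groups tables into batches of at most about 1 TB each. The table names and sizes are made up, and `plan_batches` is a hypothetical helper, not part of Lightning. Each resulting batch could then be imported in its own Lightning run, for instance by restricting that run to the batch's tables with Lightning's table filter.

```python
from typing import Dict, List

GB = 1024 ** 3
TB = 1024 * GB


def plan_batches(table_sizes: Dict[str, int], batch_limit: int = TB) -> List[List[str]]:
    """Greedily group tables into batches whose total size stays within batch_limit.

    Tables are taken from largest to smallest. A table that is larger than
    the limit on its own still gets a batch to itself.
    """
    batches: List[List[str]] = []
    current: List[str] = []
    current_size = 0
    ordered = sorted(table_sizes.items(), key=lambda kv: kv[1], reverse=True)
    for name, size in ordered:
        # Close the current batch if adding this table would exceed the limit.
        if current and current_size + size > batch_limit:
            batches.append(current)
            current, current_size = [], 0
        current.append(name)
        current_size += size
    if current:
        batches.append(current)
    return batches


if __name__ == "__main__":
    # Hypothetical table sizes in bytes (assumed values, for illustration only).
    sizes = {
        "db.a": 600 * GB,
        "db.b": 500 * GB,
        "db.c": 400 * GB,
        "db.d": 1536 * GB,
    }
    for i, batch in enumerate(plan_batches(sizes), start=1):
        print(f"batch {i}: {batch}")
```

The greedy grouping is deliberately simple and does not try to minimize the number of batches. In practice the real table sizes would have to come from the source MySQL instance or from the exported data files.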
