Lightning Import is Very Slow

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: lightning导入很慢

| username: zhanggame1

[TiDB Usage Environment] Testing
[TiDB Version] 7.1
[Encountered Problem: Problem Phenomenon and Impact]
During testing today, the test cluster was able to saturate its gigabit network, and Dumpling export speed was decent. Backing up 100 GB of data was also acceptably fast. However, the TiDB Lightning import is very slow. The Dumpling export being tested is only 100 MB, yet the import ran for a long time and still did not complete. The CPU load on the import machine is very low. The configuration file is as follows:

[lightning]
# Log
level = "info"
file = "tidb-lightning.log"

[tikv-importer]
# Choose the import mode
backend = "local"
# Temporary storage location for sorted key-value pairs. The target path must be an empty directory.
sorted-kv-dir = "/tmp/sorted-kv-dir"

[mydumper]
# Source data directory.
data-source-dir = "/tmp/dumpling"

# Table filter (wildcard) rules. The default rule filters out all tables in the system databases mysql, sys, INFORMATION_SCHEMA, PERFORMANCE_SCHEMA, METRICS_SCHEMA, INSPECTION_SCHEMA.
# If this item is not configured, an "unable to find schema" exception occurs when importing system tables.
filter = ['*.*', '!mysql.*', '!sys.*', '!INFORMATION_SCHEMA.*', '!PERFORMANCE_SCHEMA.*', '!METRICS_SCHEMA.*', '!INSPECTION_SCHEMA.*']

[tidb]
# Information of the target cluster
host = "10.10.10.1"
port = 4000
user = "root"
password = "3232323"
# Table schema information is obtained from the "status port" of TiDB.
status-port = 10080
# Address of the cluster PD
pd-addr = "10.10.10.1:2379"

[Attachments: Screenshots/Logs/Monitoring]


| username: caiyfc

As far as I remember, if no concurrency limits are set, Lightning uses all of the machine's resources by default. Check whether disk I/O is saturated. If there is only one data file, Lightning might be processing it in a single thread.
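
For reference, the concurrency limits that Lightning falls back to when nothing is configured live in the [lightning] section of the same configuration file. The fragment below is a minimal sketch: the numbers are illustrative assumptions, not values from this thread and not tuning recommendations, and the right settings depend on the import machine's CPU and disk.

```toml
[lightning]
# Maximum number of concurrent data-conversion tasks.
# When omitted, Lightning uses the number of logical CPU cores.
# The value 8 here is an assumed example.
region-concurrency = 8
# Maximum number of concurrent I/O operations (5 is the documented default).
# Raising it too far can increase I/O latency on slow disks.
io-concurrency = 5
```

With a single small data file, raising these limits may not help, because there is little work to spread across threads.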

| username: 有猫万事足

It seems to be stuck while analyzing the test_vegas2.ticket table.
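
If the post-import analyze step really is where the time goes, one way to confirm it is to turn that step off for a test run. This is a hedged sketch rather than something proposed in the thread: it assumes the `analyze` option in the [post-restore] section accepts "required", "optional", or "off", as it does in recent Lightning versions, and statistics would then need to be collected manually afterwards with `ANALYZE TABLE`.

```toml
[post-restore]
# Skip the automatic ANALYZE TABLE that runs after the import.
# Assumption: testing only. Run ANALYZE TABLE manually once the import finishes.
analyze = "off"
```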

| username: tidb菜鸟一只

That message is only a hint: the cluster has quite a few empty Regions. It is recommended to merge the empty Regions first and then try the import again.

| username: zhanggame1

Is there any way to speed up the merge?