TiDB Lightning Import Error: Scatter Region Failed

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: tidb lightning导入报错scatter region failed

| username: TiDBer_Q6zIfbhF

[TiDB Usage Environment] Production Environment / Test / Poc
[TiDB Version] Version v6.5.2
[Reproduction Path] After exporting data from a TiDB cluster with Dumpling and importing it into another TiDB cluster with TiDB Lightning, the log reports the following errors:
[2023/07/27 16:36:01.874 +08:00] [WARN] [localhelper.go:448] [“scatter region failed”] [regionCount=13] [failedCount=2] [error=“region 83640 not found”] [errorVerbose=“region 83640 not found\ngithub.com/pingcap/tidb/br/pkg/lightning/backend/local.(*Backend).BatchSplitRegions.func2\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/lightning/backend/local/localhelper.go:428\ngithub.com/pingcap/tidb/br/pkg/utils.WithRetry\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/utils/retry.go:56\ngithub.com/pingcap/tidb/br/pkg/lightning/backend/local.(*Backend).BatchSplitRegions\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/lightning/backend/local/localhelper.go:420\ngithub.com/pingcap/tidb/br/pkg/lightning/backend/local.(*Backend).SplitAndScatterRegionByRanges.func3\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/lightning/backend/local/localhelper.go:293\ngolang.org/x/sync/errgroup.(*Group).Go.func1\n\t/go/pkg/mod/golang.org/x/sync@v0.2.0/errgroup/errgroup.go:75\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1598”]
[2023/07/27 16:36:01.878 +08:00] [INFO] [localhelper.go:317] [“batch split region”] [region_id=82615] [keys=1] [firstKey=“dIAAAAAAAAC2X2mAAAAAAAAAAgFoAAAAAAAAAPgBbTE1ODU4Mzn/MzQ2NgAAAAD7ATE1ODU4Mzkz/zQ2NgAAAAAA+g==”] [end=“dIAAAAAAAAC2X2mAAAAAAAAAAgFoAAAAAAAAAPgBbTE1ODU4Mzn/MzQ2NgAAAAD7ATE1ODU4Mzkz/zQ2NgAAAAAA+g==”]
[2023/07/27 16:36:01.879 +08:00] [INFO] [localhelper.go:317] [“batch split region”] [region_id=82623] [keys=1] [firstKey=“dIAAAAAAAAC2X2mAAAAAAAAAAgFqAAAAAAAAAPgBbTE1ODUxNzn/MjcxNQAAAAD7ATE1ODUxNzky/zcxNQAAAAAA+g==”] [end=“dIAAAAAAAAC2X2mAAAAAAAAAAgFqAAAAAAAAAPgBbTE1ODUxNzn/MjcxNQAAAAD7ATE1ODUxNzky/zcxNQAAAAAA+g==”]
[2023/07/27 16:36:01.879 +08:00] [INFO] [localhelper.go:317] [“batch split region”] [region_id=82635] [keys=1] [firstKey=“dIAAAAAAAAC2X2mAAAAAAAAAAgFtAAAAAAAAAPgBbTE1NjM0MDH/MjQzNwAAAAD7ATE1NjM0MDEy/zQzNwAAAAAA+g==”] [end=“dIAAAAAAAAC2X2mAAAAAAAAAAgFtAAAAAAAAAPgBbTE1NjM0MDH/MjQzNwAAAAD7ATE1NjM0MDEy/zQzNwAAAAAA+g==”]
[2023/07/27 16:36:01.879 +08:00] [INFO] [localhelper.go:317] [“batch split region”] [region_id=82631] [keys=1] [firstKey=“dIAAAAAAAAC2X2mAAAAAAAAAAgFsAAAAAAAAAPgBbTE1NzExMzn/Mzg2NQAAAAD7ATE1NzExMzkz/zg2NQAAAAAA+g==”] [end=“dIAAAAAAAAC2X2mAAAAAAAAAAgFsAAAAAAAAAPgBbTE1NzExMzn/Mzg2NQAAAAD7ATE1NzExMzkz/zg2NQAAAAAA+g==”]
[2023/07/27 16:36:01.879 +08:00] [INFO] [localhelper.go:317] [“batch split region”] [region_id=82643] [keys=2] [firstKey=“dIAAAAAAAAC2X2mAAAAAAAAAAgFvAAAAAAAAAPgBbTE1NTE2MDn/NTk2NgAAAAD7ATE1NTE2MDk1/zk2NgAAAAAA+g==”] [end=“dIAAAAAAAAC2X2mAAAAAAAAAAgFvAAAAAAAAAPgBenp6enp6enr/enp6enpkdAD+ATE1NjA0NjI3/zY4NQAAAAAA+gA=”]
[2023/07/27 16:36:01.879 +08:00] [INFO] [localhelper.go:317] [“batch split region”] [region_id=82599] [keys=1] [firstKey=“dIAAAAAAAAC2X2mAAAAAAAAAAgFkAAAAAAAAAPgBbTE3MDc2ODX/MzA5NQAAAAD7ATE3MDc2ODUz/zA5NQAAAAAA+g==”] [end=“dIAAAAAAAAC2X2mAAAAAAAAAAgFkAAAAAAAAAPgBbTE3MDc2ODX/MzA5NQAAAAD7ATE3MDc2ODUz/zA5NQAAAAAA+g==”]
[2023/07/27 16:36:01.879 +08:00] [INFO] [localhelper.go:317] [“batch split region”] [region_id=82611] [keys=1] [firstKey=“dIAAAAAAAAC2X2mAAAAAAAAAAgFnAAAAAAAAAPgBbTE1OTEwMjn/NjY1NgAAAAD7ATE1OTEwMjk2/zY1NgAAAAAA+g==”] [end=“dIAAAAAAAAC2X2mAAAAAAAAAAgFnAAAAAAAAAPgBbTE1OTEwMjn/NjY1NgAAAAD7ATE1OTEwMjk2/zY1NgAAAAAA+g==”]
[2023/07/27 16:36:01.879 +08:00] [WARN] [localhelper.go:448] [“scatter region failed”] [regionCount=1] [failedCount=1] [error=“rpc error: code = Unknown desc = region 83680 is not fully replicated”]
[2023/07/27 16:36:01.879 +08:00] [INFO] [localhelper.go:317] [“batch split region”] [region_id=82627] [keys=1] [firstKey=“dIAAAAAAAAC2X2mAAAAAAAAAAgFrAAAAAAAAAPgBbTE1NzIwMzj/NDkxOAAAAAD7ATE1NzIwMzg0/zkxOAAAAAAA+g==”] [end=“dIAAAAAAAAC2X2mAAAAAAAAAAgFrAAAAAAAAAPgBbTE1NzIwMzj/NDkxOAAAAAD7ATE1NzIwMzg0/zkxOAAAAAAA+g==”]
[2023/07/27 16:36:01.879 +08:00] [INFO] [localhelper.go:317] [“batch split region”] [region_id=82603] [keys=1] [firstKey=“dIAAAAAAAAC2X2mAAAAAAAAAAgFlAAAAAAAAAPgBbTE1OTk4ODj/NTI5OQAAAAD7ATE1OTk4ODg1/zI5OQAAAAAA+g==”] [end=“dIAAAAAAAAC2X2mAAAAAAAAAAgFlAAAAAAAAAPgBbTE1OTk4ODj/NTI5OQAAAAD7ATE1OTk4ODg1/zI5OQAAAAAA+g==”]
[2023/07/27 16:36:01.879 +08:00] [INFO] [localhelper.go:317] [“batch split region”] [region_id=82639] [keys=1] [firstKey=“dIAAAAAAAAC2X2mAAAAAAAAAAgFuAAAAAAAAAPgBbTE1NTU2NjD/MDUyMQAAAAD7ATE1NTU2NjAw/zUyMQAAAAAA+g==”] [end=“dIAAAAAAAAC2X2mAAAAAAAAAAgFuAAAAAAAAAPgBbTE1NTU2NjD/MDUyMQAAAAD7ATE1NTU2NjAw/zUyMQAAAAAA+g==”]
[2023/07/27 16:36:01.879 +08:00] [INFO] [localhelper.go:317] [“batch split region”] [region_id=82607] [keys=1] [firstKey=“dIAAAAAAAAC2X2mAAAAAAAAAAgFmAAAAAAAAAPgBbTE1OTMxNjP/NTY5MgAAAAD7ATE1OTMxNjM1/zY5MgAAAAAA+g==”] [end=“dIAAAAAAAAC2X2mAAAAAAAAAAgFmAAAAAAAAAPgBbTE1OTMxNjP/NTY5MgAAAAD7ATE1OTMxNjM1/zY5MgAAAAAA+g==”]
[2023/07/27 16:36:01.879 +08:00] [INFO] [localhelper.go:317] [“batch split region”] [region_id=82595] [keys=1] [firstKey=“dIAAAAAAAAC2X2mAAAAAAAAAAgFjAAAAAAAAAPgBbTE3NjAwMzH/NDk1MQAAAAD7ATE3NjAwMzE0/zk1MQAAAAAA+g==”] [end=“dIAAAAAAAAC2X2mAAAAAAAAAAgFjAAAAAAAAAPgBbTE3NjAwMzH/NDk1MQAAAAD7ATE3NjAwMzE0/zk1MQAAAAAA+g==”]
[2023/07/27 16:36:01.886 +08:00] [INFO] [localhelper.go:317] [“batch split region”] [region_id=82587] [keys=13] [firstKey=dIAAAAAAAAC2X2mAAAAAAAAAAQOAAAAAAAAAAQExNTAyOTIxNv85MTcAAAAAAPoBaQAAAAAAAAD4] [end=“dIAAAAAAAAC2X2mAAAAAAAAAAgFiAAAAAAAAAPgBbTE3ODU4OTD/MDkxOQAAAAD7ATE3ODU4OTAw/zkxOQAAAAAA+g==”]
[2023/07/27 16:36:01.900 +08:00] [WARN] [localhelper.go:448] [“scatter region failed”] [regionCount=1] [failedCount=1] [error=“rpc error: code = Unknown desc = region 83680 is not fully replicated”]
[2023/07/27 16:36:01.941 +08:00] [WARN] [localhelper.go:448] [“scatter region failed”] [regionCount=1] [failedCount=1] [error=“rpc error: code = Unknown desc = region 83680 is not fully replicated”]
[2023/07/27 16:36:02.022 +08:00] [WARN] [localhelper.go:448] [“scatter region failed”] [regionCount=1] [failedCount=1] [error=“rpc error: code = Unknown desc = region 83680 is not fully replicated”]
[2023/07/27 16:36:02.183 +08:00] [WARN] [localhelper.go:448] [“scatter region failed”] [regionCount=1] [failedCount=1] [error=“rpc error: code = Unknown desc = region 83680 is not fully replicated”]
[2023/07/27 16:36:02.505 +08:00] [WARN] [localhelper.go:448] [“scatter region failed”] [regionCount=1] [failedCount=1] [error=“rpc error: code = Unknown desc = region 83680 is not fully replicated”]

My tidb-lightning configuration is as follows:
[lightning]
status-addr = ':8289'
level = "info"
file = "/home/tidb/tidb-lightning/tidb-lightning.log"
check-requirements = true
region-concurrency = 32

[checkpoint]
enable = true
schema = "tidb_lightning_checkpoint"
driver = "file"
dsn = "/data1/tidb-lightning/tidb_lightning_checkpoint.pb"

[tikv-importer]
disk-quota = "10GB"
backend = "local"
on-duplicate = "error"
sorted-kv-dir = "/data1/tidb-lightning/some-dir"
duplicate-resolution = 'remove'

[mydumper]
data-source-dir = "/home/tidb/tmp/onlinedata"

filter = ['*.*', '!mysql.*', '!sys.*', '!INFORMATION_SCHEMA.*', '!PERFORMANCE_SCHEMA.*', '!METRICS_SCHEMA.*', '!INSPECTION_SCHEMA.*']
[tidb]
host = "192.168.1.1"
port = 4000
user = "root"
password = "rootroot"
status-port = 10080

pd-addr = "192.168.1.1:2379"
log-level = "error"
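
For reference, with this configuration the import is typically started roughly like the sketch below (the binary location, config file name, and use of nohup are assumptions; adjust them to your deployment):

# Hypothetical invocation; adjust paths to where tidb-lightning and the config file actually live.
nohup ./tidb-lightning -config tidb-lightning.toml > nohup.out 2>&1 &

# Follow the log file configured in [lightning].file to watch progress and warnings.
tail -f /home/tidb/tidb-lightning/tidb-lightning.log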

The data source is about 240 GB. Now this log line keeps appearing:
[2023/07/27 16:48:08.951 +08:00] [WARN] [localhelper.go:448] ["scatter region failed"] [regionCount=1] [failedCount=1] [error="rpc error: code = Unknown desc = region 83680 is not fully replicated"]
Will it have any impact, such as data loss?

| username: TiDBer_Q6zIfbhF | Original post link

I checked this with pd-ctl and don't understand what it means. The documentation says extra-peer means a Region with extra replicas. But the cluster was just initialized and nothing was configured, so everything should be 3 replicas. Where does this extra replica come from?

» region check extra-peer
{
  "count": 1,
  "regions": [
    {
      "id": 83680,
      "start_key": "7480000000000000FFB65F698000000000FF0000020169000000FF00000000F8016D31FF333831303035FF33FF33373600000000FBFF0131333831303035FF33FF333736000000FF0000FA0000000000FA",
      "end_key": "7480000000000000FFB65F698000000000FF0000020169000000FF00000000F8016D31FF353834373538FF35FF33383100000000FBFF0131353834373538FF35FF333831000000FF0000FA0000000000FA",
      "epoch": {
        "conf_ver": 42,
        "version": 116
      },
      "peers": [
        {
          "id": 83681,
          "store_id": 5,
          "role_name": "Voter"
        },
        {
          "id": 83682,
          "store_id": 6,
          "role_name": "Voter"
        },
        {
          "id": 83683,
          "store_id": 2,
          "role_name": "Voter"
        },
        {
          "id": 83684,
          "store_id": 7,
          "role": 1,
          "role_name": "Learner",
          "is_learner": true
        }
      ],
      "leader": {
        "id": 83681,
        "store_id": 5,
        "role_name": "Voter"
      },
      "cpu_usage": 0,
      "written_bytes": 1921387,
      "read_bytes": 0,
      "written_keys": 2365,
      "read_keys": 0,
      "approximate_size": 549,
      "approximate_keys": 3655815
    }
  ]
}

| username: tidb菜鸟一只 | Original post link

Normally, if the cluster is configured with 3 replicas and a Region has 4 or more replicas, that Region is reported as abnormal with extra replicas (extra-peer).
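
If you want to confirm the replica count the cluster is actually configured with, you can check PD's replication settings; a quick way (assuming pd-ctl is reached through tiup, with the PD address taken from the config above) is:

# Show the replication configuration; max-replicas should be 3 here.
tiup ctl:v6.5.2 pd -u http://192.168.1.1:2379 config show replication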

| username: TiDBer_Q6zIfbhF | Original post link

How should this be resolved? The cluster has 5 TiKV nodes and has just been initialized.

| username: 有猫万事足 | Original post link

WARN logs don't matter; they won't cause data loss.
During an import there are a lot of Region changes, so errors like region miss or leader not found are very common.
If it were an ERROR-level problem, Lightning would stop on its own.

Since your cluster is newly initialized, you can import boldly without worry.
If there are any issues, rebuilding is not troublesome. This is the time when the cost of issues is the lowest.

| username: tidb菜鸟一只 | Original post link

Having extra replicas is not the same as having too few replicas; it does not affect the data import. Once the import is complete, the extra replica should be cleaned up automatically after a while. If you want to clean it up manually, you can run the following command in pd-ctl to remove the Learner peer (the syntax is operator add remove-peer <region_id> <store_id>):
operator add remove-peer 83680 7
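
For example, assuming pd-ctl is reached through tiup (any other pd-ctl invocation works the same way), the removal and a follow-up check would look like this:

# Schedule removal of the extra (Learner) peer of Region 83680 that lives on store 7.
tiup ctl:v6.5.2 pd -u http://192.168.1.1:2379 operator add remove-peer 83680 7

# Afterwards, check the Region again; the peer on store 7 should be gone.
tiup ctl:v6.5.2 pd -u http://192.168.1.1:2379 region 83680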

| username: caiyfc | Original post link

As long as Lightning hasn't exited on its own, the problem shouldn't be significant. After the whole process finishes, check the Lightning log; it prints a summary of the import at the end. Generally, if the log ends with "the whole procedure completed", the import succeeded. If you see errors at the end of the log, there may indeed have been a problem with the import.
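
For example, once Lightning has exited, you could check the log (the path comes from the [lightning] section of the config above) roughly like this:

# Look for the success marker printed at the end of a completed run.
grep "the whole procedure completed" /home/tidb/tidb-lightning/tidb-lightning.log

# And make sure no ERROR-level entries were logged near the end.
grep "\[ERROR\]" /home/tidb/tidb-lightning/tidb-lightning.log | tail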

| username: h5n1 | Original post link

Try using pd-ctl operator add remove-peer 83680 7 to see if you can remove the peer with the Learner role.