[TiDB Usage Environment] Production Environment
[TiDB Version] v6.1.0
[Encountered Problem: Problem Phenomenon and Impact]
I want to upgrade TiDB to the latest version, but the pre-upgrade cluster check reports an error:
Checking region status of the cluster loantidb…
Regions are not fully healthy: 14 pending-peer
Please fix unhealthy regions before other operations.
I have attached the output of pd-ctl region check pending-peer, along with monitoring screenshots.
How can I eliminate the pending-peer status? Thank you, everyone.
Restarting TiKV did not help; the issue persists.
[Resource Configuration]
[Attachments: Screenshots/Logs/Monitoring]
Has Lightning ever been used on this cluster, or have any TiKV compaction-related configurations been modified? Also, is the cluster under heavy load? If everything is generally normal, pending peers should be rare. A peer that stays pending for a long time should eventually turn into a down peer.
Lightning was not used and no such modification was made. The only thing is that the developers frequently run batch deletes. That is probably the cause, but it is unclear how long it will take for the pending peers to disappear.
Moreover, when I run region check pending-peer at different times, the regions returned are always different, but the number of pending peers is always 14. I don’t know what is happening internally. The cluster only imports data at a scheduled time each day and is idle most of the time.
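To see whether the same regions stay pending or the set keeps rotating, two saved outputs of region check pending-peer can be compared by region ID. Below is a minimal sketch, assuming the output is JSON with a "regions" list whose entries carry an "id" field; the two snapshots and their IDs are made-up sample data, not output from this cluster.

```python
from __future__ import annotations

import json


def pending_region_ids(pd_ctl_output: str) -> set[int]:
    """Return the region IDs found in one `region check pending-peer` JSON dump."""
    data = json.loads(pd_ctl_output)
    # "regions" may be missing or null when nothing is pending.
    return {region["id"] for region in data.get("regions") or []}


def compare_snapshots(earlier: str, later: str) -> dict[str, set[int]]:
    """Split region IDs into those still pending, those resolved, and new ones."""
    before = pending_region_ids(earlier)
    after = pending_region_ids(later)
    return {
        "persisting": before & after,
        "resolved": before - after,
        "new": after - before,
    }


# Hypothetical snapshots taken a few minutes apart (IDs are invented).
snapshot_1 = '{"count": 2, "regions": [{"id": 101}, {"id": 102}]}'
snapshot_2 = '{"count": 2, "regions": [{"id": 102}, {"id": 203}]}'

result = compare_snapshots(snapshot_1, snapshot_2)
print(result)
```

If "persisting" is mostly empty while the count stays the same, the pending peers are probably being resolved and replaced by new ones rather than being stuck.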
You can check the monitoring over, say, the past 3 days and see whether there are any periods in which the down and pending peers disappear. If there are, schedule the upgrade for such a period.
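If the monitoring data can be exported, finding such periods can be automated. The following is only a sketch: it assumes the samples are already available as (timestamp, pending count, down count) tuples sorted by time, and the function name, the sample values, and the one-hour minimum are all hypothetical.

```python
from datetime import datetime, timedelta


def healthy_windows(samples, min_length):
    """Return (start, end) pairs where pending and down counts are both zero.

    samples: list of (datetime, pending_count, down_count), sorted by time.
    min_length: timedelta; shorter healthy stretches are ignored.
    """
    windows = []
    start = last = None
    for ts, pending, down in samples:
        if pending == 0 and down == 0:
            if start is None:
                start = ts  # a healthy stretch begins here
            last = ts
        else:
            if start is not None and last - start >= min_length:
                windows.append((start, last))
            start = last = None
    # Close a healthy stretch that runs to the end of the data.
    if start is not None and last - start >= min_length:
        windows.append((start, last))
    return windows


# Invented samples at 30-minute intervals; not taken from this cluster.
base = datetime(2023, 1, 1, 0, 0)
pending_counts = [14, 14, 0, 0, 0, 0, 14, 0, 14]
samples = [
    (base + timedelta(minutes=30 * i), count, 0)
    for i, count in enumerate(pending_counts)
]

windows = healthy_windows(samples, min_length=timedelta(hours=1))
for start, end in windows:
    print(f"healthy from {start} to {end}")
```

A single healthy sample between two unhealthy ones is dropped on purpose, since an upgrade needs a window long enough to finish in.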
Is there any special configuration for the current TiKV? You can also send it over for us to take a look.
Or manually run a TiKV compaction on the cluster; see the TiKV Control User Guide in the PingCAP Docs.
You can see if this solves the issue, but it needs to be done during non-business hours.
The data import process involves a Java project developed to fetch data files from other sources. The data is then cleaned within the project and a CSV file is generated. This process is executed daily, producing a CSV file of approximately 300MB each time. The Java project reads the CSV file, generates SQL statements, and then inserts the data into TiDB.
I took a look and saw that many TiKV configurations were adjusted. Generally, it is not recommended to adjust raft and rocksdb related configurations by yourself. Sigh.
Compaction cannot be turned off. Compaction reorganizes and compresses the underlying data files. If you turn it off, even garbage collection (GC) cannot be performed.
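For anyone auditing a customized TiKV configuration for this kind of setting, here is a rough sketch that walks an already-parsed config (a nested dict) and reports every place where disable-auto-compactions is set to true. The helper name and the sample dict are hypothetical; the sample only illustrates the shape of the keys discussed in this thread.

```python
def find_disabled_compactions(config, prefix=""):
    """Return dotted paths of every `disable-auto-compactions: true` entry."""
    hits = []
    for key, value in config.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # Descend into nested sections such as rocksdb -> defaultcf.
            hits.extend(find_disabled_compactions(value, path))
        elif key == "disable-auto-compactions" and value is True:
            hits.append(path)
    return hits


# Illustrative config only; not the actual configuration of this cluster.
sample_config = {
    "raftstore": {"split-region-check-tick-interval": "30s"},
    "rocksdb": {"defaultcf": {"disable-auto-compactions": True}},
}

flagged = find_disabled_compactions(sample_config)
print(flagged)  # ['rocksdb.defaultcf.disable-auto-compactions']
```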
I changed the two parameters you mentioned:
raftstore.split-region-check-tick-interval: 30s
rocksdb.defaultcf.disable-auto-compactions: false
After restarting TiDB, the monitoring interface still shows everything else, but pending-peer and the related metrics no longer appear:
Starting component ctl: /root/.tiup/components/ctl/v6.1.0/ctl pd -i -u http://127.0.0.1:2379
» region check pending-peer
{
  "count": 0,
  "regions": []
}
Finally, I checked the cluster:
Checking region status of the cluster loantidb…
All regions are healthy.
Thank you very much.