Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.
Original topic: 有大量Empty region,且kv数据分布不均匀 (There are a large number of empty regions, and the KV data distribution is uneven)
【TiDB Usage Environment】
Production environment
【TiDB Version】
v4.0.12
【Encountered Problem】
During routine maintenance, we found that the disk usage of some TiKV nodes was extremely high and that a large number of empty regions appeared in the monitoring. Following the documentation and earlier posts, we adjusted store limit, enable-cross-table-merge, max-merge-region-keys, max-merge-region-size, and region-schedule-limit, but to no avail.
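For reference, these are the kinds of adjustments involved; the following is only a sketch, with illustrative values and the PD address reused from later in this thread:
tiup ctl:v4.0.12 pd --pd 10.1.48.44:2379 config set max-merge-region-size 20
tiup ctl:v4.0.12 pd --pd 10.1.48.44:2379 config set max-merge-region-keys 200000
tiup ctl:v4.0.12 pd --pd 10.1.48.44:2379 config set region-schedule-limit 2048
tiup ctl:v4.0.12 pd --pd 10.1.48.44:2379 config set enable-cross-table-merge true
tiup ctl:v4.0.12 pd --pd 10.1.48.44:2379 store limit all 15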
【Reproduction Path】
【Problem Phenomenon and Impact】
【Attachments】
Please provide the version information of each component, such as cdc/tikv, which can be obtained by executing cdc version/tikv-server --version.
hengpu-cluster-PD-1658385668172.json (217.4 KB) hengpu-cluster-PD_2022-07-21T06_58_41.869Z.json (16.0 MB)
Has cross-table region merging been enabled?
enable-cross-table-merge
- Sets whether to enable cross-table merge.
- Default value: true
It is enabled, as shown in the screenshot above.
There is also this parameter that needs to be adjusted; I ran into it at the time as well.
enable-cross-table-merge
- Default value: false
- If set to true, two regions from different tables can be merged. This option only takes effect when the key type is “table”.
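A sketch of how the merge setting and the key type can be checked and changed with pd-ctl (the PD address is the one used elsewhere in this thread; field names may vary slightly between versions):
# Check the current values (enable-cross-table-merge sits in the schedule config):
tiup ctl:v4.0.12 pd --pd 10.1.48.44:2379 config show all | grep -iE 'cross-table|key-type'
# Turn cross-table merge on if it is off:
tiup ctl:v4.0.12 pd --pd 10.1.48.44:2379 config set enable-cross-table-merge true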
Try reducing the patrol-region-interval as well.
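For reference, that interval can be lowered online with pd-ctl; a sketch with an illustrative value:
tiup ctl:v4.0.12 pd --pd 10.1.48.44:2379 config set patrol-region-interval 10ms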
The key-type in our cluster should be the default, table; it has never been changed.
patrol-region-interval has already been changed from 100ms to 10ms, but it still has no effect.
Check whether the TiKV parameter split-region-on-table is set to false, then restart TiKV.
If conditions permit, you can evict the leaders on a TiKV store down to 0 before restarting it.
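For reference, the parameter can be checked from a SQL client with show config (as the next reply does), and leader eviction is usually done through a pd-ctl scheduler; a sketch, with the store ID as a placeholder:
# Evict all leaders from the TiKV store that is about to be restarted:
tiup ctl:v4.0.12 pd --pd 10.1.48.44:2379 scheduler add evict-leader-scheduler <store-id>
# Remove the scheduler again once the restart is complete:
tiup ctl:v4.0.12 pd --pd 10.1.48.44:2379 scheduler remove evict-leader-scheduler-<store-id>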
Querying online with show config shows that coprocessor.split-region-on-table is already false.
I also tried the other parameters in the FAQ I posted. At that time, I also cleared the empty regions following the steps at the end of that FAQ.
Has the high disk usage on some nodes reached the thresholds of these two parameters?
If so, adjust these two parameters.
Judging from your settings, problems start once disk usage reaches 60%.
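If the two thresholds being referred to are PD's space ratios (an assumption; the original post points at a screenshot that is not included here), they can be viewed and adjusted with pd-ctl, for example:
# Assumed parameters; only relevant if these are indeed the thresholds in the screenshot:
tiup ctl:v4.0.12 pd --pd 10.1.48.44:2379 config show | grep -i space-ratio
tiup ctl:v4.0.12 pd --pd 10.1.48.44:2379 config set high-space-ratio 0.7
tiup ctl:v4.0.12 pd --pd 10.1.48.44:2379 config set low-space-ratio 0.8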
Disk usage is high on some nodes but not on others, which is why empty regions are currently suspected as the cause. If modifying this parameter affects the store score calculation, could that potentially trigger an avalanche?
A temporary solution is to manually create some operators to merge regions.
tiup ctl:v4.0.12 pd region check empty-region --pd 10.1.48.44:2379 | jq '.regions[].id' | tail -1000 | xargs -n2 echo 'tiup ctl:v4.0.12 pd --pd 10.1.48.44:2379 operator add merge-region'
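This prints one merge command per pair of empty regions so they can be reviewed before being run. Whether the resulting operators are actually created and executed can then be checked with pd-ctl, e.g.:
tiup ctl:v4.0.12 pd --pd 10.1.48.44:2379 operator show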
This parameter is false by default after version 4.0. It is not set in the configuration file, and querying shows it is already false. It still doesn’t work.
You can add new nodes, adjust the parameters, and wait for the cluster to rebalance automatically. That’s what I did when I ran into this before. An avalanche shouldn’t happen; just raise the threshold a bit.
It is still recommended to try reloading as mentioned in the FAQ.
I see you’ve tried all the operations mentioned above, but they didn’t work, right? Check the PD monitoring in Grafana to see whether any region operators are being generated. I’ve run into a similar issue before; adjusting the store limit and shortening the merge time didn’t help. In the end, I switched the PD leader and the merge started working smoothly. It might sound like an odd trick, but since you’ve tried everything else, it’s worth a shot.
Here are the commands to switch the leader:
pd-ctl -i
# Show the current leader:
>> member leader show
# Resign the leader from the current member:
>> member leader resign
# Transfer the leader to a specified member:
>> member leader transfer pd3
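In a tiup-managed cluster, the same interactive session is typically opened through tiup ctl; for example, reusing the PD address from earlier in this thread:
tiup ctl:v4.0.12 pd -i --pd 10.1.48.44:2379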
Empty regions are generated after GC cleans up dropped/truncated tables. The number of empty regions is trending down, but region merging is relatively slow. You can try increasing the parameters: region-schedule-limit = 2048 and merge-schedule-limit = 64.
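A sketch of applying those two changes online with pd-ctl:
tiup ctl:v4.0.12 pd --pd 10.1.48.44:2379 config set region-schedule-limit 2048
tiup ctl:v4.0.12 pd --pd 10.1.48.44:2379 config set merge-schedule-limit 64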
Switching the leader didn’t work either…