Issue of Inconsistent Region Count After Expanding One Node in a 3-Replica 3-TiKV Setup

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: 3副本3个tikv扩容一个节点后region_count数不一致问题

| username: TiDBer_Y2d2kiJh

[TiDB Usage Environment] Production Environment / Testing / PoC
[TiDB Version] v5.4.0; topology: 2 TiDB, 3 PD, 3 TiKV, 2 HA
[Reproduction Path] Due to an IO failure, one TiKV node was scaled out on August 16 and another was scaled in afterwards; both the leaders and the region_count ended up balanced. After scaling out another TiKV node on September 8, the leaders balanced again, but the region_count only balanced in pairs: the two old TiKV nodes match each other, and the newly added nodes match each other, as shown in the figure:
[Encountered Problem: Problem Phenomenon and Impact]
[Resource Configuration] Enter TiDB Dashboard - Cluster Info - Hosts and take a screenshot of this page
[Attachments: Screenshots/Logs/Monitoring]


| username: zhanggame1 | Original post link

Are all hard drive capacities the same?

| username: TiDBer_Y2d2kiJh | Original post link

The newly added storage is 1.5T, and the original storage space of the 3 TiKV nodes is around 1000G.

| username: zhanggame1 | Original post link

Theoretically, the number of regions on TiKV is related to the disk capacity. The larger the disk, the more regions there will be.
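
As a rough back-of-the-envelope check, if regions were spread strictly in proportion to disk capacity (an assumption for illustration only; PD actually balances a per-store region score that also factors in space usage), the capacities mentioned in this thread would imply shares like this (store names are placeholders):

```python
# Hypothetical capacity-proportional split, for illustration only.
# PD's balance-region scheduler balances a region score, not this ratio.
capacities_gb = {
    "tikv-1": 1000,  # original nodes, ~1000 GB each (per the thread)
    "tikv-2": 1000,
    "tikv-3": 1000,
    "tikv-4": 1500,  # node added on September 8, 1.5 TB
}

total = sum(capacities_gb.values())
for store, cap in capacities_gb.items():
    print(f"{store}: {cap} GB -> expected share {cap / total:.1%}")
# tikv-1..3 -> ~22.2% each, tikv-4 -> ~33.3% of all regions
```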

| username: h5n1 | Original post link

Check the file system utilization rate.
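
For example, `df -h` on each TiKV host, or a small script like this sketch (the data directory below is a placeholder for the actual deploy path):

```python
import shutil

# Placeholder: replace with the actual TiKV data directory on each host.
DATA_DIR = "/tidb-data/tikv-20160"

usage = shutil.disk_usage(DATA_DIR)
print(f"{DATA_DIR}: total {usage.total / 2**30:.0f} GiB, "
      f"used {usage.used / usage.total:.1%}")
```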

| username: TiDBer_Y2d2kiJh | Original post link

The data disk usage of the TiKV node added on September 8 is 7%, the one added on August 16 is at 14%, and the two old TiKV data disks are at 22% and 25%.

| username: Kongdom | Original post link

Take a look at each node's score. Region distribution should be based on those scores.
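
For example, `pd-ctl -u http://<pd-address>:2379 store` prints each store's leader_score and region_score, and the Grafana PD dashboard also has store score panels. Below is a minimal sketch that pulls the same fields from the PD HTTP API; the PD address is a placeholder, and the field names are assumed to match the usual /pd/api/v1/stores output:

```python
import json
from urllib.request import urlopen

# Placeholder: point this at any PD endpoint of the cluster.
PD_ADDR = "http://127.0.0.1:2379"

# /pd/api/v1/stores returns one entry per TiKV store, with the same
# status fields (scores, counts, capacity) that `pd-ctl store` shows.
with urlopen(f"{PD_ADDR}/pd/api/v1/stores") as resp:
    stores = json.load(resp)["stores"]

for s in stores:
    store, status = s["store"], s["status"]
    print(
        f"store {store['id']} ({store['address']}): "
        f"leader_count={status.get('leader_count', 0)} "
        f"leader_score={status.get('leader_score', 0)} "
        f"region_count={status.get('region_count', 0)} "
        f"region_score={status.get('region_score', 0)} "
        f"capacity={status.get('capacity')}"
    )
```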

| username: TiDBer_Y2d2kiJh | Original post link

How can I view the scores of each node?

| username: TiDBer_Y2d2kiJh | Original post link

I tested this by adding a TiKV node to a 3-node, 3-replica cluster. After the node was added, its leader and region counts stayed at 0 and no data was balanced to it. Once I scaled in one of the original three TiKV nodes, data started to balance to the new node, i.e. the regions of the removed node were moved onto the newly added one. So a TiKV node added later only takes over the data of whichever node is scaled in; when multiple nodes are added, they split the data of the last removed node among themselves, which leaves them unbalanced relative to the original nodes.

| username: 有猫万事足 | Original post link

The version is a bit outdated.

In the end, it all comes down to balance-region-scheduler and balance-leader-scheduler.

It seems that PD in v5.4 still doesn't let you configure these two schedulers individually, so there isn't a good way to control them; even pd-ctl can't adjust them manually.
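
That said, you can at least inspect which schedulers are running and the global scheduling limits (which pd-ctl can change via `config set`). Here is a minimal sketch over the PD HTTP API, assuming the standard /pd/api/v1/schedulers and /pd/api/v1/config endpoints; the PD address is a placeholder:

```python
import json
from urllib.request import urlopen

# Placeholder: point this at any PD endpoint of the cluster.
PD_ADDR = "http://127.0.0.1:2379"

def get(path):
    with urlopen(f"{PD_ADDR}{path}") as resp:
        return json.load(resp)

# List of scheduler names currently running; this should include
# balance-region-scheduler and balance-leader-scheduler.
print("schedulers:", get("/pd/api/v1/schedulers"))

# Global scheduling limits that do exist in v5.4 and can be changed
# with `pd-ctl config set <key> <value>`.
schedule_cfg = get("/pd/api/v1/config")["schedule"]
for key in ("leader-schedule-limit", "region-schedule-limit",
            "replica-schedule-limit", "merge-schedule-limit"):
    print(key, "=", schedule_cfg.get(key))
```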

| username: Hacker007 | Original post link

I'm not sure whether that's down to a version or hardware difference. In my case the region counts were consistent across nodes after scaling out. What you're experiencing does seem a bit strange.