TiKV Data Distribution is Uneven (Leaders are Balanced, but Total Number of Regions Varies)

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: TiKV数据分布不均匀(Leader是平均的,但是总region数量不一样)

| username: dba-kit

As the title says, data is unevenly distributed across several TiKV nodes in the cluster: the Leaders are balanced, but the total number of Regions per node differs. The store scores are consistent, and isolation-level is set to host, yet the isolation seems to be happening at the zone level. The relevant parameters are as follows:
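For anyone checking the same thing, these settings and the per-store Leader/Region counts can be viewed with pd-ctl; the commands below are only a sketch with a placeholder address, not output from this cluster:

```shell
# Show PD's replication settings (max-replicas, location-labels,
# isolation-level, strictly-match-label).
pd-ctl -u http://127.0.0.1:2379 config show replication

# List each store's labels along with its Leader and Region counts, to
# confirm that Leaders are balanced while Region counts are not.
pd-ctl -u http://127.0.0.1:2379 store \
  --jq='.stores[] | {address: .store.address, labels: .store.labels, leaders: .status.leader_count, regions: .status.region_count}'
```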


Grafana monitoring shows it more clearly: the Region data volume differs by about 3 TB between nodes, while the Leader data volume is consistent.

| username: xingzhenxiang | Original post link

I checked mine; the nodes are at 782 GB and 934 GB.

| username: h5n1 | Original post link

With non-strict matching, PD should match labels from the top level down until the replica count is satisfied. With zone node counts of 2, 2, and 3, host-level balancing isn't achievable.
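A hedged way to confirm that 2/2/3 layout from PD's side (the address and jq expression are illustrative, not taken from this thread):

```shell
# Count TiKV stores per zone label. With 3 replicas and zone-level matching,
# each zone ends up holding a full copy of every Region, so the hosts in the
# 2-node zones necessarily carry more Regions each.
pd-ctl -u http://127.0.0.1:2379 store \
  --jq='[.stores[].store.labels[]? | select(.key=="zone") | .value] | group_by(.) | map({zone: .[0], stores: length})'
```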

| username: 裤衩儿飞上天 | Original post link

How do you set labels for TiKV?

| username: dba-kit | Original post link

Something like this. The labels are divided into four levels: zone, dc, host, and disk.
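For reference, a four-level setup like that is usually declared along these lines; the label values and address below are placeholders rather than the actual topology from this cluster:

```shell
# On each TiKV instance, declare its position in the topology; the same
# labels can also be set via server.labels in the TiKV config file.
# (Other startup flags omitted.)
tikv-server --labels zone=z1,dc=d1,host=h101,disk=nvme0

# On PD, declare the label hierarchy and the intended isolation level:
pd-ctl -u http://127.0.0.1:2379 config set location-labels zone,dc,host,disk
pd-ctl -u http://127.0.0.1:2379 config set isolation-level host
```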

| username: dba-kit | Original post link

Uh, does it mean that if I set strictly-match-label=true, it will actually enforce host isolation?

| username: h5n1 | Original post link

You can set up a test environment with a 1/1/2 node distribution to verify it, then check the Region health panel in monitoring for anomalies.

| username: dba-kit | Original post link

According to the documentation, PD does prioritize replica placement by the label hierarchy, so each zone ends up holding one replica of every Region. Assuming each node holds x Leaders, the cluster has 2x + 2x + 3x = 7x Regions in total, which means each node in a 2-node zone holds 7x/2 Regions, while each node in the 3-node zone holds 7x/3. Production matches this: a single node has 37.8k Leaders, nodes in the D/E zones have around 130k Regions, and nodes in the F zone have about 88k, exactly matching the calculated values.
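Writing out that calculation with the numbers from this cluster ($x$ is the per-node Leader count):

$$
R_{\text{total}} = 2x + 2x + 3x = 7x, \qquad
R_{\text{per node, 2-node zone}} = \frac{7x}{2} = 3.5x, \qquad
R_{\text{per node, 3-node zone}} = \frac{7x}{3} \approx 2.33x
$$

With $x \approx 37.8\text{k}$, that gives $3.5x \approx 132\text{k}$ Regions per node in the D/E zones (observed: about 130k) and $2.33x \approx 88\text{k}$ in the F zone (observed: 88k).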

| username: dba-kit | Original post link

However, according to this description, setting the strictly-match-label parameter cannot solve this problem either: strictly-match-label is only a safety-net check, and PD will still prioritize placing follower replicas level by level according to the label hierarchy.

| username: dba-kit | Original post link

It seems one option is to change replication.location-labels from zone,dc,host,disk to just host, so that PD only considers the host label when spreading replicas evenly. However, that would break the cross-data-center disaster recovery strategy. I think I'll apply for two more machines and scale out instead. :thinking:
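If anyone does try that route, the change itself is a single pd-ctl call (address is a placeholder); the disaster-recovery trade-off mentioned above still applies:

```shell
# Collapse the placement hierarchy to the host level only. PD will then
# spread replicas purely by the host label, at the cost of losing the
# zone/dc isolation that cross-data-center disaster recovery relies on.
pd-ctl -u http://127.0.0.1:2379 config set location-labels host
```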

| username: xingzhenxiang | Original post link

I saw in the official documentation that once a label level satisfies the isolation requirement, the levels below it are no longer considered.

| username: dba-kit | Original post link

This is most likely also caused by uneven machine distribution, but the difference is not significant.
