Uneven Distribution of Regions

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: region分布不均衡

| username: 胡杨树旁

The leader scores and leader distribution are almost uniform, but the region distribution is uneven: the difference between the largest and smallest stores can reach around 200 GB. It seems that region scheduling has not taken effect.



| username: wangccsy | Original post link

I have only experimented with one Region.

| username: 路在何chu | Original post link

Check region health, for example the number of empty regions.
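A quick way to run these checks from the command line, assuming pd-ctl is invoked through tiup and the PD address and component version below are placeholders for your cluster:

```bash
# Empty Regions reported by PD (address and version are placeholders).
tiup ctl:v6.5.0 pd -u http://127.0.0.1:2379 region check empty-region

# Other region-health checks worth a quick look.
tiup ctl:v6.5.0 pd -u http://127.0.0.1:2379 region check miss-peer
tiup ctl:v6.5.0 pd -u http://127.0.0.1:2379 region check pending-peer
```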

| username: 胡杨树旁 | Original post link

The empty-region numbers don't look abnormal.

| username: dba远航 | Original post link

That means the Region data is unbalanced.

| username: FutureDB | Original post link

How much data is stored on average per store?

| username: xfworld | Original post link

The score differences might be quite large…

  1. Check whether the resources of all nodes are consistent.
  2. The scoring depends first on these basic resources, and then on the configuration.

I suggest you run these checks; a rough sketch is below.
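One way to compare the stores side by side is to dump their capacity and scores from pd-ctl; a minimal sketch, assuming jq is installed and the PD address and version below are placeholders:

```bash
# One line per store: id, address, capacity, available space, scores, and region count.
tiup ctl:v6.5.0 pd -u http://127.0.0.1:2379 store \
  | jq -r '.stores[] | [.store.id, .store.address,
                        .status.capacity, .status.available,
                        .status.leader_score, .status.region_score,
                        .status.region_count] | @tsv'
```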

| username: dba-kit | Original post link

You can check whether your situation is similar to this post: TiKV data unevenly distributed at the replication.location-labels level.
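To see whether that applies here, you could compare the configured location labels with the labels actually attached to each store; a sketch (PD address and version are placeholders, jq assumed):

```bash
# The location-labels PD uses for replica isolation, plus max-replicas.
tiup ctl:v6.5.0 pd -u http://127.0.0.1:2379 config show replication

# The labels attached to each TiKV store.
tiup ctl:v6.5.0 pd -u http://127.0.0.1:2379 store \
  | jq -r '.stores[] | [.store.id, .store.address, (.store.labels // [] | tostring)] | @tsv'
```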

| username: 胡杨树旁 | Original post link

The configuration of all 6 servers is the same.

| username: 胡杨树旁 | Original post link

There is a slight difference: the poster in that thread has 7 TiKV nodes distributed 2, 2, 3 across the servers, while we have 6 TiKV nodes deployed on 3 servers, distributed 2, 2, 2, with a replication factor of 3.

| username: 小龙虾爱大龙虾 | Original post link

Did you configure labels and placement rules?

| username: 胡杨树旁 | Original post link

Labels are configured.

Placement rules are not configured.
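If it helps, the placement-rules side can be confirmed from pd-ctl as well (PD address and version are placeholders); with the feature enabled but nothing customized, only the built-in default rule is listed:

```bash
# List the placement rules PD currently has.
tiup ctl:v6.5.0 pd -u http://127.0.0.1:2379 config placement-rules show
```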

| username: 胡杨树旁 | Original post link

Looking at this peer, it keeps increasing, even though we run dumps every day, so normally it should be decreasing.

| username: 小龙虾爱大龙虾 | Original post link

Are the data disk sizes of each TiKV node the same? What is the current usage rate? Check the host interface on the dashboard.

| username: 胡杨树旁 | Original post link

The default value of tidb_gc_life_time is 10m, which means that the data deleted within 10 minutes can be recovered. If you want to recover data from a longer time ago, you need to adjust this parameter before deleting the data.
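A minimal way to check and, if needed, extend this window, assuming a reachable TiDB endpoint (host, port, and user below are placeholders):

```bash
# Current GC retention window.
mysql -h 127.0.0.1 -P 4000 -u root -e "SHOW VARIABLES LIKE 'tidb_gc_life_time';"

# GC bookkeeping (safe point, last run time, etc.) kept in mysql.tidb.
mysql -h 127.0.0.1 -P 4000 -u root -e "SELECT variable_name, variable_value FROM mysql.tidb WHERE variable_name LIKE 'tikv_gc%';"

# Extend the window before deleting data you may still need to recover.
mysql -h 127.0.0.1 -P 4000 -u root -e "SET GLOBAL tidb_gc_life_time = '24h';"
```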

| username: 小龙虾爱大龙虾 | Original post link

What is the usage rate? It looks a bit high. Check the PD panel in the Grafana monitoring.

| username: 胡杨树旁 | Original post link

The highest usage rate is 89% and the lowest is 68%.

| username: h5n1 | Original post link

Check region_weight and leader_weight in information_schema.tikv_store_status.
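For example, assuming a reachable TiDB endpoint (host, port, and user are placeholders):

```bash
# Weights default to 1; a store with a higher weight is expected to hold
# proportionally more leaders/Regions, which would explain a size skew.
mysql -h 127.0.0.1 -P 4000 -u root -e "
  SELECT store_id, address, leader_weight, leader_score, leader_count,
         region_weight, region_score, region_count, region_size
  FROM   information_schema.tikv_store_status
  ORDER  BY region_score DESC;"
```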

| username: 小龙虾爱大龙虾 | Original post link

Take another look at the PD => Scheduler => Balance Region scheduler panel.
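The same information can also be pulled from PD directly; a rough sketch (PD address and version are placeholders):

```bash
# Is the balance-region scheduler present and not paused?
tiup ctl:v6.5.0 pd -u http://127.0.0.1:2379 scheduler show

# Is PD currently generating balance operators?
tiup ctl:v6.5.0 pd -u http://127.0.0.1:2379 operator show

# Limits and space thresholds that can throttle region balancing.
tiup ctl:v6.5.0 pd -u http://127.0.0.1:2379 config show \
  | grep -E 'region-schedule-limit|high-space-ratio|low-space-ratio'
```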

| username: 胡杨树旁 | Original post link

I think the problem is that the tidb-server process is not running. You can check if the process is running using the ps command. If it is not running, you can start it using the systemctl start tidb command.
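A minimal check along those lines; note that in tiup-managed clusters the systemd unit is usually named tidb-&lt;port&gt;, so the unit name below is an assumption:

```bash
# Is a tidb-server process running on this host?
ps -ef | grep '[t]idb-server'

# If not, start it via systemd (unit name is deployment-specific, e.g. tidb-4000).
sudo systemctl start tidb-4000
sudo systemctl status tidb-4000
```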