TiKV Disk Storage Load Balancing Issue

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: TIKV磁盘存储负载均衡问题

| username: TiDBer_bOR8eMEn

[TiDB Usage Environment] Production Environment
[TiDB Version] 5.2.3
[Reproduction Path] Three machines with the same configuration, uneven data distribution
[Encountered Problem: Phenomenon and Impact]
[Resource Configuration]


There is a significant difference in size between the disks. Is there any way to make the disk usage even across the three machines? Seeking guidance from experts.

| username: TiDBer_jYQINSnf | Original post link

Even with only 3 machines it is unbalanced? Is the number of regions the same on each store? Are all the schedulers present? Run scheduler show in pd-ctl to check.
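A minimal sketch of that check, assuming tiup is installed and that the PD endpoint is the one quoted later in this thread (http://10.10.10.14:2379). The helper name check_schedulers is hypothetical. It only reports whether the two balance schedulers that PD normally runs by default appear in the output.

```shell
# Hypothetical helper: reads `scheduler show` output on stdin and reports
# whether each expected balance scheduler appears in it.
check_schedulers() {
  input=$(cat)
  for s in balance-region-scheduler balance-leader-scheduler; do
    case "$input" in
      *"$s"*) echo "present: $s" ;;
      *)      echo "missing: $s" ;;
    esac
  done
}

# Usage (assumed endpoint and tiup ctl version, both taken from this thread):
#   tiup ctl:v5.4.3 pd -u http://10.10.10.14:2379 scheduler show | check_schedulers
```

If a scheduler is reported as missing, region or leader balancing may have been removed or paused at some point, which would be worth ruling out before looking at the disks.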

| username: 小龙虾爱大龙虾 | Original post link

Take a look at the PD balance-related panel to see if the regions are balanced.

| username: TiDBer_jYQINSnf | Original post link

I see that the available space is unbalanced, but the occupied space is roughly consistent, right? The smaller number on the left is the occupied space, right? 2.8T, 2.6T, and 2.2T look relatively normal. I guess the disk sizes of your three machines are not the same: the last one is almost full, yet it only holds 2.2T.

| username: TiDBer_bOR8eMEn | Original post link

The regions look similar.

| username: TiDBer_bOR8eMEn | Original post link

The disk sizes are consistent.

| username: TiDBer_bOR8eMEn | Original post link

The occupied space is not very balanced. That is the panel on the left side.

| username: tidb菜鸟一只 | Original post link

Look at these two panels: store region size and store region count.

| username: TiDBer_bOR8eMEn | Original post link

[Screenshot attached; image not preserved in this text version.]

| username: tidb菜鸟一只 | Original post link

The size shown here is different from what you mentioned above. Can you check the directory usage on the host with df -h? You can also run tiup ctl:v5.4.3 pd -u http://10.10.10.14:2379 -i to enter pd-ctl, then execute store to check the storage usage of the three TiKV nodes.

| username: TiDBer_bOR8eMEn | Original post link

After running df -h on the three machines, the available size is the same as in the first screenshot above.

| username: tidb菜鸟一只 | Original post link

Use tiup ctl:v5.4.3 pd -u http://10.10.10.14:2379/ -i to enter pdctl. After entering, execute store to check the storage usage of the 3 TiKV nodes.

By the way, check if region_weight has been adjusted.
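The same check can be run non-interactively and trimmed down to the fields that matter for balance. This is only a sketch: the helper name store_summary is hypothetical, and it assumes pd-ctl prints the store list as pretty-printed JSON with one field per line.

```shell
# Hypothetical helper: keeps only the per-store fields relevant to balance
# from the pretty-printed JSON that pd-ctl's `store` command prints.
store_summary() {
  grep -E '"(address|capacity|available|leader_count|region_count|region_weight|region_size)":'
}

# Usage (assumed endpoint and tiup ctl version, both taken from this thread):
#   tiup ctl:v5.4.3 pd -u http://10.10.10.14:2379 store | store_summary
```

A region_weight other than the default of 1 on any store would mean the weight has been adjusted by hand.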

| username: TiDBer_jYQINSnf | Original post link

The occupied space differs because the machine at the bottom has only about 300GB of disk space left, so PD tries to avoid writing to it as much as possible.

| username: TiDBer_jYQINSnf | Original post link

The area marked in yellow is relatively balanced. The imbalance in the red box is because the available space on the right side is only 384G, so PD considers that the disk on this machine is almost full and moves the regions to other machines.
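To put a number on "almost full", the sketch below computes the used percentage of a filesystem from df output. It assumes the POSIX column order of df -Pk, and the helper name df_used_pct is hypothetical. As far as I know, PD's low-space-ratio setting defaults to 0.8, and PD tries to avoid moving data onto a store whose usage exceeds it. The value actually in effect can be checked with config show in pd-ctl.

```shell
# Hypothetical helper: reads `df -Pk <path>` output on stdin and prints the
# used percentage of that filesystem, computed as used / (used + available).
df_used_pct() {
  awk 'NR == 2 { printf "%d\n", ($3 * 100) / ($3 + $4) }'
}

# Usage (the data path is the one mentioned later in this thread):
#   df -Pk /export/tidb-data | df_used_pct
```

A result of 80 or more on one machine would be consistent with PD steering regions away from it.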

| username: 裤衩儿飞上天 | Original post link

You have only 3 TiKV nodes in this cluster, so you’re using the default 3 replicas, right?
I think you should check the nodes with high disk usage. Are there a lot of logs on the disk? Or is there other data stored? You can clean it up appropriately.

| username: dba-kit | Original post link

As mentioned above, regardless of the available space, first confirm whether the space occupied by the data under the “Overview-TiKV-store size” panel is the same. If it is the same, it indicates that the space might be occupied by other components.

| username: TiDBer_bOR8eMEn | Original post link

I am using the default settings. I checked the three machines, and the directory /export/tidb-data/tikv-20160 is where the data size differs significantly.


These files are the main ones.
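For comparing the three machines, a listing of the largest entries in that directory can be produced as follows. This is a sketch only. The helper name largest_entries is hypothetical, and sizes are printed in KiB.

```shell
# Hypothetical helper: prints the N largest entries (size in KiB, then path)
# directly under a directory, largest first. Hidden files are not included.
largest_entries() {
  dir=$1
  n=${2:-10}
  du -sk "$dir"/* 2>/dev/null | sort -rn | head -n "$n"
}

# Usage (directory taken from this thread):
#   largest_entries /export/tidb-data/tikv-20160 10
```

Running it on each machine and comparing the output side by side shows whether the difference comes from the data itself or from something else in the directory.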

| username: TiDBer_jYQINSnf | Original post link

Just delete them: rm -rf rocksdb.info.* They are only logs and are not needed.
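Before deleting anything, it may be safer to list what the pattern matches and restrict it to older files. The sketch below assumes that the rotated logs carry a suffix after rocksdb.info and that the file currently being written is plain rocksdb.info, which the pattern does not match. The helper name old_info_logs and the 7-day cutoff are hypothetical.

```shell
# Hypothetical helper: lists rocksdb.info.* files directly under a directory
# that were last modified more than N days ago (default 7). It deletes nothing.
old_info_logs() {
  dir=$1
  days=${2:-7}
  find "$dir" -maxdepth 1 -type f -name 'rocksdb.info.*' -mtime +"$days"
}

# Usage (directory taken from this thread): review the list first, then delete.
#   old_info_logs /export/tidb-data/tikv-20160 7
#   old_info_logs /export/tidb-data/tikv-20160 7 | xargs rm -f
```

Listing first keeps the deletion limited to log files and leaves everything else under the data directory untouched.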

| username: TiDBer_bOR8eMEn | Original post link

Huh? Can it be deleted?

| username: TiDBer_bOR8eMEn | Original post link

I checked the size of /export/tidb-data/tikv-20160 on the three machines.


Someone in this thread said that files like rocksdb.info.* can be deleted? Isn’t every file under tidb-data very important?