Uneven Data Distribution Across TiKV Nodes

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: tikv各节点数据量不均匀分布

| username: myquzuo

[TiDB Usage Environment] Production Environment
[TiDB Version] V7.1.0
[Encountered Problem: Problem Phenomenon and Impact]
Data is unevenly distributed across each node
[Resource Configuration]
[two images, not preserved]

| username: 像风一样的男子

Check the monitoring dashboards: are the Region scheduling tasks running normally?

| username: zhaokede

Differences in node configuration: TiKV nodes may have different hardware (disk capacity, CPU, memory), so their storage and processing capacity is not balanced.
Load balancing strategy: the cluster's balancing strategy (scheduling is handled by PD) may not fit the current load, leaving some nodes with a disproportionate share of data and requests.
Data migration issues: when nodes are added, removed, or recovered after a fault, data migration may not complete as expected, leaving data unevenly distributed.
Isolation level settings: the isolation level configured for TiKV (such as zone or host) constrains where replicas can be placed, which can affect data distribution and migration.

| username: 小龙虾爱大龙虾

Check the PD panels in Grafana to confirm whether the distribution is actually uneven.
Also, your cluster topology is quite unusual. A few suggestions:

  1. Use an odd number of PD nodes; 3 is recommended.
  2. If you use TiFlash, deploy it on separate nodes. Mixing it with other components can interfere with their normal operation. If you do not need TiFlash, you can leave it out.

| username: myquzuo

The high disk usage is caused by the TiDB log directory being too large.

| username: 我是吉米哥

For logs, you can set up automatic log rotation.

| username: TIDB-Learner

With a mixed deployment, disk usage alone does not necessarily indicate uneven data distribution. Check how large the log directories are.
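One way to separate log usage from data usage on a mixed-deployment host is to total each directory on its own. The following is only a minimal sketch: the paths in the `__main__` block are hypothetical placeholders, and the helper names `dir_size_bytes` and `summarize` are made up for illustration.

```python
import os


def dir_size_bytes(path):
    """Return the total size in bytes of all regular files under path."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            # Skip symlinks so linked files are not counted twice.
            if os.path.islink(full):
                continue
            try:
                total += os.path.getsize(full)
            except OSError:
                # A log file may be rotated away while we are scanning.
                continue
    return total


def summarize(paths):
    """Map each label to its directory size in GiB, rounded to two decimals."""
    return {
        label: round(dir_size_bytes(p) / 1024**3, 2)
        for label, p in paths.items()
    }


if __name__ == "__main__":
    # Hypothetical paths; substitute the directories from your own topology.
    paths = {
        "tidb log": "/tidb-deploy/tidb-4000/log",
        "tikv data": "/tidb-data/tikv-20160",
    }
    for label, gib in summarize(paths).items():
        print(f"{label}: {gib} GiB")
```

If the log directory accounts for most of the difference between hosts, the imbalance is in disk usage rather than in how Regions are distributed.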

| username: 小于同学

Are the Region scheduling tasks running normally?

| username: 这里介绍不了我

Check in Grafana whether the number of Regions differs significantly between nodes.
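The same comparison can be done outside Grafana by looking at per-store Region counts. The sketch below assumes JSON in the general shape printed by `pd-ctl store` (a `stores` list whose entries carry `store.address` and `status.region_count`). The sample values are made up, and the function name `region_deviation` is hypothetical, so verify the field names against your own cluster's output.

```python
import json

# Made-up sample in the assumed shape of `pd-ctl store` output.
SAMPLE = """
{
  "count": 3,
  "stores": [
    {"store": {"id": 1, "address": "10.0.0.1:20160"},
     "status": {"region_count": 1200}},
    {"store": {"id": 2, "address": "10.0.0.2:20160"},
     "status": {"region_count": 1180}},
    {"store": {"id": 3, "address": "10.0.0.3:20160"},
     "status": {"region_count": 1220}}
  ]
}
"""


def region_deviation(store_json):
    """Return each store's relative deviation from the mean Region count."""
    stores = json.loads(store_json).get("stores", [])
    counts = {
        s["store"]["address"]: s["status"].get("region_count", 0)
        for s in stores
    }
    if not counts:
        return {}
    mean = sum(counts.values()) / len(counts)
    if mean == 0:
        return {}
    return {addr: (n - mean) / mean for addr, n in counts.items()}


if __name__ == "__main__":
    for addr, dev in sorted(region_deviation(SAMPLE).items()):
        print(f"{addr}: {dev:+.1%}")
```

Deviations of a few percent usually mean the Regions are balanced and the disk difference comes from something else, such as logs. If your output also lists TiFlash stores, it is probably better to compare them separately from the TiKV stores.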

| username: tidb菜鸟一只

Remove one PD node and deploy TiFlash separately… there is no need for that many TiDB servers either.
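The advice in this thread to keep an odd number of PD nodes follows from majority quorum: PD embeds etcd, which needs more than half of its members alive to keep working. A small arithmetic sketch (the function names are made up for illustration) shows why an extra even-numbered member adds no fault tolerance:

```python
def quorum(members):
    """Smallest majority of a cluster with the given member count."""
    return members // 2 + 1


def tolerated_failures(members):
    """How many members can fail while a majority still remains."""
    return members - quorum(members)


if __name__ == "__main__":
    for n in (3, 4, 5):
        print(f"{n} PD nodes: quorum {quorum(n)}, "
              f"tolerates {tolerated_failures(n)} failure(s)")
```

Under this rule, 4 members tolerate the same single failure as 3 while requiring a larger quorum, which is why dropping from an even count to the next lower odd count loses nothing in availability.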