After one node in a 3-node TiKV cluster fails and is forcibly scaled in and then scaled out again, the node does not rebalance data

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: 3个节点tikv,一个节点故障后强制缩容后再扩容,该节点不会平衡数据

| username: du拉松

[TiDB Usage Environment] Production Environment
[TiDB Version] 7.1.2
[Reproduction Path]

  1. One node’s disk had a problem. After the disk was recovered and the node restarted, it kept reporting errors, so the node was scaled in with --force and then scaled out again. The new node did not synchronize replica data from the other nodes.

[Encountered Issue: Symptoms and Impact]
After scaling out, the data did not rebalance, which means there are now only 2 replicas. The scheduling operators do not seem to be working (see the sketch below for one way to check).
[Resource Configuration] Go to TiDB Dashboard - Cluster Info - Hosts and take a screenshot of this page

[Attachments: Screenshots/Logs/Monitoring]
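A minimal sketch for checking whether the new store is actually receiving region peers and whether PD has any scheduling operators pending, assuming PD's HTTP API is reachable (the address below is a placeholder); it reports roughly what `pd-ctl store` and `pd-ctl operator show` do:

```python
# Sketch: ask PD which stores exist, how many leaders/regions each holds,
# and whether any scheduling operators are currently pending.
# The PD address is a placeholder -- replace it with your own.
import requests

PD = "http://127.0.0.1:2379"  # placeholder PD endpoint

# Store states and leader/region counts (roughly `pd-ctl store`).
stores = requests.get(f"{PD}/pd/api/v1/stores", timeout=5).json()
for s in stores.get("stores", []):
    meta, status = s["store"], s["status"]
    print(meta["id"], meta["address"], meta.get("state_name"),
          "leaders:", status.get("leader_count"),
          "regions:", status.get("region_count"))

# Currently pending operators (roughly `pd-ctl operator show`).
ops = requests.get(f"{PD}/pd/api/v1/operators", timeout=5)
print("operators:", ops.text)
```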



| username: Fly-bird | Original post link

Yes, it will automatically balance the data. Let’s wait and see.

| username: TIDB-Learner | Original post link

It should.

| username: du拉松 | Original post link

It has been almost 12 hours and essentially no data has been balanced. The main issue is that one of the three replicas is missing, and the cluster does not seem to be able to replenish the missing replica automatically.
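A quick sketch for counting the regions that are still short of a replica, using the PD API endpoint behind `pd-ctl region check miss-peer` (the PD address is a placeholder):

```python
# Sketch: list regions that are missing a peer, i.e. the replicas that
# have not been replenished yet. The PD address is a placeholder.
import requests

PD = "http://127.0.0.1:2379"  # placeholder PD endpoint

resp = requests.get(f"{PD}/pd/api/v1/regions/check/miss-peer", timeout=5).json()
print("regions missing a peer:", resp.get("count"))
for r in (resp.get("regions") or [])[:10]:  # print a small sample
    print("region", r["id"], "peers on stores:",
          [p["store_id"] for p in r.get("peers", [])])
```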

| username: Kongdom | Original post link

Are the server configurations of the three nodes the same? It looks like the scores differ, which leads to an uneven distribution.

| username: porpoiselxj | Original post link

Based on this, check the score of each node:
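For example, a sketch that reads the same scores from INFORMATION_SCHEMA.TIKV_STORE_STATUS (the connection parameters are placeholders):

```python
# Sketch: print each store's leader/region score together with its
# capacity figures. Connection parameters are placeholders.
import pymysql

conn = pymysql.connect(host="127.0.0.1", port=4000, user="root", password="")
with conn.cursor() as cur:
    cur.execute("""
        SELECT STORE_ID, ADDRESS, STORE_STATE_NAME,
               LEADER_COUNT, LEADER_SCORE,
               REGION_COUNT, REGION_SCORE,
               CAPACITY, AVAILABLE
        FROM INFORMATION_SCHEMA.TIKV_STORE_STATUS
    """)
    for row in cur.fetchall():
        print(row)
conn.close()
```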

| username: du拉松 | Original post link

The disk on this 4.97 server is somewhat smaller. This is how I understand it: there are currently only 3 TiKV nodes, and the 3 replicas are spread across different TiKV nodes. So even if the configurations differ somewhat, that shouldn’t affect the number of replicas, and the data on each node should be the same, right?

| username: Kongdom | Original post link

Your understanding is correct; the regions are balanced now. However, leader placement is definitely influenced by the configuration: nodes with higher specs will generally hold more leaders and handle more of the read and write work.

| username: du拉松 | Original post link

Yes, the number of replicas can be determined from the number of regions. However, why is there such a big difference in storage usage even though the number of replicas is the same?

| username: Kongdom | Original post link

A region is a left-closed, right-open key range, and regions are not all the same size.
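For example, a sketch that sums the approximate size of the regions hosted on each store, showing that equal region counts can still mean different space usage (connection parameters are placeholders):

```python
# Sketch: per-store region count vs. total approximate region size.
# Connection parameters are placeholders.
import pymysql

conn = pymysql.connect(host="127.0.0.1", port=4000, user="root", password="")
with conn.cursor() as cur:
    cur.execute("""
        SELECT p.STORE_ID,
               COUNT(*)                AS region_cnt,
               SUM(s.APPROXIMATE_SIZE) AS approx_size
        FROM INFORMATION_SCHEMA.TIKV_REGION_PEERS  p
        JOIN INFORMATION_SCHEMA.TIKV_REGION_STATUS s
          ON p.REGION_ID = s.REGION_ID
        GROUP BY p.STORE_ID
    """)
    for store_id, cnt, size in cur.fetchall():
        print(f"store {store_id}: {cnt} regions, approximate size {size}")
conn.close()
```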

| username: du拉松 | Original post link

So the region count alone cannot confirm whether the replica data has been fully synchronized, right? Can it at least confirm that the node is not missing any replicas? Could you tell me how to verify that the three nodes hold the same replica data?
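In other words, what I want to confirm is roughly the following, a sketch assuming the default max-replicas = 3 (connection parameters are placeholders):

```python
# Sketch: flag regions that do not have exactly 3 peers, plus any peers
# reported as DOWN or PENDING. Connection parameters are placeholders.
import pymysql

conn = pymysql.connect(host="127.0.0.1", port=4000, user="root", password="")
with conn.cursor() as cur:
    # Regions with fewer (or more) than 3 replicas.
    cur.execute("""
        SELECT REGION_ID, COUNT(*) AS peer_cnt
        FROM INFORMATION_SCHEMA.TIKV_REGION_PEERS
        GROUP BY REGION_ID
        HAVING COUNT(*) <> 3
    """)
    short = cur.fetchall()
    print("regions without 3 peers:", len(short), short[:10])

    # Peers that exist but are not healthy.
    cur.execute("""
        SELECT REGION_ID, STORE_ID, STATUS
        FROM INFORMATION_SCHEMA.TIKV_REGION_PEERS
        WHERE STATUS IN ('DOWN', 'PENDING')
    """)
    print("down/pending peers:", cur.fetchall())
conn.close()
```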

| username: 路在何chu | Original post link

I have run the same experiment and am also not sure what is going on.

| username: Kongdom | Original post link

:thinking: The three nodes are not three identical full copies of the data, right?
As I understand it, the data is divided into, for example, three parts (normally close to even in size); each part has 1 leader replica and 2 follower replicas, and the replicas hold exactly the same data.
Leader L1, followers F11, F12
Leader L2, followers F21, F22
Leader L3, followers F31, F32
The general data distribution is as follows:
Node 1: L1 F21 F31
Node 2: L2 F11 F32
Node 3: L3 F12 F22
With this distribution, if any node goes down, the leader can quickly be re-elected from the follower replicas on the other two nodes, keeping the data correct. Only the leader replicas, i.e., L1, L2, and L3, serve external requests.
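To see this layout on a live cluster, a sketch like the following lists each region's leader and followers by store ID (connection parameters are placeholders):

```python
# Sketch: print each region's replica layout, marking the leader as
# "L@<store>" and followers as "F@<store>". Connection parameters are
# placeholders.
import pymysql

conn = pymysql.connect(host="127.0.0.1", port=4000, user="root", password="")
with conn.cursor() as cur:
    cur.execute("""
        SELECT REGION_ID,
               GROUP_CONCAT(
                   CASE WHEN IS_LEADER = 1 THEN CONCAT('L@', STORE_ID)
                        ELSE CONCAT('F@', STORE_ID) END
                   ORDER BY IS_LEADER DESC, STORE_ID) AS layout
        FROM INFORMATION_SCHEMA.TIKV_REGION_PEERS
        GROUP BY REGION_ID
        ORDER BY REGION_ID
    """)
    for region_id, layout in cur.fetchall():
        print(region_id, layout)
conn.close()
```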

| username: 小龙虾爱大龙虾 | Original post link

Check the region health panel in the PD dashboard.
From the monitoring you provided, you have 24 leaders in total and 24 regions on each node, which is normal. However, there is a significant difference in space usage.

| username: du拉松 | Original post link

The current region health status is as follows, and it doesn’t seem to show any issues.

| username: 小龙虾爱大龙虾 | Original post link

Take a look at the expression behind the store-use panel to see which metric the data comes from, and also post the region size graph from the PD => balance panel.

| username: 江湖故人 | Original post link

What is leader-schedule-limit set to?
Have you configured a placement policy or store weights?
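Both can be checked through PD's HTTP API, roughly what `pd-ctl config show` and `pd-ctl store` report; a sketch (the PD address is a placeholder):

```python
# Sketch: print the scheduling limits and each store's leader/region
# weights. The PD address is a placeholder.
import requests

PD = "http://127.0.0.1:2379"  # placeholder PD endpoint

sched = requests.get(f"{PD}/pd/api/v1/config/schedule", timeout=5).json()
for key in ("leader-schedule-limit", "region-schedule-limit",
            "replica-schedule-limit"):
    print(key, "=", sched.get(key))

stores = requests.get(f"{PD}/pd/api/v1/stores", timeout=5).json()
for s in stores.get("stores", []):
    status = s["status"]
    print("store", s["store"]["id"],
          "leader_weight:", status.get("leader_weight"),
          "region_weight:", status.get("region_weight"))
```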

| username: 江湖故人 | Original post link

Experts, are empty regions also balanced by count?

| username: FutureDB | Original post link

I understand that we generally look at whether the number of region leaders on the nodes is balanced.

| username: du拉松 | Original post link

Here is the expression for store use:


Here is the region size: