TiDB v6.1.1: After Scaling In TiKV, Other KV Nodes Continuously Print "invalid store xxxx" and Cluster Queries Fail with "tikv server busy"

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: TiDB v6.1.1 缩容Tikv ,其他kv 节点一直打印invalid store xxxx, 集群查询不正常报错tikvserver busy

| username: kkpeter

[TiDB Usage Environment] Production Environment / Testing / PoC
[TiDB Version] tidb v6.1.1
[Reproduction Path] What operations were performed to cause the issue
Scaled down a KV node
[Encountered Issue: Problem Phenomenon and Impact]
Logs from other KV nodes kept printing “invalid store xxxx,” where xxxx is the ID of the scaled-down KV node, for more than 7 hours.

SQL latency increased, and “tikv server busy” was shown in backoff.
[Resource Configuration]
[Attachments: Screenshots/Logs/Monitoring]

We restarted the PD and TiDB nodes, but the error persisted. Finally, we restarted all TiKV nodes, and the issue was resolved.

Additional Information

Our cluster was upgraded from v4.0.14 through a rolling upgrade.
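
For anyone who wants to check how often the "invalid store" message appears and for which store IDs, here is a minimal sketch. It assumes each affected log line contains the phrase "invalid store" (optionally followed by "ID") and then the numeric store ID; the exact TiKV log format is not shown in this thread, so the pattern and the function name `count_invalid_store` are illustrative and may need adjusting.

```python
import re
from collections import Counter

# Assumption: the log line contains "invalid store" (optionally "ID")
# followed by the numeric store ID. Adjust to the real TiKV log format.
INVALID_STORE = re.compile(r"invalid store(?: ID)? (\d+)")


def count_invalid_store(lines):
    """Return a Counter mapping store ID -> number of matching log lines."""
    counts = Counter()
    for line in lines:
        match = INVALID_STORE.search(line)
        if match:
            counts[int(match.group(1))] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical sample lines, not real TiKV output.
    sample = [
        "[ERROR] invalid store 63248086",
        "[INFO] heartbeat ok",
    ]
    for store_id, hits in count_invalid_store(sample).items():
        print(store_id, hits)
```

If every hit points at the store that was removed, that would be consistent with the other TiKV nodes still holding a stale reference to it.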

| username: kkpeter | Original post link

| username: WalterWj | Original post link

That shouldn’t happen. How did you perform the scale-in operation, and what version of tiup are you using?

| username: CuteRay | Original post link

Do you have the cluster topology information? How many TiKV and TiDB instances are there? Could you please share it for us to take a look?

| username: kkpeter | Original post link

I scaled in with the store delete command. I also think this shouldn’t happen.

| username: kkpeter | Original post link

8 kv, 4 tidb

| username: WalterWj | Original post link

store delete is not a scale-in; that’s a manual deletion.

A scale-in is done with tiup cluster scale-in.
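
As a rough illustration of the tiup route, the sketch below only builds the command strings and does not run them. The cluster name and node address are hypothetical placeholders, and the three-step order (scale-in, display, prune) is the usual sequence as I understand it, not something confirmed in this thread.

```python
import shlex


def scale_in_commands(cluster_name, node):
    """Return the tiup commands for a TiKV scale-in, in the usual order."""
    return [
        # Ask tiup to take the node offline so its data can be moved away.
        shlex.join(["tiup", "cluster", "scale-in", cluster_name, "--node", node]),
        # Check the node status until it is reported as Tombstone.
        shlex.join(["tiup", "cluster", "display", cluster_name]),
        # Clean up Tombstone nodes, including tiup's own topology record.
        shlex.join(["tiup", "cluster", "prune", cluster_name]),
    ]


if __name__ == "__main__":
    # "prod-cluster" and "10.0.1.5:20160" are made-up example values.
    for cmd in scale_in_commands("prod-cluster", "10.0.1.5:20160"):
        print(cmd)
```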

| username: kkpeter | Original post link

There’s no difference; in the end, both come down to executing store delete with the store ID.

| username: h5n1 | Original post link

First, check the status.

| username: WalterWj | Original post link

After you deleted the store, did you shut down the corresponding TiKV node immediately?

| username: kkpeter | Original post link

We have already done everything you mentioned. Running store 63248086 in PD now returns an error saying that this store cannot be found.

| username: kkpeter | Original post link

We normally wait until the node status changes to tombstone, then remove the tombstone before stopping the node.
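
The wait-for-tombstone step described above can be sketched roughly as follows. `fetch_state` is a hypothetical caller-supplied function that returns the store's state name ("Up", "Offline", or "Tombstone"); how it gets that value (for example by parsing the output of store with the store ID in pd-ctl) is left out, and the interval and retry count are arbitrary defaults.

```python
import time


def wait_for_tombstone(fetch_state, store_id, interval_s=30.0,
                       max_checks=120, sleep=time.sleep):
    """Poll until the store reports Tombstone.

    Returns True once the state is "Tombstone", or False if it is still
    in another state after max_checks polls.
    """
    for attempt in range(max_checks):
        if fetch_state(store_id) == "Tombstone":
            return True
        # Do not sleep after the final check.
        if attempt < max_checks - 1:
            sleep(interval_s)
    return False


if __name__ == "__main__":
    # Fake state source standing in for a real PD query.
    states = iter(["Offline", "Offline", "Tombstone"])
    done = wait_for_tombstone(lambda _sid: next(states), 63248086,
                              interval_s=0.0)
    print("safe to remove tombstone and stop node:", done)
```

Only when this returns True would the tombstone record be removed and the TiKV process stopped, matching the order described in the post.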

| username: kkpeter | Original post link

I suspect some cache is stale. I have seen TiDB keep accessing offline KV nodes before, and that was also only resolved by a restart.

| username: CuteRay | Original post link

Guess why the official documentation recommends using tiup cluster scale-in for scaling down tikv instead of store delete.

| username: 裤衩儿飞上天 | Original post link

Store delete is quite a blunt approach; for a routine scale-in, it’s better to follow the method in the official documentation.

| username: kkpeter | Original post link

Guess whether I’m going to guess.

| username: kkpeter | Original post link

[Image not available]

| username: kkpeter | Original post link

Take a look at what the official tiup actually calls.

| username: kkpeter | Original post link

[Image not available]

| username: 裤衩儿飞上天 | Original post link

Even if you delete the store, there will still be residual information in tiup. It’s better to use the officially recommended method.