Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.
Original topic: After scaling in TiKV on TiDB v6.1.1, the other KV nodes keep printing invalid store xxxx, and cluster queries fail with tikv server busy
[TiDB Usage Environment] Production Environment / Testing / PoC
[TiDB Version] tidb v6.1.1
[Reproduction Path] What operations were performed to cause the issue
Scaled down a KV node
[Encountered Issue: Problem Phenomenon and Impact]
Logs from other KV nodes kept printing “invalid store xxxx,” where xxxx is the ID of the scaled-down KV node, for more than 7 hours.
SQL latency increased, and “tikv server busy” was shown in backoff.
[Resource Configuration]
[Attachments: Screenshots/Logs/Monitoring]
We restarted the PD and TiDB nodes, but the error persisted. Finally, we restarted all TiKV nodes, and the issue was resolved.
Additional Information
Our cluster was upgraded from v4.0.14 through a rolling upgrade.
That shouldn’t happen. How did you perform the scale-in operation, and what version of tiup are you using?
Do you have the cluster topology information? How many TiKV and TiDB instances are there? Could you please share it for us to take a look?
We scaled in with the store delete command. I also think this shouldn’t happen.
Scaling down is not store delete, that’s manual deletion.
Scaling down is tiup cluster scale-in.
There’s no difference; in the end, both come down to executing store delete store-id.
After the store is deleted, is the corresponding TiKV node immediately shut down?
We have already performed all the operations you mentioned. Executing store 63248086 in PD directly reports an error saying that this store cannot be found.
We normally wait until the node status changes to tombstone, then remove the tombstone before stopping the node.
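The wait-for-tombstone step described above can be sketched as a small check. This is only a sketch, assuming the store information is available as JSON in the shape that pd-ctl's store command prints, with a state_name field under store. The helper name safe_to_stop and the sample payloads are hypothetical, not real cluster output.

```python
import json


def safe_to_stop(store_json: str) -> bool:
    """Return True only when the store is reported as Tombstone."""
    info = json.loads(store_json)
    # Assumed shape: {"store": {"id": ..., "state_name": "..."}, "status": {...}}
    state = info.get("store", {}).get("state_name", "")
    return state == "Tombstone"


# Hypothetical payloads for illustration only.
offline = json.dumps({"store": {"id": 63248086, "state_name": "Offline"}})
tombstone = json.dumps({"store": {"id": 63248086, "state_name": "Tombstone"}})

print(safe_to_stop(offline))
print(safe_to_stop(tombstone))
```

A store that is still Offline is still migrating its Regions, so the check only passes once the state has changed to Tombstone.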
I suspect some cache is stale. I have seen TiDB keep accessing offline KV nodes before, and that was also only resolved by a restart.
Guess why the official documentation recommends using tiup cluster scale-in for scaling down tikv instead of store delete.
Store delete is quite aggressive; it’s better to use the official documentation’s method for normal scaling.
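For reference, the tiup-based flow could look roughly like the dry-run sketch below. It only builds and prints the command lines and does not run anything. The function name scale_in_commands, the cluster name, and the node address are hypothetical placeholders, and the assumption is that the node is pruned only after tiup cluster display shows it as Tombstone.

```python
import shlex
from typing import List


def scale_in_commands(cluster: str, node: str) -> List[str]:
    """Build the tiup command lines for a scale-in (dry run, nothing is executed)."""
    steps = [
        # Ask tiup to take the TiKV node offline.
        ["tiup", "cluster", "scale-in", cluster, "--node", node],
        # Re-run this until the node's status is shown as Tombstone.
        ["tiup", "cluster", "display", cluster],
        # Clean up Tombstone nodes from the cluster topology.
        ["tiup", "cluster", "prune", cluster],
    ]
    return [shlex.join(step) for step in steps]


# Placeholder cluster name and node address, for illustration only.
for line in scale_in_commands("prod-cluster", "10.0.1.5:20160"):
    print(line)
```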
Take a look at what the official tiup is actually calling.
Even if you delete the store, there will still be residual information in tiup. It’s better to use the officially recommended method.