This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: tikv-ctl

| username: Hacker_7p5BxP3A

[TiDB Usage Environment] Production Environment
[TiDB Version] V5.2.1
[Reproduction Path] Disk usage on one TiKV node exceeded 90%. After two scale-out operations (adding two new nodes), one of the newly added nodes was scaled in before the peers were fully balanced.
[Encountered Problem: Phenomenon and Impact]
The cluster is currently reading and writing normally, but the TiKV node that previously had over 90% disk usage cannot be started.

From the images, it seems that the current node needs to execute a merge to generate a new peer, but the current node cannot be brought up.


| username: 小龙虾爱大龙虾 | Original post link

An alert should be triggered at 80% disk usage. Why wait until 90%? By then there may not even be enough free space for compaction. :joy_cat:
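As a rough illustration of the 80% rule, here is a minimal Python sketch that flags stores whose disk usage is above a threshold. The dictionary keys (`id`, `capacity_bytes`, `available_bytes`) and the sample numbers are made up for the example. They are not the output format of any TiDB tool, so you would need to map your own monitoring data into this shape.

```python
# Hypothetical helper: the field names and sample values below are
# assumptions for illustration, not the output of any TiDB tool.
def stores_over_threshold(stores, threshold=0.80):
    """Return (store_id, usage_ratio) for stores whose disk usage exceeds threshold."""
    flagged = []
    for s in stores:
        used = s["capacity_bytes"] - s["available_bytes"]
        ratio = used / s["capacity_bytes"]
        if ratio > threshold:
            flagged.append((s["id"], round(ratio, 3)))
    return flagged


stores = [
    {"id": 1, "capacity_bytes": 1000, "available_bytes": 80},   # 92% used
    {"id": 2, "capacity_bytes": 1000, "available_bytes": 450},  # 55% used
]
print(stores_over_threshold(stores))
```

With the sample data only store 1 is flagged. Lowering `threshold` gives you an earlier warning.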

| username: 小龙虾爱大龙虾 | Original post link

Observe the monitoring to see if the disk usage on this high-usage node is gradually recovering, and if the new node is receiving data.
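One way to make "gradually recovering" concrete is to compare two snapshots of per-store region counts taken some time apart. The sketch below assumes you have already collected those counts into plain dictionaries keyed by store ID. The store IDs and numbers are invented for the example.

```python
# Hypothetical example: store IDs and region counts are invented.
def region_count_deltas(before, after):
    """Map store id -> change in region count between two snapshots."""
    return {sid: after[sid] - before[sid] for sid in before if sid in after}


before = {1: 5200, 4: 0}    # store 1: the full node, store 4: the new node
after = {1: 4900, 4: 310}   # same stores, sampled later
deltas = region_count_deltas(before, after)
print(deltas)
```

A negative delta on the full store together with a positive delta on the new store suggests that rebalancing is making progress.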

| username: Hacker_7p5BxP3A | Original post link

This node is down, but the cluster as a whole is handling reads and writes normally :sweat_smile:. I’m not sure whether we can forcibly take this node offline.

| username: 小龙虾爱大龙虾 | Original post link

Since reads and writes are normal, don’t resort to unconventional methods. Try clearing the logs first to free up some space and see whether the node can recover.

| username: Hacker_7p5BxP3A | Original post link

After pruning the Tombstone node, the TiKV on the same host also went down.

| username: Hacker_7p5BxP3A | Original post link

After deleting the Raft and RocksDB logs, disk usage has dropped below 80%, but it looks like the node can no longer start on its own.

| username: tidb狂热爱好者 | Original post link

It has to be taken offline.

| username: Hacker_7p5BxP3A | Original post link

If taking it offline causes no issues, we can scale out again and reuse the resources.

| username: Hacker_7p5BxP3A | Original post link

:joy: I analyzed the data from the past few days and nothing appears to be missing, so I went ahead with the scale-in directly.

| username: dba远航 | Original post link

Try taking the node offline and then bringing it back online.

| username: system | Original post link

This topic was automatically closed 60 days after the last reply. New replies are no longer allowed.