Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.
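Original topic: The TiDB cluster has only three TiKV nodes, and one TiKV machine is now in an uncontrollable state. What is the impact of restarting this machine?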
[TiDB Usage Environment] Production
[TiDB Version] v4.0.9
[Encountered Problem] The TiDB cluster has only three TiKV nodes, and one of the TiKV machines is currently uncontrollable.
[Reproduction Path] Tried to stop the TiKV service on the uncontrollable machine through the control machine, but found it couldn’t be controlled or stopped.
[Problem Phenomenon and Impact]
Not sure what impact a direct hard reboot would have.
If one of the TiKV nodes is uncontrollable, it is recommended to first add a new node and then decommission the uncontrollable one. This keeps the service available.
If you restart the TiKV node directly, the cluster will temporarily not meet the 3-replica condition, which can affect read and write service (the service may be unavailable for a short period).
After the restart is complete, if the TiKV node can recover normally, the service will automatically resume.
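The "add a node first, then decommission" order could look roughly like the sketch below. It only prints the tiup commands so they can be reviewed before anything is run. The function name replace_tikv_plan, the topology file name scale-out.yaml, and the placeholder values are assumptions for illustration, not taken from this thread.

```shell
#!/usr/bin/env bash
# Sketch: print (do not execute) the steps for replacing an unhealthy TiKV node.
# All arguments are placeholders supplied by the operator.
replace_tikv_plan() {
  local cluster="$1" topology="$2" bad_node="$3"
  # 1. Add the new TiKV node described in the topology file.
  echo "tiup cluster scale-out ${cluster} ${topology}"
  # 2. Confirm the new node shows as Up before going further.
  echo "tiup cluster display ${cluster}"
  # 3. Decommission the uncontrollable node (address in ip:port form).
  echo "tiup cluster scale-in ${cluster} --node ${bad_node}"
}

replace_tikv_plan "<cluster_name>" "scale-out.yaml" "<ip>:<port>"
```

If the host cannot be reached over SSH at all, tiup cluster scale-in also has a --force option, but it skips the normal data migration, so it is probably only safe once the replicas have been rebuilt on the other nodes.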
Running tiup cluster stop <cluster_name> -N *.*.*.*:*** on the control machine does not succeed, and we cannot log in to that machine. Other machines cannot access it remotely either. Is there any other way to take it offline? We also have another, rather silly problem: no other machines have been added.
Backup methods are great.
Does the restart refer to restarting the cluster or just the uncontrollable TiKV machine?
If it is already uncontrollable and inaccessible, I feel that restarting this TiKV machine won’t have much impact.
After restarting the machine, one of the three TiKV nodes will go down. At this time, the TiKV cluster will be in a read-only state and cannot be written to. Once the restart is complete and the TiKV service starts normally, PD will perform scheduling. After the scheduling is completed, the cluster will return to normal.
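The replica reasoning in these replies can be checked against Raft majority arithmetic. This is a minimal sketch, assuming the default of 3 replicas per Region with each replica on a different TiKV node. The function names raft_quorum and has_quorum are made up for illustration.

```shell
#!/usr/bin/env bash
# Majority needed by a Raft group with N replicas: floor(N / 2) + 1.
raft_quorum() {
  echo $(( $1 / 2 + 1 ))
}

# Exit status 0 if the surviving replicas can still form a majority.
has_quorum() {
  local replicas="$1" down="$2"
  [ $(( replicas - down )) -ge "$(raft_quorum "$replicas")" ]
}

for down in 1 2; do
  if has_quorum 3 "$down"; then
    echo "3 replicas, ${down} down: quorum kept"
  else
    echo "3 replicas, ${down} down: quorum lost"
  fi
done
```

By this arithmetic, two surviving replicas out of three still form a majority. So the likely effect of one node going down is a short interruption while leaders on that node are re-elected, rather than a cluster-wide read-only state. The real risk is that a second failure before the node returns would lose quorum.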
If PD detects no heartbeat from an uncontrollable machine, that store enters an abnormal state in PD, and PD does no scheduling for it.
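As a rough illustration of how heartbeat loss maps to a store state, here is a small sketch. The thresholds are assumptions based on the PD documentation as I understand it: Disconnected after more than 20 seconds without a heartbeat, and Down after max-store-down-time, which defaults to 30 minutes. Please check them against your PD configuration. The function name store_state_guess is hypothetical.

```shell
#!/usr/bin/env bash
# Guess the store state PD would report from seconds since the last heartbeat.
# $1: seconds since last heartbeat
# $2: max-store-down-time in seconds (assumed default: 1800 = 30 minutes)
store_state_guess() {
  local since="$1" max_down="${2:-1800}"
  if [ "$since" -le 20 ]; then
    echo "Up"
  elif [ "$since" -le "$max_down" ]; then
    echo "Disconnected"
  else
    echo "Down"
  fi
}

store_state_guess 5      # heartbeat seen recently
store_state_guess 600    # 10 minutes of silence
store_state_guess 3600   # silent for longer than max-store-down-time
```

Under these assumptions, PD only starts rebuilding replicas elsewhere once a store is Down. With just three TiKV nodes and three replicas there is no spare node to rebuild onto, which is another reason to add a node.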
The key is still to add a node.
Although I can’t control my machine, PD can detect it normally. It’s very strange. We just can’t log in.
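Since PD can still see the store, its state as recorded by PD can be read from the control machine without logging in to the TiKV host. The sketch below only prints the pd-ctl command. The PD address is a placeholder, the function name pd_store_cmd is hypothetical, and the versioned tiup ctl:v4.0.9 form is an assumption about the tiup release in use (older releases accept tiup ctl pd).

```shell
#!/usr/bin/env bash
# Sketch: build the pd-ctl command that lists all stores and their state.
# $1: component version, $2: PD endpoint (both supplied by the operator).
pd_store_cmd() {
  local version="$1" pd_addr="$2"
  echo "tiup ctl:${version} pd -u ${pd_addr} store"
}

pd_store_cmd "v4.0.9" "http://<pd_ip>:2379"
```

In the output, the state_name field of the affected store shows whether PD considers it Up, Disconnected, or Down.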