What impact will there be if I restart a TiKV machine when the TiDB cluster has only three TiKV nodes and one of them is in an uncontrollable state?

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: tidb集群只有三台tikv,现在一台tikv机器处于不可控状态,重启这台机器有什么影响嘛

| username: ablewang_xiaobo

[TiDB Usage Environment] Production
[TiDB Version] v4.0.9
[Encountered Problem] The TiDB cluster has only three TiKV nodes, and one of the TiKV machines is currently uncontrollable.
[Reproduction Path] Tried to stop the TiKV service on the uncontrollable machine from the control machine, but the service could not be controlled or stopped.
[Problem Phenomenon and Impact]
Not sure what impact a direct hard reboot would have.

| username: xfworld

If one of the TiKV nodes is uncontrollable, it is recommended to first add a new node and then decommission the uncontrollable one.

This keeps the service available. If you restart the TiKV node directly, the cluster will not have all 3 replicas available while the node is down, which affects its ability to serve reads and writes (the service may be unavailable for a short period).

After the restart is complete, if the TiKV node can recover normally, the service will automatically resume.
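The replica reasoning above can be sketched with a little arithmetic. This is only an illustration, assuming the default of 3 replicas per Region, one replica on each of the three TiKV nodes, and Raft majority rules; the function names are hypothetical and are not part of TiDB, TiKV, or PD.

```python
def majority(replicas: int) -> int:
    """Smallest number of live replicas that forms a Raft majority."""
    return replicas // 2 + 1


def has_quorum(replicas: int, down: int) -> bool:
    """True if the remaining live replicas still form a majority."""
    return replicas - down >= majority(replicas)


if __name__ == "__main__":
    # Assumption: 3 replicas per Region, one per TiKV node.
    for down in range(3):
        print(f"3 replicas, {down} down -> quorum kept: {has_quorum(3, down)}")
```

Under these assumptions, one node going down leaves 2 of 3 replicas, which is still a majority, so any short interruption would mostly come from leader re-election for Regions whose leader was on the restarted node. While that node is down there is no spare replica left, so losing a second node would lose the majority.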

| username: ablewang_xiaobo

Running tiup cluster stop <cluster_name> -N *.*.*.*:*** on the control machine does not succeed, and we cannot log in to that machine. Other machines cannot reach it remotely either. Is there any other way to take it offline? We also have another, rather silly problem: we have no additional machines to add to the cluster.

| username: 裤衩儿飞上天

Backup methods are great.

| username: Kongdom

Does the restart refer to restarting the cluster or just the uncontrollable TiKV machine?
If it is already uncontrollable and inaccessible, I feel that restarting this TiKV machine won’t have much impact.

| username: Lystorm

After restarting the machine, one of the three TiKV nodes will go down. At this time, the TiKV cluster will be in a read-only state and cannot be written to. Once the restart is complete and the TiKV service starts normally, PD will perform scheduling. After the scheduling is completed, the cluster will return to normal.

| username: xfworld

If PD detects no heartbeat from an uncontrollable machine, it marks that machine as abnormal and does not schedule anything to it.

The key is still to add a node.
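As a rough sketch of the heartbeat point above: the snippet below classifies a store by how long it has been since PD last received a heartbeat. The thresholds are assumptions based on PD's documented defaults (a store counts as disconnected after about 20 seconds without a heartbeat, and as down after max-store-down-time, 30 minutes by default). They may differ in v4.0.9 or in a tuned cluster. The function name store_state is hypothetical and this is not PD's actual code.

```python
from datetime import timedelta

# Assumed thresholds; check the PD configuration of the actual cluster.
DISCONNECT_AFTER = timedelta(seconds=20)
MAX_STORE_DOWN_TIME = timedelta(minutes=30)


def store_state(since_last_heartbeat: timedelta) -> str:
    """Classify a store by the time elapsed since its last heartbeat."""
    if since_last_heartbeat > MAX_STORE_DOWN_TIME:
        return "Down"
    if since_last_heartbeat > DISCONNECT_AFTER:
        return "Disconnected"
    return "Up"


if __name__ == "__main__":
    for age in (timedelta(seconds=5), timedelta(minutes=2), timedelta(hours=1)):
        print(f"no heartbeat for {age} -> {store_state(age)}")
```

Once a store is considered down, PD tries to recreate its replicas on other stores. With only three TiKV nodes and 3 replicas there is no fourth store to hold them, which is one reason adding a node matters here.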

| username: ablewang_xiaobo

Although we cannot control the machine, PD still detects it as normal, which is very strange. We just cannot log in to it.
