Consultation on TiDB Cluster Power Outage Issue

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: TIDB集群断电问题咨询

| username: jaybing926

【TiDB Usage Environment】Production Environment
【TiDB Version】


【Encountered Problem】
Hello everyone. Recently a switch in our server room developed a fault, and we now need to replace it (it is the switch the TiDB cluster is connected to). During the replacement there will be a network outage, and the whole TiDB cluster will be disconnected for about 10 to 20 minutes. What impact will this have on TiDB, and what should we watch out for? Do we need to configure anything on the cluster in advance (such as timeout settings or liveness detection intervals)? Thank you~~

| username: Meditator | Original post link

In this scenario, treat it as planned downtime for maintenance: just stop the entire cluster first.

| username: xfworld | Original post link

Consider getting a UPS for the switch… :joy:

If the outage lasts more than about half an hour, the lagging replicas basically can't catch up through the Raft log any more and fall back to catching up by applying snapshots. Once they are roughly aligned again, they resume catching up from the Raft log.

While this catch-up is in progress, the cluster may be unavailable or very slow to access…

| username: jaybing926 | Original post link

Isn't a UPS a backup power supply? Power isn't the issue here; we are replacing the switch itself.

In our situation, wouldn’t we need a hot standby switch to avoid affecting the business?

| username: xfworld | Original post link

Yes, a backup power supply. A small one should be enough to keep the switch running for a few hours.

| username: jaybing926 | Original post link

Shut down the cluster: tiup cluster stop ${cluster-name}
Start the cluster after replacement: tiup cluster start ${cluster-name}
Is there anything wrong with these steps?

| username: Kongdom | Original post link

Yes, those two commands are all you need.
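A minimal sketch of the stop/start sequence above, assuming `tiup` is installed on the control machine and the cluster name is the one shown by `tiup cluster list`. The `run` wrapper and the `before_outage`/`after_outage` function names are hypothetical; by default the functions only print the commands, and they execute them only when `DRY_RUN=0` is set. The final `tiup cluster display` call is an extra check, not part of the original answer, to confirm that every component comes back up after the restart.

```shell
#!/usr/bin/env bash
# Hypothetical helper for a planned network outage (switch replacement).
# Assumes tiup is installed on the control machine.
# Commands are only printed unless DRY_RUN=0 is set.
set -euo pipefail

# Print a command and, only when DRY_RUN=0, execute it.
run() {
  echo "+ $*"
  if [ "${DRY_RUN:-1}" = "0" ]; then
    "$@"
  fi
}

# Before the switch goes down: stop every component of the cluster.
before_outage() {
  local cluster="$1"
  run tiup cluster stop "$cluster"
}

# After the new switch is in place: start the cluster, then show its status.
after_outage() {
  local cluster="$1"
  run tiup cluster start "$cluster"
  run tiup cluster display "$cluster"
}
```

After sourcing the script, you would run `DRY_RUN=0 before_outage my-cluster` before the replacement and `DRY_RUN=0 after_outage my-cluster` afterwards, where `my-cluster` is a placeholder for your cluster name.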

| username: jaybing926 | Original post link

Okay, thanks~

| username: tidb狂热爱好者 | Original post link

Shutting down the entire cluster is the safest way.
I haven't run into this exact situation myself, though I have dealt with cases where a TiKV node was taken offline and removed.

| username: cs58_dba | Original post link

It’s better to make a backup before shutting down, just in case.

| username: cs58_dba | Original post link

Physical backups are fast, while logical backups are better for restoring individual tables.
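A minimal sketch of taking both kinds of backup while the cluster is still running, before `tiup cluster stop`. It assumes BR is used for the physical backup and Dumpling for the logical one, both invoked as tiup components (ideally a BR version matching the cluster version). The PD address, storage location, host, table name, output directory, and the `root` user on port 4000 with no password are all placeholders, and the `run`, `physical_backup`, and `logical_backup` names are hypothetical. As written, the functions only print the commands unless `DRY_RUN=0` is set.

```shell
#!/usr/bin/env bash
# Hypothetical pre-maintenance backup helper; run it while the cluster is up.
# Commands are only printed unless DRY_RUN=0 is set.
set -euo pipefail

# Print a command and, only when DRY_RUN=0, execute it.
run() {
  echo "+ $*"
  if [ "${DRY_RUN:-1}" = "0" ]; then
    "$@"
  fi
}

# Physical backup of the whole cluster with BR (fast).
# pd_addr: a PD endpoint such as 10.0.1.1:2379 (placeholder).
# storage: a location every TiKV node can write to, e.g. an NFS mount or S3 URL.
physical_backup() {
  local pd_addr="$1" storage="$2"
  run tiup br backup full --pd "$pd_addr" --storage "$storage"
}

# Logical backup of a single table with Dumpling (easier to restore per table).
# Assumes user root with no password on port 4000 (placeholders).
logical_backup() {
  local host="$1" table="$2" out_dir="$3"
  run tiup dumpling -h "$host" -P 4000 -u root -T "$table" -o "$out_dir"
}
```

A physical backup is the quicker way to protect the whole cluster, while a Dumpling export of a few critical tables makes it easy to restore just those tables if something goes wrong.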