I have a question: shouldn't there be at least 5 PD nodes in the diagram, split 3 + 2? If the primary IDC goes down completely, leader election falls to the PDs in the DR DC. But the DR DC has only one PD, and a single PD cannot form a majority, so wouldn't the entire cluster become unavailable?
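For context, Raft requires a strict majority of the voting members to elect a leader. A quick sketch of the arithmetic (the voter count here is just illustrative):

```shell
# Raft quorum: a leader needs floor(N/2) + 1 votes out of N voters.
# With 5 PDs split 3 (primary) + 2 (DR), losing the primary leaves 2 < 3,
# so the surviving DR members can never elect a leader on their own.
N=5
echo "quorum for $N voters: $(( N / 2 + 1 ))"   # prints 3
```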
I ran a test that disregarded AZ placement: within a single data center, I used a firewall to isolate two of the PD nodes to simulate a crash. All of the TiKV nodes then showed as Down.
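For anyone who wants to reproduce this, here is a rough sketch of the isolation, assuming PD's default client/peer ports 2379/2380; the pd-ctl version and endpoint below are placeholders. Run on each PD host being isolated:

```shell
# Block PD's default client (2379) and peer (2380) ports to simulate a crash.
sudo iptables -A INPUT  -p tcp --dport 2379 -j DROP
sudo iptables -A INPUT  -p tcp --dport 2380 -j DROP
sudo iptables -A OUTPUT -p tcp --sport 2379 -j DROP
sudo iptables -A OUTPUT -p tcp --sport 2380 -j DROP
# (Undo later by repeating each rule with -D instead of -A.)

# From a surviving host, check whether the remaining PDs still have quorum.
tiup ctl:v7.5.0 pd -u http://10.0.1.11:2379 health
```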
Regardless of whether there are 1, 3, 5, or 7 nodes, once the primary center goes down, under Raft the backup center cannot elect a leader on its own. In that case, can TiDB force the backup center to take over by dropping and rebuilding PD, so that it can continue to provide service?
What you said is correct: with this configuration, if the primary center goes down, the backup center will be unavailable.
The principle for placing PD nodes is the same as for TiKV: with 5 nodes, use a 3:2 primary-to-backup ratio. The primary center should hold at least 2 voters plus 1 follower, and the backup center at least 2 voters. This design keeps the cluster available through the loss of any single node or of the entire backup center; as noted above, a full primary-center outage still costs PD its quorum.
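As an illustration, the PD section of a TiUP topology following that 3:2 split might look like the sketch below (the hosts and cluster name are hypothetical placeholders):

```shell
# Hypothetical topology fragment: 3 PDs in the primary center, 2 in the DR center.
cat > topology-pd.yaml <<'EOF'
pd_servers:
  - host: 10.0.1.11   # primary center
  - host: 10.0.1.12   # primary center
  - host: 10.0.1.13   # primary center
  - host: 10.0.2.11   # backup (DR) center
  - host: 10.0.2.12   # backup (DR) center
EOF

# Apply with TiUP, e.g. when scaling out an existing cluster:
tiup cluster scale-out my-cluster topology-pd.yaml
```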
If the primary AZ fails and most of the Voter replicas are lost, but the secondary AZ still has a complete copy of the data, you can recover from the secondary AZ. This requires manual intervention and specialized recovery tools; for help, please contact PingCAP Service and Support.
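On the PD side, such a recovery typically means rebuilding the PD cluster with pd-recover. A heavily simplified sketch follows; the endpoint, cluster ID, and alloc ID are placeholders, and because this operation is destructive it should only be run under guidance from support:

```shell
# 1. Deploy a fresh, empty PD on a surviving DR host (e.g. via TiUP).
# 2. Restore the old cluster identity into it. Both IDs below are placeholders:
#    the real cluster ID comes from the old PD/TiKV logs, and the alloc ID must
#    be larger than any ID the old cluster ever allocated.
pd-recover -endpoints http://10.0.2.11:2379 \
           -cluster-id 6747551640615446306 \
           -alloc-id 100000000
# 3. Restart PD, then bring TiKV and TiDB back up against the rebuilt PD.
```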
In practice it is possible, but it requires manual intervention. In a single-region, dual-AZ deployment, if one AZ goes down, manual intervention is needed regardless of whether the failure is in PD or TiKV…