Deployment and Verification Issues of Adaptive Synchronous Mode in Two Centers in the Same City - Data Cannot Be Written When One Center Fails

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: 同城两中心自适应同步模式部署以及验证问题-一个中心挂掉,不能写入数据的问题

| username: TiDBer_ZqP4AAwf

Adaptive synchronization mode deployment for two data centers in the same city: Deploy TiDB in Two Availability Zones in a Single Region (单区域双 AZ 部署 TiDB) | PingCAP Documentation Center
【TiDB Environment】Testing Environment
【TiDB Version】TiDB-v6
【Encountered Issue】
【Reproduction Path】What operations were performed to encounter the issue
Result after deploying and configuring synchronization according to the installation document:
{
  "mode": "dr-auto-sync",
  "dr-auto-sync": {
    "label_key": "zone",
    "state": "sync_recover",
    "state_id": 2009,
    "total_regions": 3
  }
}
The documentation says the state should be sync, but the actual result after deployment is sync_recover.
The total_regions value changed from 1 to 3 after the service was stopped and started.
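
The status above is the output of PD's replication mode status API; polling it is a convenient way to watch the state move from sync_recover back to sync once the DR replicas have caught up. A minimal sketch, with the PD address and port as placeholders:

# Query the current DR auto-sync replication state from PD
curl http://<pd_ip>:<pd_port>/pd/api/v1/replication_mode/status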

【Issue Phenomenon and Impact】
Verification operation process:
After the cluster setup and configuration were completed,
I manually ran tiup cluster stop -R tikv -N node1,node2 to shut down the two TiKV nodes in idc2, then logged in to TiDB, created a database and table, and wrote some data.
I then started the previously stopped nodes and waited for the data to synchronize.

Next, I manually shut down the TiKV services on node1 and node2 in idc1 and logged in to TiDB to write to the table. The result: the existing table could neither be queried nor written, and new tables could not be created.
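
As a side note, to confirm during these tests which TiKV stores are up and what zone labels they carry, the store listing from pd-ctl can be used; a sketch assuming TiDB v6.1.0 and a PD endpoint at <pd_ip>:2379:

# List all TiKV stores with their state (Up/Down/Disconnected) and labels
tiup ctl:v6.1.0 pd -u http://<pd_ip>:2379 store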

My understanding is that in a dual-center synchronous mode, if one center goes down, the other center should still be able to serve requests normally, and when the failed center recovers it should automatically resynchronize so that data shards are again distributed across both data centers.

However, in my deployment, when one data center goes down (in particular, when the TiKV services of the primary IDC go down), the entire TiDB cluster becomes unavailable.

Please help with this issue. Thank you very much.

| username: jansu-dev | Original post link

According to your description, after those nodes are shut down the commit mode degrades to the ordinary Raft majority mechanism. Can the remaining nodes still satisfy a majority?

Where are the labels and Learner replicas placed in your deployment? Are they fully consistent with the operation manual?
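
For reference, assuming the replica layout from the linked document (primary-replicas = 2 Voters in the primary AZ, dr-replicas = 1 Voter plus 1 Learner in the DR AZ), there are 3 Voters in total, so a quorum needs 2 of them. Stopping idc2 removes only 1 Voter, so writes can continue in the degraded (async) mode; stopping idc1 removes 2 Voters, leaving a single Voter that cannot elect a leader or commit, which would explain the unavailability described above. The effective settings can be checked from PD, for example:

# Dump the PD configuration and look at the replication-mode / dr-auto-sync section
curl http://<pd_ip>:<pd_port>/pd/api/v1/config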

| username: system | Original post link

This topic was automatically closed 60 days after the last reply. New replies are no longer allowed.