Table Queries Report Errors After Rebuilding the PD Cluster in a 3-Node Mixed-Deployment Cluster

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: 3节点混布集群,重建PD集群后,查询表报错

| username: Kongdom

[TiDB Usage Environment] Production Environment
[TiDB Version] V5.1.0
[Encountered Issues]

After rebuilding the PD cluster according to the documentation, 80% of table queries fail with error 1105. Only the error code is returned, with no detailed error message.
Executing a select count(*) statement works without errors.
Executing SHOW TABLE [table_name] REGIONS returns “PD returned no region.”

After rebuilding one of the tables and importing more than 200,000 rows, querying it still returns error 1105, although SHOW TABLE [table_name] REGIONS now works.

[Reproduction Path] Operations performed that led to the issue
[Issue Phenomenon and Impact]

| username: 张雨齐0720 | Original post link

The cause of the “no region” error: when the cluster requests region information from PD, PD does not return a corresponding response.

For a possible fix, see another post: “Troubleshooting the no region problem in TinyKV 3B” (关于tinykv3B中no region问题的排查分析), reply #2 by sunznx, TiDB Q&A community.

| username: Kongdom | Original post link

Tried it, no effect~

| username: xfworld | Original post link

Is there a backup? If so, it is recommended to reinstall and restore…

| username: 张雨齐0720 | Original post link

Are you using this split?

| username: Kongdom | Original post link

There is no backup. I can rebuild the table, but after importing 200,000 records, queries still fail with error 1105.

| username: Kongdom | Original post link

Yes, that’s the one.

| username: h5n1 | Original post link

Normally, after PD is rebuilt, TiKV reports its region information to PD again. Check the error messages in the logs to find the regions involved, then check the status of those regions. It might be necessary to perform a multi-replica failure recovery.
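
As a rough illustration of the "find the involved regions in the logs" step, here is a minimal Python sketch. It assumes region IDs show up in log lines as a `region_id=<number>` field; that field name, the helper `count_region_ids`, and the sample lines are assumptions made up for illustration, not taken from this cluster's logs. Adjust the pattern to whatever your TiDB/TiKV logs actually contain.

```python
import re
from collections import Counter

# Assumption: region IDs appear in log lines as "region_id=<number>".
# Change the pattern if your logs use a different field name.
REGION_ID_PATTERN = re.compile(r"region_id=(\d+)")


def count_region_ids(log_lines):
    """Return a Counter mapping region ID -> number of mentions in the lines."""
    counts = Counter()
    for line in log_lines:
        for match in REGION_ID_PATTERN.findall(line):
            counts[int(match)] += 1
    return counts


# Hypothetical sample lines, made up for illustration only.
sample = [
    '[ERROR] ["request failed"] [region_id=42]',
    '[WARN] ["retrying"] [region_id=42]',
    '[ERROR] ["request failed"] [region_id=7]',
    '[INFO] ["unrelated message"]',
]

if __name__ == "__main__":
    # List the most frequently mentioned regions first.
    for region_id, hits in count_region_ids(sample).most_common():
        print(region_id, hits)
```

The regions that show up most often in error lines are the ones worth checking first in PD.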

| username: Kongdom | Original post link

The TiKV nodes haven’t been touched, so there shouldn’t be any problem on the TiKV side.

| username: h5n1 | Original post link

Judging from your deployment, 210 is not in use, and there are also conflicts on the TiKV side.

| username: Kongdom | Original post link

For a while the virtual IP pointed to 210, so a few leaders ended up on 210 as well. Evicting those leaders made no difference.
