A Series of Issues Caused by Some TiKV Nodes Failing to Connect to the Correct PD Nodes

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: tikv部分节点无法连接正确的pd节点导致的一系列问题

| username: songxuecheng

[TiDB Usage Environment] Production / Testing / PoC
Testing environment
[TiDB Version] 6.1.0 upgraded to 6.1.1
[Encountered Issues]

  1. First, I tried to reproduce the bug reported in tikv/tikv issue #12934 ("Master: two tikv don't report region heartbeat after inject fault to pd leader") by manually switching the PD leader several times and then restarting the PD leader.

The issue did not reproduce during this test.

  2. Then I upgraded the cluster from 6.1.0 to 6.1.1.



The upgrade failed.

I then attempted to restart TiKV.


Disk I/O wait was at 100%.

Finally, the bug from step 1 appeared.
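
For anyone repeating the test in step 1, the manual PD leader switch can be scripted. The following is only a sketch, assuming pd-ctl is invoked through `tiup ctl` with its `member leader transfer` subcommand. The helper names, the PD address, and the member name `pd-2` are hypothetical placeholders, not values from this cluster.

```python
import subprocess


def pd_ctl_argv(cluster_version, pd_addr, *subcommand):
    """Build the argv for a pd-ctl call made through `tiup ctl`."""
    return ["tiup", f"ctl:{cluster_version}", "pd", "-u", pd_addr, *subcommand]


def transfer_leader(cluster_version, pd_addr, target_member, dry_run=True):
    """Ask PD to move leadership to target_member.

    With dry_run=True (the default) nothing is executed and the argv is
    only returned, so the command can be reviewed first.
    """
    argv = pd_ctl_argv(
        cluster_version, pd_addr, "member", "leader", "transfer", target_member
    )
    if not dry_run:
        # Raises CalledProcessError if pd-ctl exits with a non-zero status.
        subprocess.run(argv, check=True)
    return argv
```

Calling `transfer_leader("v6.1.0", "http://127.0.0.1:2379", "pd-2")` returns the command without running it. Pass `dry_run=False` to actually execute it against a test cluster.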


| username: qizheng

  1. This bug only affects versions v5.3.2 and v5.4.2.

  2. During the upgrade, was the 100% I/O utilization coming from tikv-server? Have you checked iotop, the system messages log, and dmesg for errors?
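
If iotop is not installed on the host, a rough equivalent of the per-process check can be sketched by sampling `/proc/<pid>/io` twice and ranking processes by bytes read and written in between. This is a minimal sketch, assuming Linux and root privileges (needed to read other users' processes). The function names are illustrative, and it does not reproduce iotop's own accounting.

```python
import os
import time


def parse_proc_io(text):
    """Parse the contents of /proc/<pid>/io into a dict of integer counters."""
    counters = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and value.strip().isdigit():
            counters[key.strip()] = int(value.strip())
    return counters


def io_delta(before, after):
    """Bytes read plus bytes written between two snapshots of one process."""
    return sum(
        after.get(k, 0) - before.get(k, 0) for k in ("read_bytes", "write_bytes")
    )


def snapshot():
    """Map pid -> (command name, counters) for every readable process."""
    result = {}
    for pid in filter(str.isdigit, os.listdir("/proc")):
        try:
            with open(f"/proc/{pid}/io") as f:
                counters = parse_proc_io(f.read())
            with open(f"/proc/{pid}/comm") as f:
                name = f.read().strip()
        except OSError:
            continue  # process exited or permission was denied
        result[pid] = (name, counters)
    return result


def top_io(interval=5.0, limit=5):
    """Return the `limit` processes with the most disk I/O over `interval` seconds."""
    first = snapshot()
    time.sleep(interval)
    second = snapshot()
    rows = [
        (io_delta(first[pid][1], after), pid, name)
        for pid, (name, after) in second.items()
        if pid in first
    ]
    return sorted(rows, reverse=True)[:limit]
```

Running `top_io()` as root while the disk is saturated should show whether `tikv-server` is at the top of the list or whether another process is responsible.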

| username: songxuecheng