Issues with Raft Group Leader

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: RAFT GROUP LEADER问题 (Raft group leader issue)

| username: TiDBer_Terry261

[TiDB Usage Environment] Production Environment / Testing / PoC
[TiDB Version]
[Reproduction Path]
[Encountered Problem: Problem Phenomenon and Impact]
Added two TiKV nodes and created an index on a large table.

From the monitoring screenshot, the leader count on some TiKV instances periodically drops to 0 and then instantly recovers. Does anyone know what is happening?
[Resource Configuration]
[Attachments: Screenshots/Logs/Monitoring]

| username: tidb菜鸟一只 | Original post link

Are those the nodes where multiple TiKV instances are deployed on a single machine? Have you set up resource isolation for them?

| username: TiDBer_Terry261 | Original post link

Two high-spec machines were used, with two TiKV instances installed on each. No isolation was implemented.

| username: tidb菜鸟一只 | Original post link

It is recommended to use NUMA binding for resource isolation. Also, are the data directories of the two instances on each machine mounted under the same root directory?
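
If you go the NUMA route, here is a minimal sketch of binding the two instances on one host to separate NUMA nodes. It assumes a manual start with numactl; the binary path, config file names, and node numbers are placeholders. In a TiUP-managed cluster the equivalent is setting the numa_node field on each tikv_servers entry in the topology file, and it is also worth capping storage.block-cache.capacity so the two instances do not fight over memory.

```shell
# Inspect the host's NUMA layout (requires the numactl package).
numactl --hardware

# Pin each TiKV instance's CPUs and memory to its own NUMA node.
# Binary path and config file names below are placeholders for illustration.
numactl --cpunodebind=0 --membind=0 ./tikv-server --config tikv-20160.toml &
numactl --cpunodebind=1 --membind=1 ./tikv-server --config tikv-20161.toml &
```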

| username: TiDBer_Terry261 | Original post link

Some isolation is indeed necessary. I checked the Linux system logs and found that the TiKV process had been killed by the system; TiKV then immediately restarted, which produced the pattern shown in the monitoring graph.
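
For anyone hitting the same symptom, a kill like this can usually be confirmed from the kernel log. A minimal check, assuming a systemd-based host: if the OOM killer was responsible, tikv-server will show up in these messages, and systemd restarting the service explains the instant recovery.

```shell
# Search the kernel ring buffer for OOM-killer activity.
dmesg -T | grep -i -E "out of memory|killed process"

# Equivalent search in the journal on systemd hosts.
journalctl -k | grep -i -E "oom|killed process"
```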

| username: system | Original post link

This topic was automatically closed 60 days after the last reply. New replies are no longer allowed.