Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.
Original topic: RAFT GROUP LEADER问题
[TiDB Usage Environment] Production Environment / Testing / PoC
[TiDB Version]
[Reproduction Path]
[Encountered Problem: Problem Phenomenon and Impact]
Added two TiKV nodes and created an index on a large table.
From the screenshot, you can see that the leader count on some TiKV nodes periodically drops to 0 and then immediately recovers. Does anyone know what is happening?
[Resource Configuration]
[Attachments: Screenshots/Logs/Monitoring]
Are those the nodes where multiple TiKV instances are deployed on a single machine? Has resource isolation been configured for them?
Two high-performance machines were used, with two TiKV instances installed on each. No resource isolation was configured.
It is recommended to use NUMA binding for resource isolation. Also, are the data directories of the two instances mounted under the same root directory?
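As a rough illustration of that recommendation (not taken from this cluster), a two-instance-per-host TiUP topology can bind each TiKV instance to its own NUMA node, give each its own data directory on a separate mount, and cap the block cache so the two instances cannot exhaust host memory. The IPs, ports, paths, and capacity values below are placeholders:

```yaml
# Sketch of the tikv_servers section of a TiUP topology file for two TiKV
# instances on one machine. All hosts, ports, paths, and sizes are examples.
tikv_servers:
  - host: 192.168.1.101          # first instance, bound to NUMA node 0
    port: 20160
    status_port: 20180
    deploy_dir: /data1/tidb-deploy/tikv-20160
    data_dir: /data1/tidb-data/tikv-20160   # separate mount from the second instance
    numa_node: "0"
    config:
      storage.block-cache.capacity: "16GB"  # cap the block cache per instance
  - host: 192.168.1.101          # second instance, bound to NUMA node 1
    port: 20161
    status_port: 20181
    deploy_dir: /data2/tidb-deploy/tikv-20161
    data_dir: /data2/tidb-data/tikv-20161
    numa_node: "1"
    config:
      storage.block-cache.capacity: "16GB"
```

Note that numa_node binding requires numactl to be installed on the host, and when several TiKV instances share a machine the block-cache capacity is generally sized against the memory each instance is allotted rather than the whole machine.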
Some isolation is indeed necessary. I checked the Linux system logs and found that the TiKV process had been killed by the system; TiKV then immediately restarted itself, which produced the pattern shown in the monitoring graph.