Abnormal Duration of Various Locks in TiKV, TiDB GC Unable to Execute Properly

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: TIKV各种锁持续时间异常,TIDB GC无法正常执行

| username: TiDBer_27OdodiJ

[TiDB Usage Environment] Production Environment
[TiDB Version] v5.4.0
[Reproduction Path] None
[Encountered Issues: Problem Phenomenon and Impact]
Issues:

  1. TiKV keeps firing alerts that cannot be cleared: [TiKV scheduler latch wait duration seconds more than 1s] and [TiKV scheduler context total] (a query for the underlying metric is sketched below this list)
  2. The TiDB GC process cannot proceed normally
  3. Client logs report "TiKV server is busy"
    =========
    My own investigation so far:
    This cluster was put into use as a JuiceFS metadata store yesterday, and the only clients are JuiceFS clients. The issues appeared after it had been running for a while.
    I searched the official documentation on handling these alerts but could not find anything that matches the problems I am seeing.
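For reference, the [TiKV scheduler latch wait duration seconds more than 1s] alert is based on the tikv_scheduler_latch_wait_duration_seconds histogram, which measures how long write commands wait on in-memory latches in the TiKV scheduler, i.e. how badly writes are queuing behind other writes to the same keys. A minimal sketch for pulling that quantile straight from Prometheus; `<prometheus-host>` is a placeholder and the default port 9090 is assumed:

```bash
# Query the 99th-percentile scheduler latch wait per TiKV instance.
# <prometheus-host> is a placeholder; the alert fires when this value exceeds 1s.
curl -G 'http://<prometheus-host>:9090/api/v1/query' \
  --data-urlencode 'query=histogram_quantile(0.99, sum(rate(tikv_scheduler_latch_wait_duration_seconds_bucket[1m])) by (le, instance))'
```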

[Resource Configuration]
3 physical machines, each with 2 NVMe disks; 2 TiKV instances are deployed on each machine
[Attachments: Screenshots/Logs/Monitoring]
===============================Related Monitoring Screenshots====================

  1. Machine Performance Monitoring:
    Overall machine resource usage is not high

  2. gRPC Related Monitoring:
    Various lock wait durations are abnormally long

  3. CPU Monitoring of Each Component:
    The scheduler CPU usage of one TiKV instance is consistently higher than that of the other 5 TiKV instances

  4. GC Related Panel:
    The GC safe point is stuck at a very old timestamp
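For the stuck GC, the GC worker's bookkeeping can be read from the mysql.tidb table on any TiDB server; it shows which TiDB instance currently owns GC, when GC last ran, and the current safe point. A rough sketch, with the host and credentials as placeholders:

```bash
# Inspect TiDB GC bookkeeping; <tidb-host>, user and password are placeholders.
# tikv_gc_leader_desc   -> which TiDB instance currently runs the GC worker
# tikv_gc_last_run_time -> when GC last ran
# tikv_gc_safe_point    -> the safe point that appears stuck in the panel
mysql -h <tidb-host> -P 4000 -u root -p -e \
  "SELECT VARIABLE_NAME, VARIABLE_VALUE FROM mysql.tidb WHERE VARIABLE_NAME LIKE 'tikv_gc%';"
```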

==============================Related Log Screenshots=================
TiDB Logs:
The "server is busy" errors all reference the same region ID
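Since all the "server is busy" errors point at one region, it may be worth pulling that region's metadata from PD to see its key range, leader store, and peers, and checking whether it is also among the top write regions. A sketch using pd-ctl via tiup; `<pd-host>` and `<region-id>` are placeholders for the real values from the logs:

```bash
# Show the metadata (key range, leader, peers) of the region from the logs.
tiup ctl:v5.4.0 pd -u http://<pd-host>:2379 region <region-id>

# List the 10 regions with the highest write flow to see whether that region is a hotspot.
tiup ctl:v5.4.0 pd -u http://<pd-host>:2379 region topwrite 10
```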

GC Related Logs:

TiKV Scheduler CPU Abnormal Logs:
There are a large number of warning logs like the following:

Region Information Involved in Logs:

| username: JonnieLee | Original post link

Did it get better after a restart or did it resolve itself after a while? Did you perform any large data processing during this period?

| username: xfworld | Original post link

Have you checked the status of the existing cluster first?

| username: TiDBer_27OdodiJ | Original post link

Only restarted the TiDB service, but it had no effect.

| username: TiDBer_27OdodiJ | Original post link

The service status of all cluster components is normal.

| username: JonnieLee | Original post link

Are there a lot of reads and writes or rollbacks? Is this a production database or a test database? You might need to restart TiKV.

| username: xfworld | Original post link

Try checking the hotspot traffic through the dashboard.

A high number of data locks indicates conflicts at the business level.

For region [15025515], I suggest you check which table it is associated with and why it is causing such severe conflicts.


Does this version meet the business expectations?
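For the region-to-table check suggested above, one option (a sketch; host and credentials are placeholders) is to query information_schema from any TiDB server. In this cluster the region will probably not map to any table because the keys are written by JuiceFS directly, but START_KEY/END_KEY still show which key range it covers:

```bash
# Try to map region 15025515 to a table or index.
# For non-table (JuiceFS) keys, DB_NAME/TABLE_NAME come back empty,
# but START_KEY/END_KEY still show the hex-encoded key range.
mysql -h <tidb-host> -P 4000 -u root -p -e \
  "SELECT REGION_ID, DB_NAME, TABLE_NAME, IS_INDEX, START_KEY, END_KEY
     FROM information_schema.TIKV_REGION_STATUS
    WHERE REGION_ID = 15025515;"
```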

| username: TiDBer_27OdodiJ | Original post link

Production database

| username: TiDBer_27OdodiJ | Original post link

This cluster has no TiDB tables on top of it; the JuiceFS clients read and write TiKV directly.
Traffic hotspot map:
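Since JuiceFS writes to TiKV directly and no TiDB tables are involved, one more thing worth ruling out for the stuck GC safe point is a service GC safe point registered in PD (for example by a backup or CDC task) that is holding GC back. A hedged sketch, assuming your pd-ctl build has the service-gc-safepoint subcommand; `<pd-host>` is a placeholder:

```bash
# List the cluster GC safe point and all service GC safe points registered in PD.
# If any service safe point lags far behind, GC cannot advance past it.
tiup ctl:v5.4.0 pd -u http://<pd-host>:2379 service-gc-safepoint
```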