Note:
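This topic has been translated from a Chinese forum by GPT and might contain errors. Original topic: TiKV node CPU usage suddenly spiked, SQL latency increased noticeably and caused business timeouts, and QPS dropped at the same time; requesting help analyzing the cause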
[TiDB Usage Environment] Production Environment / Testing / PoC
Production Environment
[TiDB Version]
v5.0.6
[Encountered Problem]
Between 2022/10/08 15:56:00 and 16:28:00, online applications reported a large number of timeout errors.
[Reproduction Path] Operations performed that caused the problem
Not reproduced, self-recovered
[Problem Phenomenon and Impact]
During this period, the TiDB Dashboard showed that all SQL statements, not just individual ones, took several times longer than usual to execute, and QPS dropped significantly. Grafana further showed that CPU usage on all TiKV nodes was close to 80%, with some nodes exceeding 80% and triggering alerts.
[Attachment]
The cluster runs on Alibaba Cloud. Each TiKV node has 8 CPU cores and 64 GB of memory with local disks. Deployment information is as follows:
All parameter configurations use default values without adjustments.
Please provide the version information of each component, such as cdc and tikv, which can be obtained by running "cdc version" and "tikv-server --version", respectively.