Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.
Original topic: TiKV组件OOM后,执行SHOW STATS_HEALTHY;查询不到任何数据
The issue encountered: After the TiKV component experiences an OOM (Out of Memory) event, executing SHOW STATS_HEALTHY; does not return any data.
Master|root@(none)>show stats_healthy where table_name = 'table_name';
Empty set (0.00 sec)
The data only comes back after the TiKV component is restarted. Why is this happening?
Based on your description, there are two issues:
- TiKV OOM problem:
- Check the statement analysis and slow query pages in the dashboard to find large, resource-heavy SQL statements.
- Check whether the TiKV block cache size is configured too large. If multiple nodes are deployed on a single machine, take extra care with this parameter, because every instance on that machine gets its own block cache.
- SHOW STATS_HEALTHY returns no statistics health data:
- Can this issue be reproduced?
- Check the cluster logs from the period when the problem occurred, both tidb and tikv logs, for anything abnormal.
- Check the cluster status with "display" (for example, tiup cluster display) to see whether all components are normal.
I suspect that after the TiKV component OOMs, the metadata information in memory is cleared, so it cannot be found. It will be reloaded after a restart.
Try again after a while; it may still be loading metadata into memory.
The original poster has probably already waited a long time.
If loading really does take that long, check how CPU and memory usage on TiKV change over that period, and check the health status of TiKV.
Please send the cluster topology diagram. Additionally, check the TiKV configuration with the command SHOW config WHERE NAME LIKE '%storage.block-cache.capacity%';
If TiKV is running as a single instance on the machine, set storage.block-cache.capacity to 45% of the total memory. If there are multiple instances, divide that by the number of instances.
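The sizing rule above is simple arithmetic, and a small sketch may make it easier to apply. The function names here (block_cache_capacity_bytes, to_gib) are hypothetical helpers for illustration and are not part of TiKV or TiUP. The 45% figure is taken from the advice in this thread.

```python
GIB = 1024 ** 3


def block_cache_capacity_bytes(total_memory_bytes: int,
                               instances_on_host: int = 1,
                               fraction: float = 0.45) -> int:
    """Suggested storage.block-cache.capacity per TiKV instance, in bytes.

    Follows the rule of thumb from this thread: take 45% of the host's
    total memory, then split it evenly across the TiKV instances that
    share the host. This is an illustrative helper, not a TiKV API.
    """
    if total_memory_bytes <= 0:
        raise ValueError("total_memory_bytes must be positive")
    if instances_on_host < 1:
        raise ValueError("instances_on_host must be at least 1")
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    return int(total_memory_bytes * fraction) // instances_on_host


def to_gib(size_bytes: int) -> float:
    """Convert a byte count to GiB for easier reading."""
    return size_bytes / GIB


if __name__ == "__main__":
    # Example host (assumed values): 64 GiB of RAM shared by 2 TiKV instances.
    per_instance = block_cache_capacity_bytes(64 * GIB, instances_on_host=2)
    print(f"per-instance block cache: about {to_gib(per_instance):.1f} GiB")
```

For the assumed 64 GiB host with two instances, this works out to roughly 14.4 GiB per instance. Compare that with the value returned by the SHOW config query to see whether the current setting is too large.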