Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.
Original topic: tidb 持续oom (TiDB continuously OOMs)
[Overview] Scenario + Problem Summary
Total number of TiDB nodes and whether PD is co-deployed with them
TiDB continuously OOMs
[Background] Operations performed
In the afternoon, changed the compression format that TiCDC uses when pushing to Kafka by adding compressionType=LZ4 to --sink-uri.
[Problem] Current issue encountered
TiDB continuously OOMs
[Business Impact]
Affects the production system
[TiDB Version]
V5.4.0
Please provide the environment information.
Also, which TiDB nodes are experiencing OOM, or is it always the same one?
You can increase the value of the tidb_mem_quota_query parameter.
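For reference, a minimal sketch of how the quota can be inspected and changed from a SQL client. It assumes that in v5.4 tidb_mem_quota_query is a session-level variable whose initial value comes from the instance's mem-quota-query setting; the 2 GiB figure is only an illustrative value, not a recommendation.

```sql
-- Show the per-query memory quota (in bytes) in effect for this session.
SELECT @@tidb_mem_quota_query;

-- Raise the quota to 2 GiB (2 * 1024^3 bytes) for the current session only.
-- The value is an assumption chosen for illustration.
SET tidb_mem_quota_query = 2147483648;
```

Note that the quota caps a single query, so total memory on a node still depends on how many such queries run at the same time.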
This is a development issue, a SQL issue.
All three TiDB instances encountered OOM (Out of Memory).
How many TiDB instances are there on each of the 3 TiDB nodes?
Are there any other components on these 3 nodes? If there are other components, how many instances are there on each node?
What is the memory on each node?
What are the memory-related parameter configurations for each instance on each node?
There are no other components. 3 nodes, each 16 cores / 32 GB, running only the PD and TiDB services.
You can check what operation caused the OOM before it happened:
- A large number of transactions
- Large slow SQL
- Insufficient memory configuration?
If it is inconvenient to check, it is recommended to enable resource tracing to help monitor what operation caused the OOM.
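As a rough sketch of how the checks above could be done from a SQL client, the queries below rank statements by memory usage. They assume the information_schema.cluster_slow_query and information_schema.cluster_statements_summary tables with the column names shown; the one-hour window and the LIMIT are arbitrary choices. Statement summary data is held in memory, so it may be gone after a TiDB instance restarts from an OOM, and a query that was killed before finishing may not appear in the slow log at all.

```sql
-- Slow-log entries from the last hour, highest memory usage first.
SELECT instance, time, query_time, mem_max, LEFT(query, 120) AS query_head
FROM information_schema.cluster_slow_query
WHERE time > NOW() - INTERVAL 1 HOUR
ORDER BY mem_max DESC
LIMIT 10;

-- Statement digests in the current summary window, highest peak memory first.
SELECT instance, digest_text, exec_count, avg_mem, max_mem
FROM information_schema.cluster_statements_summary
ORDER BY max_mem DESC
LIMIT 10;
```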
So each server runs one PD instance and one TiDB instance?
What memory-related parameters are configured for the PD and TiDB instances?
TiDB & PD: 16 cores, 32 GB
TiKV: 16 cores, 64 GB
The resources are not large. Are there any slow SQL statements, or statements that consume a lot of resources? Are there any operations involving large transactions?
Filtered out some expensive SQL statements.
How is memory allocated to each instance of TiDB, PD, and TiKV?
mem-quota-query: Limits the memory usage of a single SQL query (default value 1 GB). What is it set to in your configuration?
Have you enabled spilling to temporary disk storage when a query exceeds its memory quota (oom-use-tmp-storage)? If not, try enabling it first.
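One way to answer both questions without opening the config files is SHOW CONFIG, sketched below. The item names mem-quota-query, oom-use-tmp-storage, and oom-action are assumed to match the TiDB v5.x configuration file; adjust them if your version names them differently.

```sql
-- List the memory-related settings currently in effect on every TiDB instance.
SHOW CONFIG
WHERE type = 'tidb'
  AND name IN ('mem-quota-query', 'oom-use-tmp-storage', 'oom-action');
```

The result has one row per instance and item, so differing values across the three TiDB nodes would show up directly.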