TiDB Keeps Running Out of Memory (OOM)

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: tidb 持续oom

| username: leoones

[Overview] Scenario + Problem Summary
Total number of TiDB nodes and whether PD is mixed
TiDB keeps running out of memory (OOM)



[Background] Operations performed
In the afternoon, changed the compression format that TiCDC uses when pushing to Kafka by adding compressionType=LZ4 to --sink-uri.

[Problem] Current issue encountered
TiDB keeps running out of memory (OOM)

[Business Impact]
Affects the production system

[TiDB Version]
V5.4.0

| username: xfworld | Original post link

Please provide the environment information.

Also, which TiDB nodes are experiencing OOM, or is it always the same one?

| username: wuxiangdong | Original post link

You can increase the value of the tidb_mem_quota_query parameter.
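
Note that this variable takes a value in bytes. A minimal Python sketch that builds the session-level statement is shown here; `build_set_quota_sql` is a hypothetical helper name, and the 2 GiB figure is only an illustration, not a recommendation:

```python
GIB = 1024 ** 3  # number of bytes in one GiB


def build_set_quota_sql(quota_gib: float) -> str:
    """Return a SET statement for tidb_mem_quota_query (value in bytes).

    Hypothetical helper for illustration; it only builds the SQL text
    and does not connect to any database.
    """
    if quota_gib <= 0:
        raise ValueError("quota must be positive")
    quota_bytes = int(quota_gib * GIB)
    return f"SET tidb_mem_quota_query = {quota_bytes};"


print(build_set_quota_sql(2))
```

Keep in mind that a higher quota lets each statement use more memory, so on a node with limited RAM it may not reduce OOMs by itself.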

| username: tidb狂热爱好者 | Original post link

This is an application-side problem: an issue with the SQL.

| username: leoones | Original post link

All three TiDB instances encountered OOM (Out of Memory).

| username: xiaohetao | Original post link

How many TiDB instances are there on each of the 3 TiDB nodes?

| username: xiaohetao | Original post link

Are there any other components on these 3 nodes? If there are other components, how many instances are there on each node?

| username: xiaohetao | Original post link

What is the memory on each node?
What are the memory-related parameter configurations for each instance on each node?

| username: leoones | Original post link

There are no other components. There are 3 nodes, each with 16 cores and 32 GB of memory, running only the PD and TiDB services.

| username: xfworld | Original post link

Check what was running just before the OOM occurred:

  1. A large number of transactions
  2. Large, slow SQL statements
  3. Insufficient memory configuration?

If that is hard to check, consider enabling resource tracing to help identify which operation caused the OOM.
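
One way to look for large, slow statements is to sort the slow query records by peak memory. The Python sketch below only builds the SQL text. It assumes the `INFORMATION_SCHEMA.CLUSTER_SLOW_QUERY` table and its `Time`, `Instance`, `Query_time`, `Mem_max`, and `Query` columns are available on your version (worth verifying on v5.4.0); `build_top_memory_sql` and the timestamps are hypothetical placeholders:

```python
def build_top_memory_sql(start: str, end: str, limit: int = 10) -> str:
    """Build a query listing the most memory-hungry slow statements in a window.

    Assumption: INFORMATION_SCHEMA.CLUSTER_SLOW_QUERY exists with the columns
    used below. The values are interpolated directly, so this is meant for
    interactive troubleshooting only, not for untrusted input.
    """
    if limit <= 0:
        raise ValueError("limit must be positive")
    return (
        "SELECT Time, Instance, Query_time, Mem_max, Query\n"
        "FROM INFORMATION_SCHEMA.CLUSTER_SLOW_QUERY\n"
        f"WHERE Time BETWEEN '{start}' AND '{end}'\n"
        "ORDER BY Mem_max DESC\n"
        f"LIMIT {limit};"
    )


# Placeholder window: set it to the minutes just before an OOM restart.
print(build_top_memory_sql("2022-01-01 14:00:00", "2022-01-01 14:30:00", limit=5))
```

Running the generated statement in a SQL client should surface the statements with the highest recorded memory usage in that window, which helps separate case 2 from cases 1 and 3.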

| username: xiaohetao | Original post link

So each server runs one PD instance and one TiDB instance?

| username: xiaohetao | Original post link

What values are the memory-related parameters set to for the PD and TiDB instances?

| username: xiaohetao | Original post link

👍👍👍

| username: leoones | Original post link

TiDB & PD: 16 cores, 32 GB
TiKV: 16 cores, 64 GB

| username: forever | Original post link

The resources are not particularly large. Are there any slow SQL statements, or statements that consume a lot of resources? Are there any large transactions?

| username: leoones | Original post link

I picked out some of the expensive SQL statements.

| username: xiaohetao | Original post link

How is memory allocated among the instances of each component (TiDB, PD, and TiKV)?

| username: xiaohetao | Original post link

mem-quota-query limits the memory that a single SQL statement can use (default: 1 GB). What is it set to in your configuration?
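
Because the quota applies per statement, it does not cap the total memory of a TiDB instance. A rough Python sketch of the worst-case arithmetic follows; the function names and the concurrency figures are hypothetical, and the model deliberately ignores PD sharing the node, runtime overhead, and memory that the quota does not track:

```python
GIB = 1024 ** 3  # number of bytes in one GiB


def worst_case_query_memory_gib(concurrent_statements: int,
                                quota_bytes: int = GIB) -> float:
    """Upper bound, in GiB, if every concurrent statement reaches its quota."""
    return concurrent_statements * quota_bytes / GIB


def fits_in_node(concurrent_statements: int, node_memory_gib: float,
                 quota_bytes: int = GIB) -> bool:
    """True if that simplified upper bound stays within the node's memory."""
    return worst_case_query_memory_gib(concurrent_statements, quota_bytes) <= node_memory_gib


# Illustrative numbers only: a 32 GB node with the 1 GB per-statement default.
print(worst_case_query_memory_gib(40))  # 40.0
print(fits_in_node(40, 32))             # False
print(fits_in_node(16, 32))             # True
```

So even with default values, enough concurrent memory-heavy statements could in principle exceed a 32 GB node, which is why the concurrency and the individual expensive statements matter as much as the quota itself.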

| username: xiaohetao | Original post link

Have you enabled spilling to temporary disk storage on OOM (oom-use-tmp-storage)? If not, try enabling it first.

| username: leoones | Original post link

They are all at their default values.