Intermittent OOM in TIDB-SERVER

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: TIDB-SERVER间歇性OOM

| username: TiDBer_pkQ5q1l0

【 TiDB Usage Environment 】Production Environment
【 TiDB Version 】v5.2.1
【 Encountered Issue 】TiDB single node memory keeps increasing, eventually OOM
【 Reproduction Path 】None
【 Issue Phenomenon and Impact 】heap_20220714 (1.2 MB)
TiDB single node memory keeps increasing, periodically OOM

| username: TiDBer_pkQ5q1l0 | Original post link

The image you provided is not visible. Please provide the text you need translated.

| username: TiDBer_wTKU9jv6 | Original post link

Are there any mixed deployment components to check for slow queries? Also, what are the settings for tidb_mem_oom_action and tidb_mem_quota_query?

| username: TiDBer_pkQ5q1l0 | Original post link

Without mixed components, tidb_mem_oom_action and tidb_mem_quota_query are the default official configurations. There are slow queries, but they are rare.

| username: TiDBer_pkQ5q1l0 | Original post link

The default value of tidb_dml_batch_size is 2000. If the batch size of a single DML statement exceeds this value, TiDB will split it into multiple smaller batches for execution.

| username: 啦啦啦啦啦 | Original post link

Check the TiDB logs to see if there are any expensive queries.

| username: TiDBer_Lee | Original post link

Check the logs to see which SQL statements were running when the OOM occurred. TiDB records all of this information. Look at the memory usage of each SQL statement at that time.

| username: TiDBer_pkQ5q1l0 | Original post link

The image you uploaded is not visible. Please provide the text you need translated.

| username: TiDBer_pkQ5q1l0 | Original post link

I have reviewed the SQL statements with OOM risk, and they are all simple SQLs with low memory usage.

| username: TiDBer_pkQ5q1l0 | Original post link

I think you can use the SHOW TABLE REGIONS command to see the distribution of table data across different regions.

| username: songxuecheng | Original post link

Find the slow query. Please share the execution plan.

| username: TiDBer_pkQ5q1l0 | Original post link

The slow query file is over 10 MB, making it difficult to upload. What does this image mean?

| username: xiaohetao | Original post link

Check if the above situation exists.

| username: xiaohetao | Original post link

Troubleshooting reference: TiDB 集群问题导图 | PingCAP 文档中心

| username: songxuecheng | Original post link

  1. How much memory does TiDB have?
  2. Find the execution plan for the expensive query and send it.
  3. Check the slow log situation of the problematic TiDB, whether the file is too large and the disk situation, and send a screenshot.
| username: TiDBer_pkQ5q1l0 | Original post link

[2022/07/14 10:47:21.038 +08:00] [WARN] [expensivequery.go:178] [expensive_query] [cost_time=66.726202067s] [cop_time=284.08799785s] [process_time=224.204s] [wait_time=5.155s] [request_count=97] [total_keys=1352168] [process_keys=1218325] [num_cop_tasks=97] [process_avg_time=2.311381443s] [process_p90_time=6.198s] [process_max_time=7.207s] [process_max_addr=10.70.77.30:20160] [wait_avg_time=0.053144329s] [wait_p90_time=0.18s] [wait_max_time=0.413s] [wait_max_addr=10.70.77.30:20160] [stats=tbl_sim_info:434573453576634369] [conn_id=305183] [user=iovcmiot_sim_prod_user] [database=iovcmiot_sim_prod] [table_ids=“[1365]”] [index_names=“[tbl_sim_info:idx_tbl_sim_info_cust_serial]”] [txn_start_ts=434573613274759169] [mem_max=“18786575 Bytes (17.9 MB)”] [sql="select count(*) from tbl_sim_info where cust_serial=‘20210930777010’ and is_delete = 0 "]
image

| username: songxuecheng | Original post link

Is the slow_log written very frequently and what is the slow-threshold setting? Are there many slow SQL queries in the cluster?

| username: TiDBer_pkQ5q1l0 | Original post link

The slow-threshold value is 300ms. There were some slow SQL queries at that time, but the slow_log file was not large. I don’t know why it would take up so much memory.

| username: songxuecheng | Original post link

  1. How much memory do you have? Expand the memory.
  2. Find the slow SQL queries at that time and handle them to be below 300ms.
  3. Adjust the slow-threshold from 300ms to a higher value.
| username: TiDBer_pkQ5q1l0 | Original post link

32GB memory