TiDB Unexpected Restart Issue

translator_bot · June 23, 2024, 7:42am

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: Tidb异常重启问题

| username: Inkjade

【TiDB Usage Environment】Production Environment
【TiDB Version】V5.4
【Encountered Problem】TiDB server abnormal restart
【Reproduction Path】Query transaction data through Superset
【Problem Phenomenon and Impact】
Found abnormal memory usage of TiDB server, the operating system directly killed the TiDB server

Checked abnormal restart through dmesg

Memory control

How to control the memory of the TiDB server instance? To avoid abnormal restart of the TiDB server affecting the business.

translator_bot · June 23, 2024, 7:42am

| username: songxuecheng | Original post link

Slow SQL
Expensive SQL, check the search logs

translator_bot · June 23, 2024, 7:43am

| username: Inkjade | Original post link

The SQL has been found. But is there any way to avoid TiDB server restarts, or directly return an error message for similar SQL?

translator_bot · June 23, 2024, 7:43am

| username: TiDBer_jYQINSnf | Original post link

It’s an OOM issue. Optimize the SQL, remove those full table scans, and if optimization is not possible, set limits:

With these three configurations set properly, the upper limit of TiDB memory is basically determined. If it still exceeds, it’s likely due to an increase in the number of connections. If the number of connections is also surging, limit the max-connection as well.

The correct approach is to solve it from the business side: control the number of connections and the execution plan of the SQL. If it’s difficult to solve from the business side, the combination of the above parameters will at least prevent TiDB from going OOM.

translator_bot · June 23, 2024, 7:43am

| username: Inkjade | Original post link

Currently, I control SQL query memory and execution duration through oom-action, mem-quota-query, and max_execution_time. If a large SQL execution exceeds the execution duration, the SQL execution is directly terminated to ensure the stability of the TiDB server.

translator_bot · June 23, 2024, 7:43am

| username: system | Original post link

This topic will be automatically closed 60 days after the last reply. No new replies are allowed.