Survey: How the Industry Manages TiDB OOM

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: 调研:业界TiDB OOM如何管理的

| username: TiDBer_yyy

[TiDB Usage Environment] Production Environment
[TiDB Version] 7.1.0
How do you manage OOM, and what preventive measures have been implemented in the production environment?

Our management plan:
Configuration Information

  • TiDB layer: limit a single SQL to 1/4 of total memory;
  • TiKV layer: limit to about 75% of total memory;
  • PD: no limit;
  • TiCDC: no limit;
  • TiFlash: limit a single SQL to 70%, total memory limit to 80%;

Operations Management
Configure a separate TiDB-server for statistical reports and offline tasks at the TiDB layer, so even if it crashes, it won’t interrupt online business;

Case 1:

Case 2:

Case 3:

| username: Billmay表妹 | Original post link

After 6.5.x and 7.x, the probability of OOM has significantly decreased.

| username: TiDBer_yyy | Original post link

Yes, after the upgrade, memory release is particularly proactive, and the probability has significantly decreased.

| username: Kongdom | Original post link

Split the business into point queries and point writes. Separate business and analysis into independent clusters, allowing the analysis cluster to experience OOM.

| username: Inkjade | Original post link

  1. It is recommended to upgrade the version to TiDB 6.5 or 7.x.
  2. Separate the real-time database from the business database. Necessary resource compression; the resource control feature in 7.5 has not been practiced yet.
  3. Monitor top SQL and optimize adjustments in a timely manner.
| username: 小龙虾爱大龙虾 | Original post link

I feel it’s okay. The default OOM parameter is 1GB, so the instance is not that easy to OOM.

| username: Jellybean | Original post link

Before going live with the business, ensure proper design and validation. During usage, pay attention to execution plan monitoring, which generally helps control memory usage effectively.

| username: forever | Original post link

Don’t offload all computations to the database; it’s appropriate to let the application handle some of it. Stateless applications can scale resources at any time.

| username: 随缘天空 | Original post link

Implement resource limitations, restrict SQL as much as possible, and try to release resources after use.

| username: zhanggame1 | Original post link

Evaluate the data volume and the memory usage of an SQL execution, and identify those with high memory usage to see if they can be optimized or split.

| username: 哈喽沃德 | Original post link

Adjust TiDB’s configuration parameters to suit your application load. For example, tidb_mem_quota_query and tidb_mem_quota_hashjoin control the memory quota TiDB uses for queries and hash joins, and can be adjusted accordingly to reduce memory usage. Upgrading to the latest version can also improve this issue, and application optimization is another aspect to consider.

| username: kelvin | Original post link

Try to limit the SQL as much as possible and optimize it well.

| username: changpeng75 | Original post link

Provide proper training for developers and offer optimized development suggestions based on the specific environment and business logic.

| username: zhang_2023 | Original post link

In the end, it’s still about the optimization of SQL itself.

| username: system | Original post link

This topic was automatically closed 60 days after the last reply. New replies are no longer allowed.