TiDB server memory continues to increase, eventually leading to connection failure

This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: TIDB 服务器内存持续增涨,最终导致无法连接。

| username: TiDBer_ydSkDlLw

【TiDB Usage Environment】Production Environment / Testing / PoC
【TiDB Version】
【Reproduction Path】A .NET program runs insert statements continuously, inserting 1000 rows into each of 30 tables per round. Server memory keeps increasing, and once it reaches its peak, database connections fail. Cluster memory usage has been capped at 80%, but the same problem occurs. Has anyone seen something similar? How should the configuration be adjusted? The screenshot below was taken after restarting Ubuntu. When QPS is 0, database connections are failing; the cluster recovers on its own without any intervention, but it takes too long.
【Encountered Problem: Problem Phenomenon and Impact】
【Resource Configuration】Go to TiDB Dashboard - Cluster Info - Hosts and take a screenshot of this page
【Attachments: Screenshots/Logs/Monitoring】

[Image: WeChat Screenshot_20240604095133]

| username: h5n1 | Original post link

An OOM occurred. The memory is too small for running this many components on 2 nodes.

| username: TiDBer_ydSkDlLw | Original post link

This amount of data isn’t that large, is it? Is this much memory really considered small? This is a virtual machine, so I’ll increase it to 32G and observe further.

| username: h5n1 | Original post link

32GB is not that large. If you are just testing and performance and load balancing don’t matter, you can remove 3 TiKV nodes and one TiDB node. You also need to set TiKV’s capacity parameter to control its memory usage.

| username: 啦啦啦啦啦 | Original post link

Too many components are co-located in this mixed deployment. A mixed deployment requires adjusting the memory parameters. Have you adjusted them?

| username: TiDBer_ydSkDlLw | Original post link

I feel that the data volume is not large, and everything is following the default values. Let’s adjust it first and see. If it doesn’t work, we can readjust it.

Multiple servers are considered mixed deployment.

With the current configuration, memory fills up in less than 4 hours of continuous insertion. That is hard to accept.

| username: TiDBer_ydSkDlLw | Original post link

By that logic, provisioning this many servers for every project would be too costly.

Suppose a project has 4 servers in total, of which at most 2 can be used for the database. How should the configuration be tuned for the best performance and stability under that limit? It’s acceptable for CPU and memory to hit their peaks and for inserts and queries to be slow, but it’s unacceptable for database connections to fail.

| username: 啦啦啦啦啦 | Original post link

If the parameters are well-tuned, it can be used, but performance and high availability cannot be guaranteed. If the data volume is really not large, why not use MySQL?

| username: TiDBer_ydSkDlLw | Original post link

It is device operation data: each device has hundreds of data points, saved every 5 seconds, and 500 to 2000 devices are planned. The software needs to run continuously, so the amount of historical data becomes considerable. MySQL might not be able to handle it.
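
To put the volume described above in numbers, here is a rough back-of-the-envelope sketch. The 5-second interval and the device counts come from the post; the figure of 100 data points per device is only an assumption standing in for "hundreds", and the function name is hypothetical.

```python
SECONDS_PER_DAY = 24 * 60 * 60
SAMPLE_INTERVAL_S = 5        # from the post: data is saved every 5 seconds
POINTS_PER_DEVICE = 100      # ASSUMPTION: stands in for "hundreds of data points"


def samples_per_day(devices: int, interval_s: int = SAMPLE_INTERVAL_S) -> int:
    """Number of per-device snapshots written per day across all devices."""
    return devices * (SECONDS_PER_DAY // interval_s)


for devices in (500, 2000):
    samples = samples_per_day(devices)
    values = samples * POINTS_PER_DEVICE
    print(f"{devices} devices: {samples:,} samples/day, {values:,} values/day")
# 500 devices: 8,640,000 samples/day, 864,000,000 values/day
# 2000 devices: 34,560,000 samples/day, 3,456,000,000 values/day
```

How many rows this becomes depends on the schema, for example one wide row per snapshot versus one row per data point, which the post does not specify.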

| username: h5n1 | Original post link

Even with mixed deployment of components, at least 3 virtual machines are needed, otherwise high availability cannot be guaranteed.

| username: 小龙虾爱大龙虾 | Original post link

For TiDB hybrid deployment, at least 3 machines are required. Hybrid deployment needs certain parameter configurations; using the default settings will definitely cause crashes because the default settings are designed to consume the entire memory.


| username: zhaokede | Original post link

Is the application failing to release its connections and memory?

| username: 迪迦奥特曼 | Original post link

If mixed deployment is used, it is necessary to adjust the memory usage limit parameters for each component.



| username: Billmay表妹 | Original post link

If it’s a production environment, two suggestions:

  1. It is recommended to deploy according to the official requirements:

    This avoids unnecessary trouble: TiDB Software and Hardware Recommendations | PingCAP Docs
    Do not use a mixed deployment!
  2. It is recommended to upgrade to version 8.1 LTS. Version 8.0 is a DMR version, and DMR versions are not suitable for production environments. TiDB Versioning | PingCAP Docs

| username: TIDB-Learner | Original post link

Hybrid deployment, limited memory.

| username: TiDBer_ydSkDlLw | Original post link

I checked the documentation. I could not find the etc/config-template.toml file it mentions, and I could not find where to make the change in the parameter descriptions below it either.

It also gives the command tiup cluster edit-config ${cluster-name} for making changes, but I did not see the relevant template there either. Where is the memory for TiKV set?

| username: lemonade010 | Original post link

Setting the memory limit doesn’t help in a mixed deployment; it only restricts TiDB.

| username: WalterWj | Original post link

TiDB defaults to using 80% of the server’s memory for each component. TiKV has caching. You need to configure the memory usage.

| username: TiDBer_ydSkDlLw | Original post link

The online configuration is as follows:
log.slow-threshold: 300 # Note that the key here may need to be modified according to the actual configuration of TiDB
readpool.storage.use-unified-pool: true
readpool.coprocessor.use-unified-pool: true
storage.block-cache.capacity: "2147483648"
replication.enable-placement-rules: true
replication.location-labels: ["host"]
logger.level: "info"

After setting storage.block-cache.capacity to 2G, TiKV’s peak memory usage has now reached 4G. Inserts have been running for two and a half hours, and memory still seems to be trending upward.

| username: WalterWj | Original post link

TiKV’s final memory usage is roughly storage.block-cache.capacity × 2.2, so about 4.4G with a capacity of 2G.
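
The rule of thumb in this reply can be turned into a quick estimate, in both directions. This is a minimal sketch, not TiKV's actual memory accounting: the 2.2 factor is the approximation quoted in this thread, and the function names and the 8 GiB budget are hypothetical.

```python
GIB = 1024 ** 3
TIKV_MEMORY_FACTOR = 2.2  # rule of thumb from this thread, not an official constant


def estimated_tikv_memory(block_cache_bytes: int,
                          factor: float = TIKV_MEMORY_FACTOR) -> float:
    """Rough steady-state TiKV memory for a given block cache capacity."""
    return block_cache_bytes * factor


def block_cache_for_budget(budget_bytes: int,
                           factor: float = TIKV_MEMORY_FACTOR) -> int:
    """Inverse: block cache capacity that keeps TiKV near a memory budget."""
    return int(budget_bytes / factor)


# The capacity from the posted configuration: 2147483648 bytes = 2 GiB.
print(f"{estimated_tikv_memory(2147483648) / GIB:.1f} GiB")  # 4.4 GiB

# Hypothetical budget of 8 GiB per TiKV instance.
print(f"{block_cache_for_budget(8 * GIB) / GIB:.2f} GiB")    # 3.64 GiB
```

By this estimate, the roughly 4G observed after two and a half hours is still below the expected plateau of about 4.4G, so some further growth would be consistent with the rule. In a TiUP-managed cluster, the capacity value is typically set under server_configs.tikv through tiup cluster edit-config.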