Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.
Original topic: TiDB 内存占用 (TiDB memory usage)
[TiDB version: 5.2.3]
Question 1:
The workload is mostly insert writes, with few queries. One TiDB server has noticeably higher memory usage than the others, and its heap is also larger. There are 3 DM (Data Migration) sync tasks writing through this TiDB server.
- What is held in the heap, i.e. why is it so large in the pprof heap profile? (A sketch for capturing this profile follows below.)
- Is there any way to manually release memory (without restarting)?
- What are the reasons affecting memory release?
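To see what is actually held in the heap, one option is to pull a heap profile from the TiDB status port and inspect it offline with `go tool pprof`. The sketch below is a minimal example under assumed values: the default status port 10080 and a placeholder host name `tidb-1`; TiDB serves the standard Go pprof endpoints on that port.

```go
// Minimal sketch: download a heap profile from a TiDB server's status port
// so it can be examined with `go tool pprof`. The address "tidb-1:10080" is
// a placeholder assumption; use the real host and status port.
package main

import (
	"io"
	"log"
	"net/http"
	"os"
)

func main() {
	// TiDB serves the standard Go pprof endpoints on its status port.
	resp, err := http.Get("http://tidb-1:10080/debug/pprof/heap")
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()

	out, err := os.Create("tidb-heap.pb.gz")
	if err != nil {
		log.Fatal(err)
	}
	defer out.Close()

	// Save the profile, then inspect the largest consumers with:
	//   go tool pprof -top tidb-heap.pb.gz
	if _, err := io.Copy(out, resp.Body); err != nil {
		log.Fatal(err)
	}
	log.Println("heap profile written to tidb-heap.pb.gz")
}
```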
Question 2:
Additionally, TiDB releases memory quite slowly. For example, releasing 99 GB of memory takes about half an hour. Is there a way to make the release faster?
Question 3:
Moreover, there is a strange phenomenon on this database: one TiKV node receives a large number of coprocessor requests, even though there are not many actual query requests.
This is a separate issue from the pprof result above; I will add more details.
- Newer versions already use more aggressive Go GC parameters (see the sketch after this list for the runtime mechanism involved).
- I can't think of any quick way to force memory release for now.
- You still need to address this from the usage side, for example by setting parameters such as oom-action and max_execution_time so that a query is killed when its memory usage gets too high; but you still need to understand what is causing the memory growth.
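For context on the Go GC point above: TiDB is a Go program, so how quickly freed memory comes back to the operating system is largely governed by the Go runtime. The sketch below is a generic Go illustration of those runtime knobs (GOGC via `debug.SetGCPercent`, plus `debug.FreeOSMemory`), not a TiDB API; it only shows the mechanism that the more aggressive GC parameters tune.

```go
// Generic Go sketch (not a TiDB API): GOGC / debug.SetGCPercent controls how
// aggressively the GC runs, and debug.FreeOSMemory forces a collection and
// asks the runtime to return freed memory to the operating system.
package main

import (
	"fmt"
	"runtime"
	"runtime/debug"
)

func main() {
	// Equivalent to starting the process with GOGC=50: collect when the heap
	// grows 50% over the live set, instead of the default 100%.
	old := debug.SetGCPercent(50)
	fmt.Println("previous GOGC:", old)

	// Allocate and drop some memory so there is something to release.
	buf := make([]byte, 1<<30) // 1 GiB
	_ = buf[0]
	buf = nil

	// Force a collection and return freed pages to the OS.
	debug.FreeOSMemory()

	var ms runtime.MemStats
	runtime.ReadMemStats(&ms)
	fmt.Printf("heap in use: %d MiB, released to OS: %d MiB\n",
		ms.HeapInuse>>20, ms.HeapReleased>>20)
}
```

Even after an explicit `FreeOSMemory`, how fast the reported RSS drops also depends on how the kernel reclaims the returned pages, which may be part of why releasing very large amounts of memory appears slow.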
BTW: Actually, I still don't quite understand what problem needs to be solved. If the profile matches the actual memory consumption, then judging from the profile graph, shouldn't the slow_query feature be turned off?
The confusing part of the pprof result is that this database basically only performs synchronous inserts, so it's unclear what the top 3 entries are actually doing.
This database has very few queries, yet as described in question 3 there are always thousands of coprocessor requests on one TiKV node, and I don't know where these requests come from.
It looks like the slow query log is causing the issue. Have you refreshed the slow query page on the Dashboard multiple times? That can trigger a bug where the memory used for the slow log is not released; it was fixed in version 5.3.2.