TiDB Memory Usage

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: TiDB 内存占用

| username: h5n1

[5.2.3]
Question 1:
The system is primarily insert-write, with few queries. One TiDB server has higher memory usage compared to others, and the heap is also larger. There are 3 DM synchronization tasks using this TiDB server.

  1. What information is stored in the heap, and why is it so large in the pprof heap profile?
  2. Is there any way to release memory manually, without restarting? (see the sketch after this list)
  3. What factors affect memory release?
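
For reference, the gap between "heap in use" and memory not yet returned to the OS comes from the Go runtime, since tidb-server is a Go process. The following is a minimal sketch of the runtime APIs involved, not TiDB code, and only illustrates the mechanism behind questions 2 and 3:

```go
// Minimal sketch (not TiDB code): the Go runtime stats that distinguish live
// heap from memory the runtime still holds, and the call that forces it back
// to the OS. TiDB itself relies on the normal GC/scavenger cycle.
package main

import (
	"fmt"
	"runtime"
	"runtime/debug"
)

func main() {
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	// HeapInuse: bytes in in-use spans. HeapIdle - HeapReleased: memory the
	// runtime still holds but has not yet returned to the OS. A large gap here
	// is why the process RSS can stay high even when live objects are small.
	fmt.Printf("HeapInuse=%d HeapIdle=%d HeapReleased=%d\n",
		m.HeapInuse, m.HeapIdle, m.HeapReleased)

	// Runs a GC and returns as much memory as possible to the OS. There is no
	// supported way to trigger this inside a running tidb-server without code
	// changes; it is shown only to explain what "manual release" would mean.
	debug.FreeOSMemory()
}
```
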

Question 2:
Additionally, TiDB releases memory quite slowly. For example, releasing 99 GB of memory takes about half an hour. Is there a way to make the release faster?

Question 3:
Moreover, there is a strange phenomenon in this database: one TiKV node receives a large number of coprocessor requests, even though there are not many actual queries.

| username: jansu-dev | Original post link

  1. The 99 GB of memory consumption does not match the memory consumed by the functions in the profile. Capture the profile while memory usage is relatively high (see the capture sketch after this list).
  2. Heap memory indicates the number of bytes of heap memory currently in use, including the part not yet garbage collected by Go, but the difference shouldn’t be this large. It is likely that some specific functionality is causing the high memory consumption.
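
A minimal sketch for grabbing a heap profile from tidb-server's status port so it can be captured at the moment memory is high. It assumes the default status port 10080 on 127.0.0.1; adjust for your deployment. The saved file can be inspected afterwards with `go tool pprof`:

```go
// Fetch TiDB's heap profile over HTTP and save it with a timestamp.
// Host and port are assumptions (default status port 10080).
package main

import (
	"fmt"
	"io"
	"net/http"
	"os"
	"time"
)

func main() {
	url := "http://127.0.0.1:10080/debug/pprof/heap" // adjust host/port as needed
	resp, err := http.Get(url)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	name := fmt.Sprintf("heap-%s.pb.gz", time.Now().Format("20060102-150405"))
	f, err := os.Create(name)
	if err != nil {
		panic(err)
	}
	defer f.Close()

	if _, err := io.Copy(f, resp.Body); err != nil {
		panic(err)
	}
	fmt.Println("saved", name)
}
```
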
| username: h5n1 | Original post link

This is another issue, not the pprof result above. I will add some description.

| username: jansu-dev | Original post link

  1. Higher versions already use more aggressive parameters for Go GC.
  2. I can’t think of any quick way to release the memory for now.
  3. It still needs to be addressed from the database usage side, for example by setting parameters such as oom-action and max_execution_time so that an offending query is killed when memory usage grows too high. Even so, the cause of the memory growth still has to be understood (a hedged example of setting these limits follows below).
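
A hedged sketch of applying the statement-level limits mentioned above from a client connection over the MySQL protocol. The exact variable names, scopes, and semantics depend on the TiDB version; oom-action itself is a tidb.toml config item rather than a session variable, so only the SQL-settable knobs are shown, and the DSN is a placeholder:

```go
// Set per-statement memory and execution-time limits on TiDB via SQL.
// Variable names/scopes are assumptions for 5.x-era versions; verify against
// your version's documentation before applying.
package main

import (
	"database/sql"

	_ "github.com/go-sql-driver/mysql"
)

func main() {
	db, err := sql.Open("mysql", "root:@tcp(127.0.0.1:4000)/test") // placeholder DSN
	if err != nil {
		panic(err)
	}
	defer db.Close()

	// Cap per-statement memory (bytes); statements exceeding it are handled
	// according to oom-action (log or cancel) in tidb.toml.
	if _, err := db.Exec("SET SESSION tidb_mem_quota_query = 1073741824"); err != nil {
		panic(err)
	}
	// Kill SELECT statements running longer than 60s (value in milliseconds).
	if _, err := db.Exec("SET GLOBAL max_execution_time = 60000"); err != nil {
		panic(err)
	}
}
```
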

BTW: I still don’t fully understand the problem to be solved. If the profile does match the actual memory consumption, then judging from the profile graph, should the slow_query feature be turned off?

| username: h5n1 | Original post link

The odd thing about the pprof profile is that this database basically only handles insert operations from the sync tasks, so it’s unclear what exactly the top 3 functions are doing.

| username: jansu-dev | Original post link

  1. The first two are parsing slow queries;
  2. handleCopResponse is processing the data returned from TiKV (there should be some queries, internal or external);
| username: h5n1 | Original post link

This database has very few queries, yet as shown in question 3, there are consistently thousands of coprocessor requests on one TiKV node. I don’t know where these requests are coming from.

| username: jansu-dev | Original post link

  1. Could you check the tikv.log on 118:20182 for coprocessor-related entries to see what that node is doing? (A small log-scanning sketch follows below.)
  2. Would it be convenient to collect a Clinic report?
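
A small helper sketch for the suggestion above: count coprocessor-related lines in a TiKV log file and print the first few for inspection. The log path is a placeholder; point it at the tikv.log of the node in question:

```go
// Scan a TiKV log for lines mentioning "coprocessor" (path is an assumption).
package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

func main() {
	f, err := os.Open("/path/to/tikv.log") // placeholder path
	if err != nil {
		panic(err)
	}
	defer f.Close()

	n := 0
	sc := bufio.NewScanner(f)
	sc.Buffer(make([]byte, 0, 1<<20), 1<<20) // TiKV log lines can be long
	for sc.Scan() {
		if strings.Contains(sc.Text(), "coprocessor") {
			n++
			if n <= 20 {
				fmt.Println(sc.Text()) // show the first few matches
			}
		}
	}
	if err := sc.Err(); err != nil {
		panic(err)
	}
	fmt.Println("total coprocessor-related lines:", n)
}
```
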
| username: 人如其名 | Original post link

It looks like the slow query log is causing the issue. Have you refreshed the slow query page on the Dashboard multiple times? That can trigger a bug where the memory used to parse the slow log is not released; it was fixed in version 5.3.2.