Memory Usage of TiDB Pod Continues to Surge

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: TIDB pod的内存持续陡增

| username: 天锁斩月

【TiDB Usage Environment】Production Environment
【TiDB Version】v6.1.1
【Encountered Problem】After upgrading the cluster to v6.1.1, the memory usage of the TiDB pod keeps increasing.

| username: 天锁斩月 | Original post link

The log of the tidb pod:
tidb-cluster-tidb-0_tidb.log (415.1 KB)

| username: h5n1 | Original post link

Capture a heap profile with curl -G tidb_ip:port/debug/pprof/heap > heap.profile (port is the TiDB status port), and compare the goroutine count in the TiDB monitoring before and after the upgrade.
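The capture step can be wrapped in a small script so that several profiles can be taken over time and compared. This is only a sketch: pprof_url and capture_profile are hypothetical helper names, the host is a placeholder, and 10080 is assumed because it is TiDB's default status port. In Kubernetes the status port usually has to be reached from inside the cluster or through a port-forward.

```shell
#!/usr/bin/env bash
# Assumptions: TIDB_HOST is a placeholder; 10080 is TiDB's default status port.
TIDB_HOST="${TIDB_HOST:-127.0.0.1}"
STATUS_PORT="${STATUS_PORT:-10080}"

# Build the URL of a Go pprof endpoint served on the status port.
pprof_url() {
  local kind="$1"   # e.g. heap or goroutine
  printf 'http://%s:%s/debug/pprof/%s' "$TIDB_HOST" "$STATUS_PORT" "$kind"
}

# Download one profile into a timestamped file and print the file name.
capture_profile() {
  local kind="$1"
  local out
  out="${kind}-$(date +%Y%m%d-%H%M%S).profile"
  curl -sS --fail -G "$(pprof_url "$kind")" -o "$out" && echo "$out"
}

# Usage (run while memory is high, before the pod restarts):
#   capture_profile heap
#   capture_profile goroutine
#   go tool pprof -top heap-<timestamp>.profile
```

Taking the profile while memory is still high matters, because a profile captured after a restart only reflects the freshly started process.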

| username: 天锁斩月 | Original post link

I couldn’t find the goroutine metrics in the TiDB monitoring.

| username: h5n1 | Original post link

tidb → server → goroutine count

| username: 天锁斩月 | Original post link

There is no such metric.

| username: h5n1 | Original post link

Please capture the profile again; I get an error when I open it.

| username: 天锁斩月 | Original post link

heap3.profile (247.1 KB)

| username: h5n1 | Original post link

The high memory usage seen in the heap profile is related to execution plans, but the total shown in the profile doesn’t match the actual usage of over 1 GB.

| username: 天锁斩月 | Original post link

By the time I exported this file, the pod had already restarted, and it was using around 700 MB of memory.

| username: h5n1 | Original post link

First, analyze the slow SQL and check whether any statements have multiple execution plans. Also try turning off the system variable tidb_enable_prepared_plan_cache.
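Both suggestions can be tried from a MySQL client. The sketch below makes several assumptions: the connection settings are placeholders (4000 is the default TiDB SQL port), run_sql is a hypothetical helper, and the query relies on the digest, plan_digest, and max_mem columns of information_schema.cluster_statements_summary. Whether existing connections pick up the change without reconnecting should be verified on your cluster.

```shell
#!/usr/bin/env bash
# Assumptions: connection settings are placeholders; 4000 is TiDB's default SQL port.
TIDB_HOST="${TIDB_HOST:-127.0.0.1}"
TIDB_PORT="${TIDB_PORT:-4000}"
TIDB_USER="${TIDB_USER:-root}"

# Statements that ran with more than one execution plan in the summary window.
MULTI_PLAN_SQL="SELECT digest, COUNT(DISTINCT plan_digest) AS plans, MAX(max_mem) AS max_mem_bytes
FROM information_schema.cluster_statements_summary
GROUP BY digest
HAVING COUNT(DISTINCT plan_digest) > 1
ORDER BY max_mem_bytes DESC
LIMIT 20;"

# Disable the prepared plan cache via the global system variable.
DISABLE_PLAN_CACHE_SQL="SET GLOBAL tidb_enable_prepared_plan_cache = OFF;"

# Run one SQL string through the mysql client (prompts for the password).
run_sql() {
  mysql -h "$TIDB_HOST" -P "$TIDB_PORT" -u "$TIDB_USER" -p -e "$1"
}

# Usage:
#   run_sql "$MULTI_PLAN_SQL"
#   run_sql "$DISABLE_PLAN_CACHE_SQL"
```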

| username: 近墨者zyl | Original post link

In TiDB Dashboard, sort the SQL statements by memory usage.
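If the Dashboard is not convenient, a similar ranking can be approximated in SQL. This is a sketch, not the Dashboard's own query: top_mem_sql is a hypothetical helper, and it assumes the digest_text, exec_count, avg_mem, and max_mem columns of information_schema.cluster_statements_summary.

```shell
#!/usr/bin/env bash
# Build a query listing the N statement digests with the highest peak memory.
# top_mem_sql is a hypothetical helper; N defaults to 10.
top_mem_sql() {
  local limit="${1:-10}"
  printf 'SELECT digest_text, exec_count, avg_mem, max_mem FROM information_schema.cluster_statements_summary ORDER BY max_mem DESC LIMIT %d;' "$limit"
}

# Usage (connection details are placeholders):
#   mysql -h 127.0.0.1 -P 4000 -u root -p -e "$(top_mem_sql 10)"
```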

| username: 天锁斩月 | Original post link

After I upgraded the monitoring, this metric appeared.

| username: h5n1 | Original post link

It’s not possible to compare this with the pre-upgrade state. As mentioned earlier, optimize the slow SQL first. Also disable the plan cache to see whether memory usage drops, and check whether memory stops growing once it reaches a certain size. Some features in the new version may simply need more baseline memory than the previous version did.

| username: 天锁斩月 | Original post link

These are the top few after sorting in descending order.

| username: 天锁斩月 | Original post link

The plan cache has been disabled, and memory now grows more slowly than before. The part of the graph where memory rises slowly shows the effect after disabling it.

| username: h5n1 | Original post link

Can these SQL statements be further optimized? They seem to be the same to me.

| username: 近墨者zyl | Original post link

Could you post the execution plan and details of the SQL? Also, what is the concurrency of this SQL?

| username: 天锁斩月 | Original post link

The concurrency is not high. Sometimes the query runs every few minutes, and sometimes only once an hour.

| username: 裤衩儿飞上天 | Original post link

From a business perspective, is it necessary to use a left join?