Memory Usage of TiDB Pod Continues to Surge

translator_bot · June 23, 2024, 1:05am

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: TIDB pod的内存持续陡增

| username: 天锁斩月

【TiDB Usage Environment】Production Environment
【TiDB Version】v6.1.1
【Encountered Problem】After upgrading the cluster to v6.1.1, the memory usage of the TiDB pod keeps increasing.

| username: 天锁斩月

The log of the tidb pod:
tidb-cluster-tidb-0_tidb.log (415.1 KB)

| username: h5n1

Use curl -G tidb_ip:port/debug/pprof/heap > heap.profile to capture a heap profile, and check in TiDB monitoring how the number of goroutines changed before and after the upgrade.
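For anyone who wants to script the same two checks, here is a minimal Python sketch. It assumes the TiDB status port (10080 by default) is reachable from where the script runs and that the standard Go pprof endpoints are exposed there. The function names and the output file name are only illustrative.

```python
import re
import urllib.request


def pprof_url(host: str, port: int, profile: str) -> str:
    """Build the URL of a Go pprof endpoint on the TiDB status port."""
    return f"http://{host}:{port}/debug/pprof/{profile}"


def save_heap_profile(host: str, port: int, path: str = "heap.profile") -> int:
    """Download the heap profile to `path` and return the number of bytes written."""
    with urllib.request.urlopen(pprof_url(host, port, "heap"), timeout=30) as resp:
        data = resp.read()
    with open(path, "wb") as f:
        f.write(data)
    return len(data)


def parse_goroutine_total(text: str) -> int:
    """Extract N from the 'goroutine profile: total N' header line."""
    match = re.search(r"goroutine profile: total (\d+)", text)
    if match is None:
        raise ValueError("not a goroutine profile in debug=1 text format")
    return int(match.group(1))


def goroutine_count(host: str, port: int) -> int:
    """Fetch the goroutine profile as text and return the goroutine count."""
    url = pprof_url(host, port, "goroutine") + "?debug=1"
    with urllib.request.urlopen(url, timeout=30) as resp:
        return parse_goroutine_total(resp.read().decode("utf-8", "replace"))
```

Calling save_heap_profile("tidb_ip", 10080) should produce a file that go tool pprof can open. Calling goroutine_count a few times over several minutes gives a rough trend even when the monitoring panel is missing.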

| username: 天锁斩月

I couldn’t find the goroutine metrics in the TiDB monitoring.

| username: h5n1

It is under TiDB → Server → Goroutine Count.

| username: 天锁斩月

There is no such metric.

| username: h5n1

Please capture the profile again; I get an error when I open it.

| username: 天锁斩月

heap3.profile (247.1 KB)

| username: h5n1

The high memory usage seen in the heap profile is related to execution plans, but the total shown in the profile does not match the actual usage of over 1 GB.

| username: 天锁斩月

By the time I exported this file, the pod had already restarted and was using around 700 MB of memory.

| username: h5n1

First, analyze the slow SQL and check whether any SQL statement has multiple execution plans. Also try turning off the system variable tidb_enable_prepared_plan_cache.
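One possible way to look for statements with multiple execution plans is to group the rows of information_schema.statements_summary by SQL digest and count the distinct plan digests. The sketch below assumes the rows have already been fetched with any MySQL-compatible client. The helper name and the two constants are illustrative, not something from this thread.

```python
from collections import defaultdict

# Query to run with any MySQL-compatible client; the column names follow
# information_schema.statements_summary.
STATEMENTS_QUERY = (
    "SELECT DIGEST, PLAN_DIGEST, DIGEST_TEXT "
    "FROM information_schema.statements_summary"
)

# Statement for switching the prepared plan cache off, as suggested above.
DISABLE_PLAN_CACHE = "SET GLOBAL tidb_enable_prepared_plan_cache = OFF"


def statements_with_multiple_plans(rows):
    """Return {digest: (digest_text, sorted plan digests)} for statements
    that appear with more than one distinct, non-empty plan digest."""
    plans = defaultdict(set)
    texts = {}
    for digest, plan_digest, digest_text in rows:
        if plan_digest:  # skip rows that carry no plan digest
            plans[digest].add(plan_digest)
        texts.setdefault(digest, digest_text)
    return {d: (texts[d], sorted(p)) for d, p in plans.items() if len(p) > 1}
```

Statements returned by this helper are the first candidates to review, since each extra plan for the same statement is something the plan cache may hold separately.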

| username: 近墨者zyl

In TiDB Dashboard, sort all the SQL statements by memory usage.
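If the Dashboard is not convenient, roughly the same ranking can be taken from information_schema.statements_summary. This is only a sketch: it assumes the rows were fetched as dictionaries with lower-case keys, and top_by_memory is a hypothetical helper name.

```python
# Query that lets TiDB do the sorting; MAX_MEM and AVG_MEM are in bytes.
TOP_MEMORY_QUERY = (
    "SELECT DIGEST_TEXT, EXEC_COUNT, AVG_MEM, MAX_MEM "
    "FROM information_schema.statements_summary "
    "ORDER BY MAX_MEM DESC LIMIT 5"
)


def top_by_memory(rows, n=5):
    """Return the n rows with the largest max_mem, largest first."""
    return sorted(rows, key=lambda r: r["max_mem"], reverse=True)[:n]
```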

| username: 天锁斩月

After I upgraded the monitoring, this metric appeared.

| username: h5n1

This cannot be compared with the pre-upgrade state. As mentioned earlier, optimize the slow SQL first. Also disable the plan cache to see whether memory usage drops, and check whether memory stops growing once it reaches a certain size. It is possible that some features in the new version need more baseline memory than the previous version.
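To judge "stops growing" from numbers instead of by eye, one option is to fit a straight line to memory samples and look at the slope. The sketch below assumes the samples are (unix_seconds, memory_bytes) pairs exported from the monitoring system. The window size and the threshold are arbitrary placeholders to tune.

```python
def growth_rate(samples):
    """Least-squares slope of memory over time, in bytes per second.

    `samples` is a list of (unix_seconds, memory_bytes) pairs.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_m = sum(m for _, m in samples) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in samples)
    if var_t == 0:
        raise ValueError("samples must span more than one timestamp")
    cov = sum((t - mean_t) * (m - mean_m) for t, m in samples)
    return cov / var_t


def has_plateaued(samples, window=5, max_rate=1024.0):
    """True if the last `window` samples grow by at most `max_rate` bytes/s."""
    return growth_rate(samples[-window:]) <= max_rate
```

Comparing growth_rate over a period with the plan cache enabled against a period with it disabled gives a rough measure of how much the change helped.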

| username: 天锁斩月

These are the top few after sorting in descending order.

| username: 天锁斩月

I have disabled the plan cache, and memory now grows more slowly than before. The section of the graph where memory rises slowly shows the effect after disabling it.

| username: h5n1

Can these SQL statements be optimized further? They all look the same to me.

| username: 近墨者zyl

Please share the execution plan and details of this SQL. What is its concurrency?

| username: 天锁斩月

The concurrency is not high. Sometimes the query runs every few minutes, sometimes once an hour.

| username: 裤衩儿飞上天

From a business perspective, is it necessary to use a left join?