TiFlash Restarts Frequently Due to OOM

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: Tiflash OOM 频繁重启

| username: Jennie

【TiDB Usage Environment】Production Environment
【TiDB Version】V6.2.0
【Encountered Problem】TiFlash frequently restarts due to OOM
【Reproduction Path】ALTER TABLE dataname.xxxx SET TIFLASH REPLICA 1; After the replica was added and synchronization completed, OOM started to occur
【Problem Phenomenon and Impact】

【Attachments】Related logs and monitoring (https://metricstool.pingcap.com/)
Monitoring panels: Basic monitoring, Region, Server, Raft

TiFlash-related parameter configuration (default configuration)
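
The reproduction step above depends on the replica having finished synchronizing. A minimal sketch of how that could be checked, assuming the standard `information_schema.tiflash_replica` table is available in this version: the helper below only builds the SQL text (the function name `replica_status_sql` is made up for illustration), and the statement would still need to be run through a MySQL-compatible client.

```python
def replica_status_sql(schema: str, table: str) -> str:
    """Build a query that reports TiFlash replica status for one table."""

    def quote(value: str) -> str:
        # Naive quoting for illustration only; prefer driver-side
        # parameter binding when running this from real code.
        return "'" + value.replace("'", "''") + "'"

    return (
        "SELECT TABLE_SCHEMA, TABLE_NAME, REPLICA_COUNT, AVAILABLE, PROGRESS "
        "FROM information_schema.tiflash_replica "
        f"WHERE TABLE_SCHEMA = {quote(schema)} AND TABLE_NAME = {quote(table)};"
    )


# Table name taken from the post above (it is already anonymized there).
print(replica_status_sql("dataname", "xxxx"))
```

In the result, AVAILABLE = 1 and PROGRESS = 1 would indicate that the replica is fully synchronized, which helps separate OOMs during synchronization from OOMs that happen afterwards.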

| username: h5n1 | Original post link

Check whether Continuous Profiling is enabled under the diagnostics section of TiDB Dashboard. If it is, try turning it off.

| username: Jennie | Original post link

Indeed, it is on.

| username: Jennie | Original post link

It’s already turned off. I’ll observe for a while longer.

| username: Jennie | Original post link

The problem still exists after turning off continuous profiling.

| username: h5n1 | Original post link

Please send the TiFlash logs.

| username: Jennie | Original post link

Could it be because this partitioned table is too large?

| username: Jennie | Original post link

tiflash_172.31.14.22_3930.log (72.1 MB)

| username: h5n1 | Original post link

Is there a query at the time of OOM?

| username: Jennie | Original post link

There was an OOM around 15:38 (UTC+8).
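
One way to look for queries around that time is the cluster slow query log. The sketch below is only an illustration: it assumes the `information_schema.cluster_slow_query` table with its `Time`, `Query_time`, `Mem_max`, and `Query` columns, and the helper name `slow_query_window_sql`, the 10-minute window, and the date are all made up here (the thread gives only the time of day). Note that the slow log records only statements above the slow-query threshold, and `Mem_max` reflects memory tracked on the TiDB side, so a TiFlash-heavy query may not stand out.

```python
from datetime import datetime, timedelta


def slow_query_window_sql(oom_time: datetime, minutes: int = 10, limit: int = 20) -> str:
    """Build a query listing slow statements within +/- `minutes` of the OOM."""
    fmt = "%Y-%m-%d %H:%M:%S"
    start = (oom_time - timedelta(minutes=minutes)).strftime(fmt)
    end = (oom_time + timedelta(minutes=minutes)).strftime(fmt)
    return (
        "SELECT Time, Query_time, Mem_max, Query "
        "FROM information_schema.cluster_slow_query "
        f"WHERE Time BETWEEN '{start}' AND '{end}' "
        f"ORDER BY Mem_max DESC LIMIT {limit};"
    )


# Placeholder date: only the time of day (15:38) comes from the thread.
print(slow_query_window_sql(datetime(2022, 1, 1, 15, 38)))
```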

| username: Jennie | Original post link

This log covers a more suitable time range and includes the OOM.
tiflash_172.31.14.22_3930_1.log (9.9 MB)

| username: h5n1 | Original post link

Looking at the logs, there was nothing unusual before the OOM, but the monitoring shows a sudden increase in memory usage. Please confirm whether a query on the large table you mentioned was running at that time.
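
When the TiFlash log itself shows nothing unusual, the kernel log is another place to confirm exactly when the process was killed. The following is a rough sketch, assuming a Linux host where the kernel OOM killer writes "Killed process <pid> (<name>)" lines to `dmesg`. The helper name `find_oom_kills`, the `tiflash` name filter, and the sample lines are all made up for illustration and are not taken from this cluster.

```python
import re

# Matches the "Killed process <pid> (<name>)" fragment that the Linux
# OOM killer writes to the kernel log.
KILL_RE = re.compile(r"Killed process (\d+) \(([^)]+)\)")


def find_oom_kills(lines, name_substring="tiflash"):
    """Return (pid, process_name, line) for OOM kills whose name matches."""
    hits = []
    for line in lines:
        match = KILL_RE.search(line)
        if match and name_substring.lower() in match.group(2).lower():
            hits.append((int(match.group(1)), match.group(2), line.strip()))
    return hits


# Made-up sample lines; real input could come from `dmesg -T` output.
sample = [
    "[Sat Jan  1 15:38:02 2022] Out of memory: Killed process 4321 (tiflash) total-vm:1kB",
    "[Sat Jan  1 15:40:10 2022] Out of memory: Killed process 99 (other) total-vm:1kB",
]
for pid, name, line in find_oom_kills(sample):
    print(pid, name, line)
```

The timestamps of the matching lines can then be compared with the memory spike seen in monitoring.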

| username: Jennie | Original post link

No suspicious SQL found.

| username: flow-PingCAP | Original post link

Judging from known bugs, if TiFlash does not OOM again right after restarting, the OOM is generally caused by a query.

| username: system | Original post link

This topic was automatically closed 60 days after the last reply. New replies are no longer allowed.