[TiDB Version] v5.4.0
We restored a TiDB backup into a newly created TiDB instance. Once the TiKV restore finished, the business started writing to the cluster while TiFlash was still syncing data (TiFlash queries were disabled through the engine setting), and TiFlash kept syncing in the background. After TiFlash had synced about 1.4T (not yet the full data set), two TiFlash nodes began to OOM and restart frequently, roughly once every 4 minutes. Setting the following parameters does not help, and the OOMs continue:
The logs currently show that connections to tidb-gemkafk19g-tiflash-0.tidb-gemkafk19g-tiflash-peer.tidb-gemkafk19g.svc always fail. As a result, Raft messages back up in the channel and the failure is logged continuously. We need to find out why that address cannot be reached.
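As a first check on the connection problem, it can help to separate "the name does not resolve" from "the name resolves but the port refuses or times out". The following is only a minimal sketch: `check_peer` is a hypothetical helper, not part of any TiDB tool, and the port you pass in is an assumption that has to be taken from your own TiFlash configuration. Run it from a pod in the same namespace as the cluster.

```python
import socket


def check_peer(host: str, port: int, timeout: float = 3.0) -> str:
    """Classify reachability of host:port.

    Returns "dns_failure" if the name does not resolve,
    "connect_failure" if it resolves but no TCP connection succeeds,
    and "ok" if a TCP connection can be opened.
    """
    try:
        # Step 1: name resolution (for example, the headless peer service name).
        addrs = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror:
        return "dns_failure"

    # Step 2: try each resolved address until one accepts a TCP connection.
    for family, socktype, proto, _canon, sockaddr in addrs:
        try:
            with socket.socket(family, socktype, proto) as s:
                s.settimeout(timeout)
                s.connect(sockaddr)
                return "ok"
        except OSError:
            continue
    return "connect_failure"


# Example call (the port value is a placeholder, not taken from the post):
# check_peer(
#     "tidb-gemkafk19g-tiflash-0.tidb-gemkafk19g-tiflash-peer.tidb-gemkafk19g.svc",
#     port=<your TiFlash peer port>,
# )
```

A "dns_failure" result would point toward the pod or its headless service record being absent (for example, while the pod is restarting after an OOM), whereas "connect_failure" would point toward the process not listening yet or a network policy blocking the port.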
I checked the relevant records: the common issue that causes these OOM restarts is fixed as of version 5.4.3. You can try upgrading only the TiFlash component first.
Generally, minor versions are bugfix releases, so upgrading TiFlash alone shouldn't be a big problem. To be safe, though, once you have confirmed that the issue is resolved, it is recommended to upgrade the entire TiDB cluster to version 5.4.3.