[TiDB Usage Environment] Production Environment / Testing / POC
Production Environment
[TiDB Version]
v6.1.5
[Reproduction Path] What operations were performed when the issue occurred
[Encountered Issue: Issue Phenomenon and Impact]
What data does TiFlash need to load when starting, and why does it require such a large amount of memory? Are there any parameters to limit the memory consumption during startup?
[Resource Configuration] Go to TiDB Dashboard - Cluster Info - Hosts and take a screenshot of this page
[Attachments: Screenshots/Logs/Monitoring]
From what the original poster describes, TiFlash may have been handling a large data migration or heavy SQL execution while it was starting, which would explain the high I/O and memory usage.
There is not much information to go on. Could you upload the complete TiFlash log file after redacting any sensitive data, and also report the number of Regions in TiKV and the synchronization status of the tables replicated to TiFlash?
Additionally, please confirm whether there was any data being synchronized to the TiFlash tables at the time of startup.
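One way to collect the synchronization status asked for above is to read information_schema.tiflash_replica, which reports AVAILABLE and PROGRESS for each table that has a TiFlash replica. The sketch below only post-processes rows already fetched with a MySQL-compatible client of your choice; the helper name pending_replicas and the sample rows are made up for illustration.

```python
# Query to run against TiDB with any MySQL-compatible client (not executed here).
REPLICA_STATUS_SQL = (
    "SELECT TABLE_SCHEMA, TABLE_NAME, REPLICA_COUNT, AVAILABLE, PROGRESS "
    "FROM information_schema.tiflash_replica"
)


def pending_replicas(rows):
    """Return (schema, table, progress) for replicas that are not fully synced.

    Each row is a dict keyed by the column names selected above.
    """
    pending = []
    for row in rows:
        progress = float(row["PROGRESS"])
        if int(row["AVAILABLE"]) == 0 or progress < 1.0:
            pending.append((row["TABLE_SCHEMA"], row["TABLE_NAME"], progress))
    # Least-synced tables first: they have the most data left to replicate.
    return sorted(pending, key=lambda item: item[2])


# Hypothetical rows, shaped like the result of REPLICA_STATUS_SQL.
sample_rows = [
    {"TABLE_SCHEMA": "app", "TABLE_NAME": "orders", "REPLICA_COUNT": 1,
     "AVAILABLE": 1, "PROGRESS": 1},
    {"TABLE_SCHEMA": "app", "TABLE_NAME": "events", "REPLICA_COUNT": 1,
     "AVAILABLE": 0, "PROGRESS": 0.35},
]
print(pending_replicas(sample_rows))
```

Tables that show up in this list were still synchronizing, so they are the ones most likely to add write load while TiFlash starts.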
It wasn’t a scaling operation. I manually stopped TiFlash for about 10 minutes, then restarted the operating system, and an OOM (Out of Memory) error occurred.
If you have spare resources, you can try scaling up directly. My impression is that TiFlash does not use much memory for writes, but memory use for reads is harder to predict.
A good first test is to make queries behave as if TiFlash did not exist, and see whether the execution plans can run without the TiFlash engine.
However, I checked and found that the parameter for selecting the query engine cannot be modified at the global level.
You may need to change the parameter online in the configuration through SET CONFIG.
The parameter name is as follows,
Change the above parameter to ["tikv", "tidb"], so that queries behave as if TiFlash did not exist and never use it.
Then try restarting TiFlash to see if it can come up. If it does, change the parameter back.
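The parameter name did not survive in the post above. As an assumption only, the sketch below uses the session-scoped system variable tidb_isolation_read_engines, which limits the storage engines TiDB may read from; whether that is the parameter the reply meant is a guess. The helper name isolation_read_statement is hypothetical, and the code only builds the statements rather than running them.

```python
# Engine names accepted in the isolation-read engine list.
KNOWN_ENGINES = ("tikv", "tiflash", "tidb")


def isolation_read_statement(engines):
    """Build a session-scoped SET statement limiting the engines queries may read from."""
    unknown = [e for e in engines if e not in KNOWN_ENGINES]
    if unknown:
        raise ValueError(f"unknown engine(s): {unknown}")
    # ASSUMPTION: tidb_isolation_read_engines is the parameter the thread refers to.
    return "SET SESSION tidb_isolation_read_engines = '{}';".format(",".join(engines))


# Exclude TiFlash while testing whether it can start.
print(isolation_read_statement(["tikv", "tidb"]))
# Allow all three engines again once TiFlash is back up.
print(isolation_read_statement(["tikv", "tiflash", "tidb"]))
```

Because the statement is session-scoped, it only affects the connection that runs it, which matches the remark that the setting cannot be changed globally.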
If it still doesn't come up, there may simply be too much data to synchronize. In that case, remove the TiFlash replicas of some tables to reduce the synchronization write pressure, and then try again.
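Removing a table's TiFlash replica is done by setting its replica count to 0 with ALTER TABLE ... SET TIFLASH REPLICA. The sketch below generates those statements for a list of tables so they can be reviewed before being run; the helper name drop_tiflash_replica_statements and the table names are hypothetical.

```python
def drop_tiflash_replica_statements(tables):
    """Build ALTER TABLE statements that set the TiFlash replica count to 0.

    `tables` is an iterable of (schema, table) pairs.
    """
    statements = []
    for schema, table in tables:
        # Backtick-quote identifiers and double any embedded backtick.
        quoted = ".".join(
            "`{}`".format(name.replace("`", "``")) for name in (schema, table)
        )
        statements.append(f"ALTER TABLE {quoted} SET TIFLASH REPLICA 0;")
    return statements


# Hypothetical table names, for illustration only.
for stmt in drop_tiflash_replica_statements([("app", "events"), ("app", "logs")]):
    print(stmt)
```

Once TiFlash is stable again, the replica can be re-created by running the same statement with a replica count of 1, which starts a fresh synchronization for that table.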