[TiDB Usage Environment] Production Environment
[TiDB Version] V7.1.0
[Encountered Problem: Problem Phenomenon and Impact] With no read requests, during a full data migration using DTS, TiKV memory kept increasing (observed by logging into the TiKV server and running `top`). Lowering the storage.block-cache.capacity configuration reduced memory usage somewhat, but the problem is still not resolved. How can I identify what is occupying this memory?
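For reference, a minimal sketch of checking and lowering the block cache from the SQL layer; the 2GiB value is purely illustrative and should be sized to your machine:

```sql
-- Inspect the current block cache capacity reported by each TiKV instance.
SHOW CONFIG WHERE type = 'tikv' AND name = 'storage.block-cache.capacity';

-- Lower it online for all TiKV instances (illustrative value).
SET CONFIG tikv `storage.block-cache.capacity` = '2GiB';
```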
TiKV memory is resident. In theory, with no explicit configuration, a single instance on a single machine will use roughly 80% of the server's memory and then stop growing.
TiKV generally won't OOM. If it does, either multiple instances are deployed without per-instance memory configuration, the memory configuration itself is unreasonable, or a TiKV bug is causing the memory to balloon.
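If it is the multi-instance case, each instance's block cache can be capped individually; a sketch assuming a TiKV status address of 127.0.0.1:20180, which is only an example:

```sql
-- Cap the block cache for one specific TiKV instance (addressed by its status port).
SET CONFIG "127.0.0.1:20180" `storage.block-cache.capacity` = '2GiB';
```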
My cluster hardware is very weak (4c8G), and I have run into this situation: several TiKV instances took turns crashing. After enabling resource control in v7.1, it has been running smoothly.
It's not complicated to use. I'm not sure whether your case is a configuration issue or a mixed deployment.
My issue is definitely insufficient hardware. The recommended configuration in the documentation doesn't even cover a low-end setup like mine, and the cluster is unusable without this feature. This is just a record; take a look if you're interested.
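For anyone in a similar spot, a rough sketch of enabling resource control and capping a migration workload; the group name dts_rg, the user dts_user, and the RU quota are all made-up examples:

```sql
-- Make sure resource control is enabled (it may already be ON by default in v7.1).
SET GLOBAL tidb_enable_resource_control = ON;

-- Create a resource group with an illustrative RU quota.
CREATE RESOURCE GROUP IF NOT EXISTS dts_rg RU_PER_SEC = 2000;

-- Bind the hypothetical migration user to the group.
ALTER USER 'dts_user'@'%' RESOURCE GROUP dts_rg;
```

Note that resource control throttles requests by RU rather than capping TiKV memory directly; it helps by slowing the workload down.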
Is there a detailed explanation of the storage.block-cache.strict-capacity-limit parameter? I couldn't find it in the official documentation. Can TiKV memory usage be strictly limited by using this parameter together with storage.block-cache.capacity?
I'm not optimistic, but you can try it in a test environment. The parameter is just a true/false switch and appears to be passed straight through to the RocksDB settings.
I then went looking in RocksDB and found this issue:
The gist is that when this option is set and the cache is full, inserts can no longer succeed and may immediately return an error instead of pretending nothing happened and returning normally. The issue is still open.
Since RocksDB will clearly return an error here, it's unclear whether TiKV does anything to handle that situation.
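If you do experiment with it, a small sketch for checking what the running cluster reports for both settings; storage.block-cache.capacity can be changed online, while strict-capacity-limit may well require editing the config file and restarting TiKV, so verify in a test cluster first:

```sql
-- Check the block cache settings currently reported by TiKV
-- (assuming SHOW CONFIG's WHERE filter accepts LIKE, as other SHOW statements do).
SHOW CONFIG WHERE type = 'tikv' AND name LIKE 'storage.block-cache%';
```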