The RocksDB thread pool is used for Compact and Flush tasks in RocksDB, and usually does not need to be configured.
If the machine has a small number of CPU cores, you can set rocksdb.max-background-jobs and raftdb.max-background-jobs to 4.
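For reference, a minimal tikv.toml sketch of that suggestion (the value 4 is the one from the thread for machines with few cores; tune it to your own hardware):

```toml
# Background compaction/flush threads for the KV and Raft RocksDB instances.
# 4 is the value suggested above for machines with a small number of CPU cores.
[rocksdb]
max-background-jobs = 4

[raftdb]
max-background-jobs = 4
```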
If you encounter a Write Stall, you can check the Write Stall Reason metrics on the RocksDB-kv panel in Grafana to see which of them are not zero.
If it is caused by pending compaction bytes, you can set rocksdb.max-sub-compactions to 2 or 3 (this configuration indicates the number of sub-threads allowed for a single compaction job, with a default value of 3 in TiKV 4.0 and 1 in version 3.0).
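In tikv.toml this would look roughly as follows (3 here simply follows the 2-or-3 suggestion above):

```toml
[rocksdb]
# Number of sub-threads a single compaction job is allowed to use.
max-sub-compactions = 3
```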
If the reason is related to the memtable count, it is recommended to increase max-write-buffer-number for all column families (the default is 5).
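A sketch of raising it for each column family; the value 8 is only an illustrative choice above the default of 5, not a recommendation from the thread:

```toml
# Raise the memtable limit for every column family (default 5).
[rocksdb.defaultcf]
max-write-buffer-number = 8

[rocksdb.writecf]
max-write-buffer-number = 8

[rocksdb.lockcf]
max-write-buffer-number = 8
```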
If the reason is related to level0 file limit, it is recommended to increase the following parameters to 64 or higher:
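The original list of parameters did not come through here. Assuming it refers to the per-CF level0 write-stall triggers (my assumption, not stated above), a tikv.toml sketch would be:

```toml
# Assumed parameters: the level0 file-count triggers that cause write
# slowdowns/stalls. 64 follows the "64 or higher" suggestion above.
[rocksdb.defaultcf]
level0-slowdown-writes-trigger = 64
level0-stop-writes-trigger = 64

[rocksdb.writecf]
level0-slowdown-writes-trigger = 64
level0-stop-writes-trigger = 64
```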
How many CPU cores does it have?
If it has more than 10 cores, it might make sense to adjust max-background-jobs, since with more than 10 cores the formula allows this value to go as high as 9. If it has fewer than 10 cores, there is probably not much need for adjustment.
Additionally, whether you have encountered a Write Stall is another consideration. There can be many reasons for this.
The RocksDB thread pool only handles flush (persisting memtables) and compaction (merging SST files, which may rewrite many of them), so heavy load there often points to the disk rather than the CPU. If the RocksDB CPU usage is high, check whether the disk I/O still has headroom before considering an increase to max-background-jobs; otherwise, adding more threads may have little effect.
Is the disk still the bottleneck? But the I/O utilization (util%) is not reaching 100. Could it be the network overhead from using an FC SAN, in which case there is no real fix?
It seems like there is still some IO capacity left. Increasing the thread pool might be useful. But the question is, are there any specific issues appearing?
If there is no Write Stall, the pending compaction bytes graph is trending down, and there are no slow batch-insert queries in the slow query monitoring, then I don't think the RocksDB CPU graph alone is sufficient reason to adjust parameters.
Because there are no specific issues, any adjustments made might have negligible effects.
Nothing much specifically; it's just that the overall speed is not fast. This thread pool's CPU seems to be fully loaded all the time, and I'd like to bring it down. So there's no need to adjust it for now, and I should only adjust it when a write stall actually occurs?
Overall optimization is a big topic. I suggest you watch this video. It can link many monitoring points together and provide an idea and direction for tuning.
If you just see that some metric is maxed out and adjust a parameter in response, I don't think that achieves much.
As an exploratory, learning-oriented operation, I'd agree with trying it.
But in a production environment, if the cause-and-effect relationship is uncertain, it can easily lead to other problems.