Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.
Original topic: What is an appropriate number of Regions on TiKV?
[TiDB Usage Environment] Production Environment / Testing / PoC
[TiDB Version] 4.0.13
[Reproduction Path] What operations were performed when the issue occurred
What is an appropriate number of Regions per TiKV node, and when should scaling out be considered?
There is no hard number; it depends on the workload. A Region is 96 MiB by default. With too many Regions, there are more heartbeats to PD, which consumes more resources. If disk space runs short, you need to scale out.
There doesn't seem to be a specific number, so it depends on your resources.
The Region count itself shouldn't be the deciding factor; what matters is the used space on the TiKV node, which should not exceed 2 TB.
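To make "more heartbeats to PD" concrete, here is a rough sketch of how that traffic scales with Region count. It assumes (not stated in the thread) that each Region leader reports to PD once per `pd-heartbeat-tick-interval`, taken here as a 60 s default, and that leaders are spread evenly so about one third of the Regions on a node are leaders with three replicas. The function name is hypothetical.

```python
# Rough estimate of Region heartbeat traffic from one TiKV node to PD.
# Assumptions (not stated in the thread):
#   * each Region leader reports to PD once per pd-heartbeat-tick-interval,
#     assumed here to be 60 s;
#   * leaders are spread evenly, so about 1/replicas of the Regions on a
#     node are leaders.

def pd_heartbeats_per_second(regions_on_node: int,
                             replicas: int = 3,
                             heartbeat_interval_s: float = 60.0) -> float:
    """Approximate Region heartbeats per second that one node sends to PD."""
    if regions_on_node < 0 or replicas <= 0 or heartbeat_interval_s <= 0:
        raise ValueError("regions must be >= 0; replicas and interval > 0")
    leaders = regions_on_node / replicas
    return leaders / heartbeat_interval_s


if __name__ == "__main__":
    # Heartbeat load grows linearly with the Region count.
    for regions in (10_000, 25_000, 50_000):
        rate = pd_heartbeats_per_second(regions)
        print(f"{regions:>6} Regions -> ~{rate:.0f} heartbeats/s to PD")
```

The point of the model is the linear growth: doubling the Regions on a node doubles this background traffic, whether or not those Regions are serving requests.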
It is generally not recommended to have more than 20,000 regions on a single TiKV node, as exceeding this number may lead to performance degradation.
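The 2 TB space limit and the 20,000-Region guideline in these replies are roughly consistent. A minimal sketch of the arithmetic, assuming every Region is exactly the default 96 MiB and reading 2 TB as 2 TiB (real Regions vary in size, and on-disk usage after compression differs from logical size, so this is only a ballpark):

```python
import math

MIB = 1024 ** 2
TIB = 1024 ** 4


def estimate_region_count(used_bytes: int,
                          region_size_bytes: int = 96 * MIB) -> int:
    """Approximate Region count for used_bytes of data, assuming every
    Region is exactly region_size_bytes (96 MiB by default)."""
    if used_bytes < 0 or region_size_bytes <= 0:
        raise ValueError("used_bytes must be >= 0 and region size > 0")
    return math.ceil(used_bytes / region_size_bytes)


if __name__ == "__main__":
    # 2 TiB at the default 96 MiB Region size.
    print(estimate_region_count(2 * TIB))             # 21846
    # The same data with a hypothetical 192 MiB Region size.
    print(estimate_region_count(2 * TIB, 192 * MIB))  # 10923
```

So a node holding about 2 TiB at the default size lands near 21,800 Regions, just above the ~20,000 guideline, and doubling the Region size roughly halves the count.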
It depends on your budget. If you have money, you can add as many TiKV nodes as you want.
That works, hahaha, the power of money.
Currently, each TiKV node has about 25,000 Regions.
It's because there's no money and they're unwilling to scale out, hahaha.
You could ask the business team whether unused data can be archived and then deleted to save costs.
How much disk space do those 25k Regions occupy?
Currently scaling out; I'll check once it's done tomorrow.
As the number of Regions grows, does TiKV's memory usage also grow to maintain them? How did you eventually deal with the memory alarm? Have you run into this before?
We have 128 GB of memory at about 50% usage and haven't run into this issue. If it becomes a problem, consider adding more memory.
There is a table with over 500 GB of data that we're preparing to clean up.
You can check the relevant metrics in the TiKV dashboard in Grafana: look at Raft store CPU under Thread CPU to see whether it has become a bottleneck. If it exceeds 85%, try the following strategies first before considering scaling out TiKV:
- If I/O and CPU resources are relatively plentiful, you can deploy multiple TiKV instances on a single machine to reduce the number of Regions per TiKV instance.
- Reduce the number of Region messages per unit of time (for example, by increasing `raft-base-tick-interval`) to relieve pressure on Raftstore.
- Increase Raftstore concurrency (`raftstore.store-pool-size`).
- Enable the Hibernate Region feature (`raftstore.hibernate-regions`).
- Enabling Region Merge can also reduce the number of Regions. As the opposite of Region Split, Region Merge merges adjacent small Regions through scheduling. After deleting data in the cluster or executing `DROP TABLE`/`TRUNCATE TABLE` statements, small or even empty Regions can be merged to reduce resource consumption.
- The default size of a Region is about 96 MiB; increasing it can also reduce the number of Regions.
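A toy model can show how several of these strategies cut Raftstore work on one instance. It is a sketch under stated assumptions: every Region is ticked once per `raft-base-tick-interval` (assumed 1 s), Hibernate Region stops ticks only for a hypothetical fraction of idle Regions, and data divides evenly across instances. The `Layout` class and its fields are illustrative, not TiKV settings, Region Merge is not modeled, and real load also depends on write traffic.

```python
from dataclasses import dataclass

MIB = 1024 ** 2


@dataclass
class Layout:
    """Hypothetical description of how data is laid out on one machine."""
    used_bytes: int                    # logical data stored on the machine
    instances: int = 1                 # TiKV instances on the machine
    region_size_bytes: int = 96 * MIB  # target Region size
    idle_fraction: float = 0.0         # share of Regions with no traffic
    hibernate: bool = False            # Hibernate Region enabled?
    tick_interval_s: float = 1.0       # assumed raft-base-tick-interval

    def __post_init__(self) -> None:
        if self.used_bytes < 0 or self.instances <= 0:
            raise ValueError("used_bytes must be >= 0 and instances > 0")
        if self.region_size_bytes <= 0 or self.tick_interval_s <= 0:
            raise ValueError("region size and tick interval must be > 0")
        if not 0.0 <= self.idle_fraction <= 1.0:
            raise ValueError("idle_fraction must be in [0, 1]")


def regions_per_instance(layout: Layout) -> float:
    """Regions each TiKV instance hosts if data is split evenly."""
    return layout.used_bytes / layout.region_size_bytes / layout.instances


def ticks_per_second(layout: Layout) -> float:
    """Raft ticks per second on one instance: every Region ticks once per
    interval, except idle Regions when Hibernate Region is enabled."""
    regions = regions_per_instance(layout)
    if layout.hibernate:
        regions *= 1.0 - layout.idle_fraction
    return regions / layout.tick_interval_s


if __name__ == "__main__":
    data = 25_000 * 96 * MIB  # enough data for 25,000 default-size Regions
    scenarios = {
        "baseline": Layout(data),
        "2 instances per machine": Layout(data, instances=2),
        "192 MiB Regions": Layout(data, region_size_bytes=192 * MIB),
        "hibernate, 80% idle": Layout(data, idle_fraction=0.8, hibernate=True),
        "2 s tick interval": Layout(data, tick_interval_s=2.0),
    }
    for name, layout in scenarios.items():
        print(f"{name:<24} {regions_per_instance(layout):>8.0f} Regions "
              f"{ticks_per_second(layout):>8.0f} ticks/s")
```

In this model, two instances per machine, doubling the Region size, and doubling the tick interval each halve the per-instance load, while Hibernate Region helps only as much as the workload actually leaves Regions idle.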
Increasing the Region size is too risky; I don't dare to do it. CPU is definitely sufficient, but INSERT latency is unstable: some statements take tens of milliseconds, some take hundreds, and some take up to a minute. The application cannot tolerate many statements slower than 300 ms.
However, after adding a TiKV node yesterday, the number of SQL statements taking more than 300 ms dropped significantly. Fewer Regions per node definitely helps.
The default of 96 MiB suits most cases.