Consultation on Performance Issues When the Number of Regions on a TiKV Node Exceeds 50,000

This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: 关于一个tikv节点region数量超5万影响性能问题请教

| username: wenyi

The official documentation states that the number of regions on a single TiKV node should not exceed 50,000. In my test cluster with 300GB of data, the number of regions is 20,000. In production with 3TB of data, the number of regions on a single TiKV node is close to 200,000. Will this affect performance?

| username: dba远航 | Original post link

It is recommended to increase the Region size or add more TiKV nodes.

| username: wenyi | Original post link

How many TiKV nodes do you usually run in production for 5TB of data?

| username: tidb菜鸟一只 | Original post link

Refer to the tuning guide here: Best Practices for TiKV Performance Tuning with Massive Regions | PingCAP Docs
The main issue is that PD has to record the information for every region and keep refreshing it. With too many regions, PD can easily fall behind and fail to obtain region information in a timely manner.

| username: 托马斯滑板鞋 | Original post link

Are there 22,000 regions for just 300GB? Are there many tables? :thinking:

| username: chenhanneu | Original post link

I checked one of the clusters we operate: each TiKV node holds a little over 300GB, which corresponds to about 40k regions per node. Is this normal? There are fewer than 100 tables.

| username: wenyi | Original post link

Around 500 tables.

| username: wenyi | Original post link

Your cluster is approaching 50,000 regions.

| username: xingzhenxiang | Original post link

That is much less data than mine.

| username: 托马斯滑板鞋 | Original post link

Why do you have so many regions? At 96MB per region, 300GB should come to fewer than 5k regions :joy: (unless there are many separate tables).
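The arithmetic behind that estimate can be sketched as follows. This is a rough lower bound only: the function name is made up for illustration, and it assumes every region is filled to the 96MB size mentioned above, which real clusters rarely achieve (small tables and freshly split regions push the count higher).

```python
import math

MIB = 1024 ** 2
GIB = 1024 ** 3


def estimate_region_count(data_bytes: int, region_size_mib: int = 96) -> int:
    """Lower-bound Region count, assuming every Region is completely full.

    Hypothetical helper for illustration; not part of any TiDB tool.
    """
    return math.ceil(data_bytes / (region_size_mib * MIB))


# 300 GiB of data at the 96 MiB Region size discussed in the thread
print(estimate_region_count(300 * GIB))  # 3200
```

A count of 20,000 or more for 300GB is therefore several times the fully packed figure, which is why the number of tables and the share of small or empty regions are worth checking.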

| username: TiDBer_jYQINSnf | Original post link

300GB of data and 50,000 regions: was this done on purpose?
Our production cluster holds around 1.5TB and has only forty to fifty thousand regions.
Too many regions means excessive heartbeats and Raft messages, which wastes a lot of resources.

| username: Jellybean | Original post link

As far as I recall, clusters from v4.0 onward have the Hibernate Region (silent region) feature enabled by default, which reduces the frequency of heartbeat messages for idle regions. So as long as the cluster's entire data set is not being read and written frequently, the overall cost of maintaining heartbeats for tens of thousands of regions should still be relatively low.

If you are still concerned, you can increase the size of the regions, which is 96MB by default. After increasing it, the number of regions will decrease while the cluster size remains basically unchanged.
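To get a feel for how region size and node count interact, here is a minimal sketch. The function name, the 3-replica setting, and the assumption of perfectly balanced, fully packed regions are all illustrative; actual per-node counts depend on scheduling and on how full the regions really are.

```python
import math


def peers_per_node(total_data_gib: float, region_size_mib: int,
                   replicas: int = 3, nodes: int = 3) -> int:
    """Approximate Region replicas hosted on each TiKV node.

    Hypothetical helper. Assumes total_data_gib is the data size before
    replication, Regions are full, and replicas are spread evenly.
    """
    regions = math.ceil(total_data_gib * 1024 / region_size_mib)
    return math.ceil(regions * replicas / nodes)


# 3 TiB of data, 3 replicas, 3 nodes: compare 96 MiB and 256 MiB Regions
for size in (96, 256):
    print(size, peers_per_node(3 * 1024, size))

# Same data at 96 MiB Regions, but spread over 6 nodes
print(peers_per_node(3 * 1024, 96, nodes=6))
```

Under these assumptions, raising the region size and adding nodes both lower the per-node count, but only a larger region size reduces the total number of regions that PD has to track.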

| username: zhanggame1 | Original post link

Setting the region size to 256MB is sufficient.

| username: caiyfc | Original post link

First, check if there are many empty regions. Secondly, check the CPU usage of the PD leader. If it’s not high, the impact is not significant.
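One way to run the empty-region check is to count regions in a region listing exported from PD. The sketch below works on an already parsed list of region records; the `approximate_size` field name (in MB) and the 1MB threshold for "empty" are assumptions about the export format, so verify them against your PD version. The sample data is invented.

```python
def count_empty_regions(regions, max_size_mb=1):
    """Count Regions whose approximate size is at or below max_size_mb.

    Assumes each record is a dict with an 'approximate_size' field in MB;
    a missing field is treated as size 0. Illustrative helper only.
    """
    return sum(1 for r in regions if r.get("approximate_size", 0) <= max_size_mb)


# Invented sample records standing in for a real region listing
sample_regions = [
    {"id": 1, "approximate_size": 96},
    {"id": 2, "approximate_size": 1},
    {"id": 3, "approximate_size": 0},
    {"id": 4},  # size not reported
    {"id": 5, "approximate_size": 48},
]

empty = count_empty_regions(sample_regions)
print(f"{empty} of {len(sample_regions)} regions look empty")
```

If a large share of regions turns out to be empty, region merge settings are the first thing to look at before resizing regions or adding nodes.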

| username: 路在何chu | Original post link

This is about right. I checked our environment, and it’s about the same.

| username: zhaokede | Original post link

The figure in the official documentation is a recommended value. Exceeding it will definitely have some impact on performance.

| username: 数据库真NB | Original post link

The official recommendations matter less than your actual workload and real-world conditions.

| username: zhanggame1 | Original post link

It also depends on the hardware. A higher server specification raises the upper limit.

| username: 春风十里 | Original post link

The main thing to look at is the load on the PD nodes. Also check whether the high region count is caused by a large number of empty regions. Before version 5.0, TiDB had the cross-table region merge feature disabled by default. Starting from version 5.0, it is enabled by default to reduce the number of empty regions and lower the system's network, memory, and CPU overhead. You can check the PD configuration item schedule.enable-cross-table-merge.

| username: YuchongXU | Original post link

The more performance headroom, the better. For a concrete answer, test it against your own configuration.