Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.
Original topic: How does the TiDB PD Server module store the Region metadata topology?
[TiDB Usage Environment] Production Environment
[TiDB Version] V6.1.7
[Encountered Problem: Phenomenon and Impact] PD Server stores the topology information for all Regions in the TiKV cluster. If the cluster is very large, say 100 TB, the number of Regions will run into the millions. I have two questions:
- How is the Region metadata topology inside PD Server implemented? [i.e., the strategy for how Regions are stored per Store]
- Will PD Server hit performance bottlenecks once the number of Regions exceeds a few million, or can such issues be resolved by tuning parameters?
PD has an embedded database, etcd, which stores each Region's location, number of replicas, and its relationships with other Regions. It provides the following information (a small query sketch follows this list):
- Region ID: the unique identifier of each Region.
- Storage information: the physical location and replica distribution of the Region.
- Parent Region ID: the parent Region of the current Region.
- Child Region ID list: all child Regions of the current Region.
- Replica information: the status and synchronization state of each Region's replicas.
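If you want to see what PD actually records for a single Region, you can query its HTTP API. Below is a minimal sketch in Go, assuming a PD endpoint at 127.0.0.1:2379 and the /pd/api/v1/region/id/{id} route; the struct only decodes a subset of the response, and the exact field set may vary between versions:

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
)

// Subset of the per-Region metadata PD exposes over its HTTP API.
// Field names follow the PD API / metapb definitions as I understand them;
// treat this as illustrative rather than an exhaustive schema.
type peer struct {
	ID      uint64 `json:"id"`
	StoreID uint64 `json:"store_id"`
}

type regionInfo struct {
	ID              uint64 `json:"id"`
	StartKey        string `json:"start_key"`
	EndKey          string `json:"end_key"`
	Peers           []peer `json:"peers"`
	Leader          peer   `json:"leader"`
	ApproximateSize int64  `json:"approximate_size"` // in MiB
	ApproximateKeys int64  `json:"approximate_keys"`
}

func main() {
	// 127.0.0.1:2379 is a placeholder PD address; region ID 1 is arbitrary.
	resp, err := http.Get("http://127.0.0.1:2379/pd/api/v1/region/id/1")
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	var r regionInfo
	if err := json.NewDecoder(resp.Body).Decode(&r); err != nil {
		panic(err)
	}
	fmt.Printf("region %d: [%s, %s), leader on store %d, %d replicas\n",
		r.ID, r.StartKey, r.EndKey, r.Leader.StoreID, len(r.Peers))
}
```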
PD stores Region meta information in a local LevelDB and synchronizes it between PD nodes through other mechanisms. LevelDB is an efficient embedded key-value database written in C++, and its read and write performance holds up well even at the billion-record scale.
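As a rough illustration of that storage pattern (not PD's real key layout or value encoding, which are internal and protobuf-based), here is a minimal Go sketch that writes and scans Region meta records in an embedded LevelDB via goleveldb; the "region/<id>" key prefix and JSON values are assumptions made for readability:

```go
package main

import (
	"encoding/json"
	"fmt"

	"github.com/syndtr/goleveldb/leveldb"
	"github.com/syndtr/goleveldb/leveldb/util"
)

// Illustrative Region meta record; PD's real schema differs.
type regionMeta struct {
	ID       uint64   `json:"id"`
	StartKey string   `json:"start_key"`
	EndKey   string   `json:"end_key"`
	StoreIDs []uint64 `json:"store_ids"` // stores holding a replica
}

func main() {
	db, err := leveldb.OpenFile("./region-meta-demo", nil)
	if err != nil {
		panic(err)
	}
	defer db.Close()

	// Persist one Region record under a fixed-width key so keys sort by ID.
	meta := regionMeta{ID: 1, StartKey: "", EndKey: "t_100", StoreIDs: []uint64{1, 4, 5}}
	val, _ := json.Marshal(meta)
	if err := db.Put([]byte(fmt.Sprintf("region/%020d", meta.ID)), val, nil); err != nil {
		panic(err)
	}

	// Scan all Region records by prefix, the way a meta store would load them at startup.
	iter := db.NewIterator(util.BytesPrefix([]byte("region/")), nil)
	defer iter.Release()
	for iter.Next() {
		var m regionMeta
		if err := json.Unmarshal(iter.Value(), &m); err != nil {
			panic(err)
		}
		fmt.Printf("loaded region %d with replicas on stores %v\n", m.ID, m.StoreIDs)
	}
	if err := iter.Error(); err != nil {
		panic(err)
	}
}
```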
If the number of Regions exceeds one million, won’t there be performance issues?
You can find it in the local PD files.
It seems PD really does have its own database, stored in key-value format, so Region information is indeed stored this way.
For a scale of over 5 million regions, I’m not sure if there will be performance issues with PD scheduling.
If the data volume is too large, there may be performance issues. You can increase the Region size (for example via coprocessor.region-split-size and coprocessor.region-max-size in the TiKV configuration) to reduce the Region count.
Five million rows is no problem even for MySQL, so a KV database like this can handle it with ease.
I meant 5 million Regions counting only a single replica, haha.
By the way, when I was testing v6.1.7 recently, I noticed that table health in the statistics changed very frequently (the actual online write volume is under 1,000, and the workload is only INSERT + DELETE). After running overnight, it dropped from 99 to 49. It looks like I need to start digging into the source code related to table health.
TiDB v3.1.0 with over 2 million regions, no pressure at all.
My current storage size is close to 100 TB.
Is the data volume that large already? Is it possible to archive some historical data from the business side?
We actually want to test the capacity limit of TiDB and determine the performance inflection point after the cluster size reaches XXTB.
I previously read an article about region scheduling. I’m not sure if it will be useful to you.
https://zhuanlan.zhihu.com/p/367859688