How the TiDB PDServer Module Stores Region Metadata Topology

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: TIDB PDSERVER模块是如何存储REGION元信息拓扑

| username: residentevil

[TiDB Usage Environment] Production Environment
[TiDB Version] V6.1.7
[Encountered Problem: Phenomenon and Impact] PD Server stores the topology information of all Regions in the TiKV cluster. If the cluster is very large, say 100 TB, the number of Regions will run into the millions. I have two questions:

  1. How is the Region metadata topology stored in PD Server implemented? [i.e., the Region-to-Store storage strategy]
  2. Will PD Server hit performance bottlenecks once the number of Regions exceeds a few million, or can such issues be resolved by tuning parameters?
| username: Fly-bird | Original post link

PD has an embedded etcd database that stores, for each Region, its location, the number of replicas, and its relationships with other Regions. It provides the following information (a rough code sketch follows this list):

  1. Region ID: the unique identifier of each Region.
  2. Storage information: the physical location of the Region and the distribution of its replicas.
  3. Parent Region ID: the parent Region of the current Region.
  4. Child Region ID list: all child Regions of the current Region.
  5. Replica information: the status and synchronization state of each Region's replicas.
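
For illustration only, here is a minimal Go sketch of what one such per-Region record could look like, loosely modeled on the metapb.Region protobuf that TiKV reports to PD (ID, key range, epoch, and peer list). The type and field names below are simplified assumptions for this post, not PD's actual code.

```go
package main

import "fmt"

// Peer is one replica of a Region, placed on a particular TiKV store.
// (Simplified stand-in for metapb.Peer.)
type Peer struct {
	ID      uint64
	StoreID uint64 // which TiKV node holds this replica
}

// RegionMeta is a simplified stand-in for the per-Region metadata PD tracks,
// loosely following metapb.Region: id, key range, epoch, and peer list.
type RegionMeta struct {
	ID       uint64
	StartKey []byte // inclusive start of the Region's key range
	EndKey   []byte // exclusive end of the Region's key range
	ConfVer  uint64 // bumped when the replica membership changes
	Version  uint64 // bumped on split/merge
	Peers    []Peer
}

func main() {
	r := RegionMeta{
		ID:       42,
		StartKey: []byte("t_r_0001"), // placeholder keys, not real TiKV key encoding
		EndKey:   []byte("t_r_0002"),
		ConfVer:  5,
		Version:  7,
		Peers: []Peer{
			{ID: 421, StoreID: 1},
			{ID: 422, StoreID: 2},
			{ID: 423, StoreID: 3},
		},
	}

	fmt.Printf("Region %d spans [%s, %s) with %d replicas on stores:",
		r.ID, r.StartKey, r.EndKey, len(r.Peers))
	for _, p := range r.Peers {
		fmt.Printf(" %d", p.StoreID)
	}
	fmt.Println()
}
```
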
| username: 像风一样的男子 | Original post link

PD stores Region meta information in a local LevelDB and synchronizes it between PD nodes through other mechanisms. LevelDB is an efficient embedded key-value database written in C++, and its read and write performance remains very good even at the billion-record scale.
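
As a hedged illustration of what "Region meta in a local LevelDB" means in practice, the sketch below opens a LevelDB directory read-only with the goleveldb library and counts the keys in it. The directory name and the assumption that each key corresponds to one Region record are placeholders for this example, not documented PD behavior; also note that LevelDB only allows one process to open a store at a time.

```go
package main

import (
	"fmt"
	"log"

	"github.com/syndtr/goleveldb/leveldb"
	"github.com/syndtr/goleveldb/leveldb/opt"
)

func main() {
	// Placeholder path: point this at the LevelDB directory inside PD's data dir.
	db, err := leveldb.OpenFile("/path/to/pd-data/region-meta", &opt.Options{ReadOnly: true})
	if err != nil {
		log.Fatalf("open leveldb: %v", err)
	}
	defer db.Close()

	// Iterate over every key; for this sketch we assume one key per Region record.
	iter := db.NewIterator(nil, nil)
	defer iter.Release()

	count := 0
	for iter.Next() {
		count++
	}
	if err := iter.Error(); err != nil {
		log.Fatalf("iterate: %v", err)
	}
	fmt.Printf("found %d keys (≈ Region records) in the store\n", count)
}
```
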

| username: 喵父666 | Original post link

Come in and learn a bit.

| username: residentevil | Original post link

If the number of REGIONS exceeds one million, won’t there be performance issues?

| username: residentevil | Original post link

Where is this LevelDB located?

| username: 像风一样的男子 | Original post link

You can find it in PD's local data directory (the path specified by --data-dir).

| username: residentevil | Original post link

So PD really does have its own database, and it stores data in KV format; it looks like Region information is indeed stored this way.

| username: residentevil | Original post link

At a scale of more than 5 million Regions, I'm not sure whether PD scheduling will run into performance issues.

| username: zhanggame1 | Original post link

If the data volume is too large, there may be performance issues. You can increase the Region size to reduce the number of Regions (for example, via TiKV's coprocessor.region-split-size and region-max-size settings); see the sketch below.
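
To make that concrete, here is a small back-of-the-envelope sketch (my own illustration; it assumes the common 96 MiB default split size and 3 replicas) showing how the Region count for roughly 100 TB of data shrinks as the Region size grows:

```go
package main

import "fmt"

// estimateRegions gives a rough Region count for a given data size,
// ignoring replication (multiply by the replica count for total peers).
func estimateRegions(dataBytes, regionSizeBytes float64) float64 {
	return dataBytes / regionSizeBytes
}

func main() {
	const (
		tib = 1024.0 * 1024 * 1024 * 1024
		mib = 1024.0 * 1024
	)
	data := 100 * tib // roughly the poster's cluster size

	for _, sizeMiB := range []float64{96, 256, 512} { // 96 MiB is the usual default split size
		n := estimateRegions(data, sizeMiB*mib)
		fmt.Printf("region size %4.0f MiB -> ~%.1f million Regions per replica (x3 replicas = ~%.1f million peers)\n",
			sizeMiB, n/1e6, 3*n/1e6)
	}
}
```
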

| username: 像风一样的男子 | Original post link

Five million records is no problem even for MySQL, so a KV database like this can handle it with ease.

| username: residentevil | Original post link

I meant 5 million Regions counting only a single replica, haha.

| username: residentevil | Original post link

By the way, while testing V6.1.7 recently, I noticed that table health in the statistics changes very frequently [even though the online write volume is under 1000, with only INSERTs and DELETEs]. After running overnight, it had dropped from 99 to 49 :sweat_smile:. Looks like I'll have to start digging into the source code related to table health.

| username: xingzhenxiang | Original post link

TiDB v3.1.0 with over 2 million regions, no pressure at all.

| username: residentevil | Original post link

My current storage size is close to 100 TB.

| username: 像风一样的男子 | Original post link

Is the data volume already that large? Could the business side archive some of the historical data?

| username: residentevil | Original post link

We actually want to test the capacity limit of TiDB and determine the performance inflection point after the cluster size reaches XXTB.

| username: TiDBer_小阿飞 | Original post link

I previously read an article about region scheduling. I’m not sure if it will be useful to you.
https://zhuanlan.zhihu.com/p/367859688