Issues Related to PD Scheduling

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: PD 调度相关问题

| username: Raymond

What does it mean when PD scores a TiKV store?
Also, why can the existence of hotspot Regions lead to an uneven distribution of leaders/Regions?

| username: ddhe9527 | Original post link

You can view the Store Region Score on the PD → Balance page in Grafana, which is the score given by PD to each store. The balance-region score calculation is a piecewise function:

  • When there is ample space, the score is calculated based on the amount of data. This threshold is controlled by PD’s high-space-ratio, with a default value of 0.7, meaning that a space usage rate below 70% is considered ample space.

  • When space is insufficient, the score is calculated based on the remaining space. This threshold is controlled by low-space-ratio, with a default value of 0.8, meaning that a space usage rate above 80% is considered insufficient space.

  • When space usage is between the above two thresholds, both the amount of data and the remaining space are weighted in the scoring.

In addition to the amount of data and the remaining space, the score is also influenced by the store's region-weight. The higher the weight, the lower the score, so PD places more Regions on that store.
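The piecewise shape described above can be sketched in a few lines. This is a simplified illustration, not PD's actual formula: the function name `region_score`, the `MAX_SCORE` constant, and the linear blend between the two thresholds are assumptions made for the example. Dividing by the weight is just one simple way to get "higher weight, lower score". Only the two ratios and their defaults come from the description above.

```python
MAX_SCORE = 1_000_000_000.0  # hypothetical ceiling so that less free space gives a higher score


def region_score(region_size_mb, capacity_mb, available_mb,
                 region_weight=1.0, high_space_ratio=0.7, low_space_ratio=0.8):
    """Simplified piecewise store score (illustrative only, not PD's real code)."""
    if capacity_mb <= 0 or region_weight <= 0:
        raise ValueError("capacity and weight must be positive")

    used_ratio = 1.0 - available_mb / capacity_mb
    data_score = float(region_size_mb)       # ample space: score follows data amount
    space_score = MAX_SCORE - available_mb   # low space: score follows remaining space

    if used_ratio < high_space_ratio:
        raw = data_score
    elif used_ratio > low_space_ratio:
        raw = space_score
    else:
        # Between the thresholds: weight both parts (assumed linear blend).
        t = (used_ratio - high_space_ratio) / (low_space_ratio - high_space_ratio)
        raw = (1.0 - t) * data_score + t * space_score

    # A higher weight lowers the score, which makes the store more attractive.
    return raw / region_weight


ample = region_score(1000, 10000, 5000)    # 50% used
middle = region_score(1000, 10000, 2500)   # 75% used
full = region_score(1000, 10000, 1000)     # 90% used
weighted = region_score(1000, 10000, 5000, region_weight=2.0)
```

With these inputs the score rises as the store fills up (`ample < middle < full`), and doubling the weight halves the score of the otherwise identical store.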

Hot Regions trigger the hot-region-scheduler, whose scheduling logic differs from that of the balance-region-scheduler. It balances by the read/write traffic of Regions rather than by Region data size, so the number of Regions on each store can end up uneven.
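A toy example shows why balancing by traffic can leave Region counts uneven. This is not the hot-region-scheduler's algorithm; the function `place_by_traffic`, the greedy rule, and the traffic numbers are all made up for illustration.

```python
def place_by_traffic(region_traffic, stores):
    """Greedily put each region on the store with the least traffic so far."""
    load = {s: 0 for s in stores}   # accumulated traffic per store
    count = {s: 0 for s in stores}  # number of regions per store
    # Handle the hottest regions first.
    for _region, traffic in sorted(region_traffic.items(),
                                   key=lambda kv: kv[1], reverse=True):
        target = min(stores, key=lambda s: load[s])
        load[target] += traffic
        count[target] += 1
    return load, count


# One very hot region and three mild ones (hypothetical numbers).
traffic = {"r1": 90, "r2": 30, "r3": 30, "r4": 30}
load, count = place_by_traffic(traffic, ["store-a", "store-b"])
```

Here both stores end up carrying the same traffic (90 each), but `store-a` holds one Region while `store-b` holds three.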

| username: cs58_dba | Original post link

Watch the PCTP videos, and try to keep hardware configurations consistent across stores. That way the scoring criteria will be simpler.

| username: 箱子NvN | Original post link

TiKV sends heartbeat information to PD, which includes capacity, remaining space, and read/write traffic. Regions send heartbeat information to PD, which includes replica distribution, data volume, and read/write traffic. PD uses this information to score TiKV.

For example, when a new, empty TiKV node joins the cluster, the distribution becomes uneven, so PD adjusts the cluster accordingly. It moves some Region leaders to the new node, which brings it read/write traffic, and it also places Region followers there to store replicas, which uses its capacity.
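The new-node case can be sketched as a toy loop that keeps moving one Region from the fullest store to the emptiest until they are roughly even. This uses the plain Region count as a stand-in for the score; the function `rebalance`, the `tolerance` parameter, and the store names are assumptions for the example, and real PD scheduling involves far more (replica placement rules, rate limits, leaders versus followers).

```python
def rebalance(region_counts, tolerance=1):
    """Move one region at a time from the fullest store to the emptiest."""
    counts = dict(region_counts)  # work on a copy
    moves = []
    while True:
        source = max(counts, key=counts.get)
        target = min(counts, key=counts.get)
        if counts[source] - counts[target] <= tolerance:
            break  # close enough to even
        counts[source] -= 1
        counts[target] += 1
        moves.append((source, target))
    return counts, moves


# Three loaded stores plus a newly added empty one (hypothetical numbers).
before = {"tikv-1": 30, "tikv-2": 30, "tikv-3": 30, "tikv-4": 0}
after, moves = rebalance(before)
```

In this run every move targets the new `tikv-4`, which ends up with 22 of the 90 Regions after 22 moves.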

| username: 箱子NvN | Original post link

Recommended online course: https://learn.pingcap.com/learner/course/960001 (see Lesson 04, Placement Driver).

| username: Raymond | Original post link

If a certain TiKV has a higher score, does that mean PD is more likely to schedule more regions to this TiKV?

| username: ddhe9527 | Original post link

The other way around: the lower a store's score, the more PD tends to schedule Regions onto it.

| username: cs58_dba | Original post link

It feels like water flowing downhill.
