What is the region placement algorithm like?

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: region 位置放置算法是什么样子的?

| username: 数据库菜鸡

[TiDB Usage Environment] Production Environment / Testing / PoC
[TiDB Version] v7.1.0
[Reproduction Path] Deployed one TiKV on each of three large-disk machines and wrote data, then added one small-disk TiKV to each.
[Encountered Problem: Issue Phenomenon and Impact]
Did not observe regions migrating from large disks to small disks. Is this normal logic? Thank you.
[Resource Configuration]
[Attachments: Screenshots/Logs/Monitoring]

| username: Billmay表妹 | Original post link

You can check out this document: TiDB 数据库的存储 (TiDB Database Storage) | PingCAP 文档中心

| username: Billmay表妹 | Original post link

You can also take a look at this: TiDB 数据库的调度 (TiDB Database Scheduling) | PingCAP 文档中心

| username: zhanggame1 | Original post link

How long did you wait? This kind of scheduling is very slow. Additionally, disk capacity also affects the scheduling strategy.
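
If you want to confirm that balancing is actually in progress while you wait, pd-ctl can list the active schedulers and the operators currently being executed. A minimal check, assuming pd-ctl is invoked through tiup and the PD endpoint is http://127.0.0.1:2379 (adjust both to your cluster):

```shell
# List the schedulers PD is running; balance-region-scheduler should be among them.
tiup ctl:v7.1.0 pd -u http://127.0.0.1:2379 scheduler show

# Show the operators PD is currently executing (e.g. the add-peer / remove-peer
# steps of region balancing). An empty list means PD is not moving anything right now.
tiup ctl:v7.1.0 pd -u http://127.0.0.1:2379 operator show
```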

| username: 数据库菜鸡 | Original post link

Can the disk capacity be set?

| username: 数据库菜鸡 | Original post link

A few hours later I found that the migration had succeeded. It looks like the new regions had been placed onto the new TiKV nodes.

| username: zhanggame1 | Original post link

The disk capacity is inherent to the disk itself and not something that can be set. Your migration is indeed very slow.

| username: Jellybean | Original post link

To avoid affecting online access, the region scheduling and balancing strategy for TiKV stores is deliberately mild and conservative. Scheduling also depends on factors such as disk size, data volume, data distribution, and access hotspots; all of these feed into the store scores that PD uses when deciding whether and where to migrate regions.

If you want to observe the data migration sooner, you can speed the process up by adjusting the scheduling parameters. Note, however, that this can consume a lot of cluster resources. Watch access latency and other metrics while tuning so that normal business traffic is not affected.
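
For reference, here is a sketch of the kind of parameter changes meant above, using pd-ctl through tiup. The PD endpoint and the values are only examples, not recommendations; raise them gradually and keep an eye on latency as noted:

```shell
# Allow more concurrent region-balancing operators.
tiup ctl:v7.1.0 pd -u http://127.0.0.1:2379 config set region-schedule-limit 8

# Allow more snapshots to be sent/received per store at the same time.
tiup ctl:v7.1.0 pd -u http://127.0.0.1:2379 config set max-snapshot-count 8

# Raise the per-store rate limit for adding/removing peers (applies to all stores).
tiup ctl:v7.1.0 pd -u http://127.0.0.1:2379 store limit all 50
```

Remember to dial these settings back to their previous values once the data has been rebalanced.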

| username: h5n1 | Original post link

When mixing disks of different sizes, it is best to set the leader/region weights with pd-ctl (store weight xxxxx) so that the stores maintain similar utilization rates.
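
A sketch of that weight adjustment, assuming the small-disk TiKV has store ID 4 (hypothetical) and pd-ctl is run through tiup; the 0.5 weights are only an example and should be chosen relative to the disk size ratio:

```shell
# store weight <store_id> <leader_weight> <region_weight>
# Lower weights make PD target proportionally fewer leaders/regions on this store.
tiup ctl:v7.1.0 pd -u http://127.0.0.1:2379 store weight 4 0.5 0.5
```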

| username: redgame | Original post link

The current load and data distribution may lead PD to conclude that no region migration is needed, since the cluster load is already relatively balanced.
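
To see how PD scores each store when making that decision, you can dump the store list; region_score and available space are the most relevant fields here. The jq filter is optional and only trims the JSON output (it assumes jq is installed):

```shell
# Each entry shows the store ID, PD's region score, and remaining disk space.
tiup ctl:v7.1.0 pd -u http://127.0.0.1:2379 store \
  | jq '.stores[] | {id: .store.id, region_score: .status.region_score, available: .status.available}'
```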

| username: system | Original post link

This topic was automatically closed 60 days after the last reply. New replies are no longer allowed.