The cluster has about 110,000 regions in total, of which nearly 40,000 are empty, and merging has made no noticeable progress for several months

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: 集群总共 region 11w,其中 emtpy region接近4w,并且持续几个月都没明显完成 merge

| username: TiDBer_9Srg7cSk

[TiDB Usage Environment] Production Environment
[TiDB Version] 6.1.0
[Reproduction Path]
[Encountered Problem: Phenomenon and Impact]
Adjusting various merge parameters has no significant effect.

[Resource Configuration]
[Attachment: Screenshot/Log/Monitoring]

Configuration is as follows:

» config show
{
  "replication": {
    "enable-placement-rules": "true",
    "enable-placement-rules-cache": "false",
    "isolation-level": "",
    "location-labels": "",
    "max-replicas": 3,
    "strictly-match-label": "false"
  },
  "schedule": {
    "enable-cross-table-merge": "true",
    "enable-joint-consensus": "true",
    "high-space-ratio": 0.8,
    "hot-region-cache-hits-threshold": 3,
    "hot-region-schedule-limit": 4,
    "hot-regions-reserved-days": 7,
    "hot-regions-write-interval": "10m0s",
    "leader-schedule-limit": 4,
    "leader-schedule-policy": "count",
    "low-space-ratio": 0.9,
    "max-merge-region-keys": 300000,
    "max-merge-region-size": 20,
    "max-pending-peer-count": 64,
    "max-snapshot-count": 64,
    "max-store-down-time": "30m0s",
    "max-store-preparing-time": "48h0m0s",
    "merge-schedule-limit": 10,
    "patrol-region-interval": "10ms",
    "region-schedule-limit": 2048,
    "region-score-formula-version": "v2",
    "replica-schedule-limit": 64,
    "split-merge-interval": "1h0m0s",
    "tolerant-size-ratio": 0
  }
}

The data volume queried from the region table is about 12 TB, while the actual disk usage is 7.1 TB:

MySQL [INFORMATION_SCHEMA]> select sum(APPROXIMATE_SIZE) from tikv_region_status;
+-----------------------+
| sum(APPROXIMATE_SIZE) |
+-----------------------+
|              12892933 |
+-----------------------+
1 row in set (3.81 sec)

Deployment Information

| username: h5n1 | Original post link

Are there frequent truncate or drop operations? There’s no TiFlash, right?

| username: tidb菜鸟一只 | Original post link

Manual compact?

| username: TiDBer_9Srg7cSk | Original post link

There is TiFlash. There was a truncate operation six months ago, but there shouldn't have been any recent ones.

| username: h5n1 | Original post link

Adding a TiFlash replica to a table prevents regions from being merged across adjacent table boundaries. If there was only one truncate, then the merge is just progressing very slowly. Is the system busy? Is there high disk pressure? Besides the limits in the PD config, there is also a store limit that affects region scheduling speed. You can try adjusting the store limit with pd-ctl.
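
For reference, a minimal pd-ctl sketch of checking and then raising the store limit (the value 30 is only an illustrative assumption, not a recommendation; raise it gradually and watch the impact on foreground traffic):

» store limit
» store limit all 30 add-peer
» store limit all 30 remove-peer

The first command prints the current per-store limits; the other two raise the add-peer and remove-peer rates for all stores to 30 operations per minute.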

| username: TiDBer_9Srg7cSk | Original post link

TiFlash disk IOPS are now averaging around 20,000 (the disk limit is 60,000; it previously averaged 40,000 and the system was relatively normal), and CPU and memory are quite sufficient. TiKV only has a few thousand IOPS.

Can regions with a null table_id and 0 keys be ruled out as being related to TiFlash?

MySQL [INFORMATION_SCHEMA]> select count(*) from tikv_region_status where APPROXIMATE_KEYS=0 and TABLE_ID is null;
+----------+
| count(*) |
+----------+
|    18230 |
+----------+
1 row in set (4.45 sec)

| username: lilinghai | Original post link

Are there many small tables or many partitions? Additionally, what types of tables are set with TiFlash replicas?

| username: redgame | Original post link

If your TiDB cluster is constantly under high load, the Merge process may not have enough resources to operate.

| username: 像风一样的男子 | Original post link

Check the monitoring to see if the merge task is normal.

| username: TiDBer_9Srg7cSk | Original post link

There are quite a few partitions, and there are also fairly large TiFlash replicas on partitioned tables.

| username: TiDBer_9Srg7cSk | Original post link

The performance of TiDB is indeed not as good as MySQL in some scenarios, but it is still acceptable. The main advantage of TiDB is its scalability and high availability. If your business needs to handle large-scale data and requires high availability, TiDB is a good choice.

| username: 像风一样的男子 | Original post link

The default store limit is 15 per minute. You can try increasing it appropriately.

| username: lilinghai | Original post link

TiFlash does not allow cross-table region merges. If the number of relatively small partitions/tables with TiFlash replicas is roughly the same as the current number of empty regions, this limitation is likely the cause.
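
As a rough cross-check (only a sketch, assuming each table or partition with a TiFlash replica contributes at least one region that cannot be merged across its boundary), you could compare the number of such tables/partitions with the empty-region count:

-- Number of tables with TiFlash replicas
SELECT COUNT(*) FROM INFORMATION_SCHEMA.TIFLASH_REPLICA;

-- Same, but counting each partition separately (non-partitioned tables count once)
SELECT COUNT(*)
FROM INFORMATION_SCHEMA.PARTITIONS p
JOIN INFORMATION_SCHEMA.TIFLASH_REPLICA t
  ON p.TABLE_SCHEMA = t.TABLE_SCHEMA AND p.TABLE_NAME = t.TABLE_NAME;

If the second number is close to the roughly 40,000 empty regions, the TiFlash boundary restriction is the likely explanation.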

| username: 大飞哥online | Original post link

You can check region status with select * from INFORMATION_SCHEMA.TIKV_REGION_STATUS.
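
For example, a sketch of aggregating the empty regions by table to see where they concentrate (column names as exposed by TIKV_REGION_STATUS):

SELECT DB_NAME, TABLE_NAME, COUNT(*) AS empty_regions
FROM INFORMATION_SCHEMA.TIKV_REGION_STATUS
WHERE APPROXIMATE_KEYS = 0
GROUP BY DB_NAME, TABLE_NAME
ORDER BY empty_regions DESC
LIMIT 20;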

| username: 大飞哥online | Original post link

A lot of region operations happened earlier, and a single region larger than 20 MiB (the default max-merge-region-size is 20) will not be merged. You can temporarily increase max-merge-region-size and then change it back afterwards.
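
A minimal pd-ctl sketch of temporarily loosening the merge thresholds (50 and 500000 are only example values; note that pd-ctl uses hyphenated keys, as in the config show output above):

» config set max-merge-region-size 50
» config set max-merge-region-keys 500000

Once the backlog of merges has drained, the values can be set back to what they were before.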

| username: TiDBer_9Srg7cSk | Original post link

Now, regions with a null table_id and larger than 50MB occupy 2.5TB of space. If they cannot be merged, is there a way to free up this space?
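
For reference, a figure like that can be checked with a query along these lines (a sketch only; APPROXIMATE_SIZE is reported in MiB, and the 50 MB threshold matches the one mentioned above):

SELECT COUNT(*) AS regions, SUM(APPROXIMATE_SIZE) AS total_mib
FROM INFORMATION_SCHEMA.TIKV_REGION_STATUS
WHERE TABLE_ID IS NULL
  AND APPROXIMATE_SIZE > 50;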

| username: TiDBer_9Srg7cSk | Original post link

I previously adjusted the merge size and keys parameters, but the effect was not good.

| username: jansu-dev | Original post link

Are the operators that get created being canceled? If there are cancels in the metrics, the reason for the cancel should be identifiable. Please provide a longer-duration export of the PD metrics using the PingCAP MetricsTool.

| username: TiDBer_9Srg7cSk | Original post link

The script doesn't work; here is a manual screenshot.

| username: cy6301567 | Original post link

Will the empty regions be merged automatically?