Four TiKV nodes have similar store size and leader size; three have similar region size, but one node's region size is noticeably smaller and it also uses less memory

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: 4个tikv节点store zise、leader大小差不多,有三个节点的region大小差不多、一个节点的region明显很小–内存使用也少

| username: devopNeverStop

[TiDB Usage Environment] Production Environment / Testing / PoC
Production Environment
[TiDB Version]
v6.1.0
[Reproduction Path] What operations were performed when the issue occurred
The node with the very small region size had problems earlier; it was scaled in and then scaled back out.
[Encountered Issue: Issue Phenomenon and Impact]
Could anyone help analyze the cause and suggest how to make the nodes more balanced?
[Resource Configuration]
[Attachments: Screenshots/Logs/Monitoring]


| username: tidb菜鸟一只 | Original post link

When was this node scaled back out?

| username: devopNeverStop | Original post link

About 10 days ago, and it has been in this state for about a week.

| username: h5n1 | Original post link

select store_id, address, region_weight from information_schema.tikv_store_status;

| username: devopNeverStop | Original post link

[Screenshot not available]

| username: 裤衩儿飞上天 | Original post link

  1. Please post the Region health panel from the PD monitoring dashboard.
  2. Is it a mixed deployment? Are there any labels?

| username: devopNeverStop | Original post link

It’s not a mixed deployment.

| username: h5n1 | Original post link

You can try increasing the region weight of the store with fewer regions using pd-ctl store weight <store_id> <leader_weight> <region_weight>. For details, see the command’s --help output.
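As a rough illustration of what a higher region weight does, here is a simplified Python model. It assumes PD compares stores by a region score equal to total region size divided by region weight; PD’s real score also factors in available space, so treat this as a sketch only. The store IDs, sizes, and weights are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Store:
    store_id: int
    region_size_mib: float      # total size of the regions the store holds
    region_weight: float = 1.0  # the value set via "pd-ctl store weight"


def region_score(store: Store) -> float:
    # Simplified model (assumption): more data raises the score, more weight lowers it.
    return store.region_size_mib / store.region_weight


def balanced_sizes(total_size_mib: float, weights: dict) -> dict:
    # Scores are equal when every store holds total * (its weight / sum of weights).
    total_weight = sum(weights.values())
    return {sid: total_size_mib * w / total_weight for sid, w in weights.items()}


# Hypothetical cluster: 4000 MiB of regions, store 4 given region weight 2.
sizes = balanced_sizes(4000.0, {1: 1.0, 2: 1.0, 3: 1.0, 4: 2.0})
stores = [Store(sid, size, 2.0 if sid == 4 else 1.0) for sid, size in sizes.items()]
print(sizes)  # {1: 800.0, 2: 800.0, 3: 800.0, 4: 1600.0}
print({s.store_id: region_score(s) for s in stores})  # every score is 800.0
```

In this model, doubling a store’s region weight makes the balanced state one where that store holds twice as much data, not merely more regions, so it shifts disk usage toward that store as well.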

| username: h5n1 | Original post link

In terms of available space, usage is already balanced. Region balancing evens out storage space (total region size), not the number of regions. Some regions may be relatively large, so you can check for huge regions on the TiKV troubleshoot page.
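To see how stores can hold the same amount of data with very different region counts, here is a Python sketch that summarizes regions per store and flags large ones. The input format (region ID, store ID, size in MiB) is hypothetical, and the 144 MiB "huge" threshold is an assumption chosen to match TiKV’s default region-max-size; check your own configuration.

```python
from collections import defaultdict

# Assumed threshold for a "huge" region; adjust to your cluster's region-max-size.
HUGE_REGION_MIB = 144.0


def store_summary(regions):
    """regions: iterable of (region_id, store_id, size_mib) tuples (hypothetical format)."""
    count = defaultdict(int)
    total = defaultdict(float)
    huge = defaultdict(list)
    for region_id, store_id, size_mib in regions:
        count[store_id] += 1
        total[store_id] += size_mib
        if size_mib > HUGE_REGION_MIB:
            huge[store_id].append(region_id)
    return {
        sid: {
            "count": count[sid],
            "total_mib": total[sid],
            "avg_mib": total[sid] / count[sid],
            "huge": huge[sid],
        }
        for sid in count
    }


# Made-up example: both stores hold 400 MiB, but store 2 has half as many regions.
summary = store_summary([
    (1, 1, 100.0), (2, 1, 100.0), (3, 1, 100.0), (4, 1, 100.0),
    (5, 2, 100.0), (6, 2, 300.0),
])
for sid, info in summary.items():
    print(sid, info)
```

A size-based balancer treats these two stores as equal, even though one has twice the region count of the other.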

| username: 裤衩儿飞上天 | Original post link

  1. Empty regions: 228K. Clean up the empty regions that can be merged away.
  2. Learners: 20.9K, so it looks like balancing is still in progress.

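Empty regions are cleaned up by PD’s merge checker, which merges a small region into an adjacent one. The Python sketch below shows only the size/key test. It assumes the PD defaults max-merge-region-size = 20 (MiB) and max-merge-region-keys = 200000, and it ignores other conditions the real checker applies (for example, the split-merge-interval wait and the state of the adjacent region). The 1 MiB "empty" cut-off and the sample regions are hypothetical.

```python
from dataclasses import dataclass

# Assumed PD defaults; confirm the live values with "pd-ctl config show".
MAX_MERGE_REGION_SIZE_MIB = 20
MAX_MERGE_REGION_KEYS = 200_000
EMPTY_REGION_MIB = 1.0  # hypothetical cut-off for calling a region "empty"


@dataclass
class Region:
    region_id: int
    size_mib: float
    keys: int


def is_merge_candidate(r: Region) -> bool:
    # Small on both size and key count: eligible to be merged into a neighbour.
    return r.size_mib <= MAX_MERGE_REGION_SIZE_MIB and r.keys <= MAX_MERGE_REGION_KEYS


def classify(regions):
    # Returns (IDs of empty regions, IDs of merge candidates).
    empty = [r.region_id for r in regions if r.size_mib <= EMPTY_REGION_MIB]
    mergeable = [r.region_id for r in regions if is_merge_candidate(r)]
    return empty, mergeable


sample = [Region(1, 0.0, 0), Region(2, 5.0, 50_000), Region(3, 96.0, 900_000)]
print(classify(sample))  # ([1], [1, 2])
```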
| username: 裤衩儿飞上天 | Original post link

I remember that in a previous version, only storage was considered when calculating the score. Later, in another version, the number of regions also started to affect the score. I can’t recall exactly which versions these were. :sleepy:

| username: devopNeverStop | Original post link

Does “huge region” on the TiKV troubleshoot page refer to a monitoring dashboard or to the official documentation?

| username: devopNeverStop | Original post link

I’ll look into it.

| username: h5n1 | Original post link

Do you frequently truncate or drop tables? Try extending the time range of the Region health panel and see whether there is a downward trend.

| username: devopNeverStop | Original post link

Some tables holding calculation data are truncated daily.

| username: h5n1 | Original post link

That generates a large number of empty regions every day. What is your GC life time?
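Why GC time matters here: a truncated table’s old data is not deleted immediately. TiDB removes the old key ranges only after the GC safe point has moved past the truncate, and only then do those regions become empty and eligible for merging. The Python sketch below estimates that moment under a simplified model: GC runs at a fixed interval (tidb_gc_run_interval) and each run’s safe point is the run time minus the GC life time (tidb_gc_life_time). Real GC also has to resolve locks and delete the ranges, so actual cleanup lands later than this estimate; the schedule in the example is hypothetical.

```python
from datetime import datetime, timedelta


def earliest_cleanup(truncate_at: datetime, first_gc_run: datetime,
                     run_interval: timedelta, life_time: timedelta) -> datetime:
    """Return the first GC run whose safe point (run time - life time) has reached the truncate."""
    if run_interval <= timedelta(0):
        raise ValueError("run_interval must be positive")
    run = first_gc_run
    while run - life_time < truncate_at:
        run += run_interval
    return run


# Hypothetical schedule: truncate at 02:00, GC every 10 minutes starting at midnight.
truncate = datetime(2023, 1, 1, 2, 0)
midnight = datetime(2023, 1, 1, 0, 0)
print(earliest_cleanup(truncate, midnight, timedelta(minutes=10), timedelta(minutes=10)))
# 2023-01-01 02:10:00
print(earliest_cleanup(truncate, midnight, timedelta(minutes=10), timedelta(hours=24)))
# 2023-01-02 02:00:00
```

In this model, a long GC life time delays each day’s truncated regions from becoming empty, so with daily truncates the empty-region count can stay high even when merging works.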

| username: devopNeverStop | Original post link

[Screenshot not available]

| username: 裤衩儿飞上天 | Original post link

Is cross-table region merging disabled?
enable-cross-table-merge

| username: devopNeverStop | Original post link

Cross-table merge is enabled.
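For context, enable-cross-table-merge decides whether PD may merge two adjacent regions that belong to different tables. The Python sketch below scans a key-ordered list of regions and returns the adjacent pairs that pass this check plus a simple size test. The 20 MiB limit mirrors the assumed max-merge-region-size default, the Region type and sample data are hypothetical, and the real merge checker applies more conditions than shown.

```python
from dataclasses import dataclass


@dataclass
class Region:
    region_id: int
    table_id: int    # table whose key range the region covers (simplified)
    size_mib: float


def mergeable_pairs(regions, enable_cross_table_merge: bool, max_size_mib: float = 20.0):
    """regions must be sorted by key range; returns (small_region, neighbour) ID pairs."""
    pairs = []
    for left, right in zip(regions, regions[1:]):
        if left.size_mib > max_size_mib:
            continue  # only small regions are merged into a neighbour
        if left.table_id != right.table_id and not enable_cross_table_merge:
            continue  # a table boundary blocks the merge when the option is off
        pairs.append((left.region_id, right.region_id))
    return pairs


regions = [Region(1, 100, 0.0), Region(2, 101, 0.0), Region(3, 101, 50.0)]
print(mergeable_pairs(regions, enable_cross_table_merge=False))  # [(2, 3)]
print(mergeable_pairs(regions, enable_cross_table_merge=True))   # [(1, 2), (2, 3)]
```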

| username: h5n1 | Original post link

Extend the time range of the Region health panel and check for a downward trend, or for a repeating pattern of rises followed by declines. If you see either, merging is working properly; the daily truncates will inevitably keep producing new empty regions. Also, if you have TiFlash, it will hinder merging, because TiFlash cannot share a region across tables.
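The reading rule above (a downward trend, or repeated rises followed by declines, means merging is working) can be expressed as a small Python check over empty-region counts sampled from the Region health panel. The function name and the sample series are hypothetical; the numbers are made up, not values from this thread.

```python
def merge_activity(counts):
    """counts: empty-region counts sampled over time, oldest first."""
    if len(counts) < 2:
        raise ValueError("need at least two samples")
    # Count the sample-to-sample drops; each drop means some empty regions disappeared.
    declines = sum(1 for before, after in zip(counts, counts[1:]) if after < before)
    return {
        "declines": declines,
        "overall_down": counts[-1] < counts[0],
        "merging_observed": declines > 0,
    }


print(merge_activity([100, 60, 110, 70, 105]))  # rises and declines
print(merge_activity([100, 120, 140]))          # only rises
```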