TiKV_approximate_region_size alert triggered; manual split times out

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: 触发TiKV_approximate_region_size告警,手动split超时

| username: TiDB_C罗

[TiDB Usage Environment] Production Environment / Testing / Poc
[TiDB Version]
[Reproduction Path] What operations were performed when the issue occurred
[Encountered Issue: Problem Phenomenon and Impact]
The TiKV_approximate_region_size alert was triggered. Monitoring screenshots (images not preserved): PD, TiKV-Details, TiKV-Trouble-Shooting.

Checking the regions shows that they are all concentrated on the same table.

Manual split:
tiup ctl:v6.5.2 pd -u http://xx.xx.xx.xx:2379 operator add split-region 417438
The PD dashboard shows the corresponding operator events: Create, Check, and Timeout.
Question 1: Why does the region not split automatically?
Question 2: Why does the manual split time out?

| username: zhanggame1 | Original post link

Is this a business table? Does it have any special characteristics?

| username: TiDB_C罗 | Original post link

There is a JSON field in this table, nothing else special.

| username: zhanggame1 | Original post link

Take a look at the APPROXIMATE_KEYS in INFORMATION_SCHEMA.TIKV_REGION_STATUS as well.

| username: TiDB_C罗 | Original post link

This is the picture.

| username: dba远航 | Original post link

Consider the table options SHARD_ROW_ID_BITS and PRE_SPLIT_REGIONS.

| username: tidb菜鸟一只 | Original post link

How large is a JSON…

| username: TiDB_C罗 | Original post link

The sizes vary. Among the rows I sampled at random, the largest is 1,455,120 and the smallest is 35,233.

| username: tidb菜鸟一只 | Original post link

Running operator add split-region 1 --policy=approximate splits the region with the approximate policy, which is a coarser split and might be faster. You can also try splitting a smaller region first to see whether it times out.
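To try the same operator on several regions, with and without the policy flag, the command line can be assembled programmatically. The following is only a sketch: the helper name split_region_argv is hypothetical, the PD address is the placeholder from the first post, and the flag is simply the one quoted above.

```python
def split_region_argv(pd_addr, region_id, policy=None, ctl_version="v6.5.2"):
    """Build the argv for a pd-ctl split-region operator (hypothetical helper).

    The command shape mirrors the one used earlier in this thread; nothing is
    executed here, the caller decides how to run it.
    """
    argv = [
        "tiup", f"ctl:{ctl_version}", "pd", "-u", pd_addr,
        "operator", "add", "split-region", str(region_id),
    ]
    if policy is not None:
        # For example policy="approximate", as suggested in this reply.
        argv.append(f"--policy={policy}")
    return argv


# Placeholder address and the region ID from the original post.
print(" ".join(split_region_argv("http://xx.xx.xx.xx:2379", 417438, policy="approximate")))
```

The list can be passed to subprocess.run on a host where tiup is installed.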

| username: TiDB_C罗 | Original post link

I first ran it with --policy=approximate and then without it, and got the same result. I’ll try a smaller region to see whether that works.

| username: TiDB_C罗 | Original post link

This is the log of the automatic split failure:
[2024/02/05 23:42:35.657 +00:00] [INFO] [size.rs:202] [“Run size checker”] [policy=Approximate] [threshold=150994944] [size=1797417437] [region_id=417290]
[2024/02/05 23:42:35.657 +00:00] [INFO] [range_properties.rs:130] [“range size is too large”] [cf=default] [ssts_size=“5916496.sst:270598017, 5931357.sst:208660678, 5930493.sst:12131525, 5931316.sst:12832681, 5932616.sst:12689300, 5932848.sst:11735428, 5933096.sst:11280137, 5933324.sst:11333861, 5933929.sst:10920084, 5931769.sst:11980596, 5934186.sst:11492591, 5930921.sst:11778132, 5934437.sst:11546315, 5934665.sst:11600039, 5934903.sst:12715860, 5935272.sst:12246028, 5935992.sst:11339664, 5936221.sst:11388615, 5936467.sst:12529411, 5936960.sst:12646849, 5937176.sst:12151931, 5937585.sst:12820453, 5947314.sst:12651302, 5933701.sst:11387585, 5955335.sst:11020170, 5940824.sst:13586353, 5947709.sst:12733553, 5947511.sst:13360339, 5956266.sst:12597085, 5948065.sst:12136293, 5946977.sst:13225030, 5948259.sst:11495077, 5948436.sst:10848200, 5956073.sst:11088435, 5946025.sst:12366587, 5939435.sst:13241698, 5949768.sst:11032904, 5950484.sst:11849833, 5945617.sst:12927550, 5952983.sst:12901527, 5950662.sst:11881912, 5950288.sst:11817754, 5951752.sst:12021550, 5951220.sst:11231816, 5946279.sst:9791400, 5930707.sst:12200900, 5953503.sst:11563928, 5951404.sst:10555635, 5931535.sst:12411800, 5951572.sst:10580610, 5942933.sst:12343690, 5932415.sst:13123279, 5951985.sst:12053629, 5950117.sst:11785675, 5941280.sst:13754851, 5935775.sst:12363466, 5956455.sst:12629164, 5938249.sst:10717349, 5937374.sst:12761734, 5955890.sst:11800136, 5956283.sst:18921968, 5944095.sst:13192809, 5938036.sst:10677278, 5954872.sst:11700680, 5942706.sst:12913089, 5938599.sst:11324710, 5947883.sst:12100329, 5953702.sst:12317809, 5949942.sst:11753596, 5953315.sst:13695257, 5940977.sst:11861950, 5939226.sst:11466790, 5955086.sst:10995195, 5953151.sst:12937491, 5947143.sst:13936398, 5942134.sst:13372931, 5952813.sst:12151753, 5946826.sst:12522653, 5941610.sst:13265483, 5935506.sst:12304747, 5945850.sst:13621713, 5945425.sst:13526142, 5945202.sst:12195758, 5943349.sst:11809811, 5943138.sst:12388090, 
5945030.sst:13432902, 5954658.sst:11672264, 5944678.sst:13337331, 5944491.sst:13288380, 5937832.sst:11195950, 5952253.sst:9239126, 5942495.sst:12252670, 5943869.sst:12519070, 5936769.sst:12588130, 5943527.sst:12474670, 5944280.sst:12610090, 5942308.sst:13426655, 5939830.sst:13359136, 5939635.sst:13300417, 5931989.sst:12547775, 5946667.sst:13138450, 5941776.sst:13319207, 5941445.sst:13211759, 5941127.sst:13696132, 5940236.sst:12886973, 5944848.sst:12747730, 5940021.sst:12833249, 5939031.sst:12562187, 5938829.sst:12508463”] [memtable=49752] [total_size=1797410892] [end=7A7480000000000001FF7B5F7282CA33BAADFFBAA48E0000000000FA] [start=7A7480000000000001FF7B5F7282CA33BAADFFB653B70000000000FA]
[2024/02/05 23:42:35.658 +00:00] [INFO] [peer.rs:5550] [“on split”] [source=“split checker”] [split_keys=“10 keys range from 7480000000000001FF7B5F7282CA33BAADFFB653B70000000000FAF9CA1257DCD3FFF0 to 7480000000000001FF7B5F7282CA33BAADFFB653B70000000000FAF9CA213E3963FFF9”] [peer_id=417293] [region_id=417290]
[2024/02/05 23:42:35.658 +00:00] [INFO] [pd.rs:1082] [“try to batch split region”] [task=batch_split] [region=“id: 417290 start_key: 7480000000000001FF7B5F7282CA33BAADFFB653B70000000000FA end_key: 7480000000000001FF7B5F7282CA33BAADFFBAA48E0000000000FA region_epoch { conf_ver: 5 version: 3894 } peers { id: 417291 store_id: 1 } peers { id: 417292 store_id: 2 } peers { id: 417293 store_id: 3 }”] [new_region_ids=“[new_region_id: 499054 new_peer_ids: 499055 new_peer_ids: 499056 new_peer_ids: 499057, new_region_id: 499058 new_peer_ids: 499059 new_peer_ids: 499060 new_peer_ids: 499061, new_region_id: 499062 new_peer_ids: 499063 new_peer_ids: 499064 new_peer_ids: 499065, new_region_id: 499066 new_peer_ids: 499067 new_peer_ids: 499068 new_peer_ids: 499069, new_region_id: 499070 new_peer_ids: 499071 new_peer_ids: 499072 new_peer_ids: 499073, new_region_id: 499074 new_peer_ids: 499075 new_peer_ids: 499076 new_peer_ids: 499077, new_region_id: 499078 new_peer_ids: 499079 new_peer_ids: 499080 new_peer_ids: 499081, new_region_id: 499082 new_peer_ids: 499083 new_peer_ids: 499084 new_peer_ids: 499085, new_region_id: 499086 new_peer_ids: 499087 new_peer_ids: 499088 new_peer_ids: 499089, new_region_id: 499090 new_peer_ids: 499091 new_peer_ids: 499092 new_peer_ids: 499093]”] [region_id=417290]
[2024/02/05 23:42:35.659 +00:00] [WARN] [split_observer.rs:38] [“skip invalid split key: key is not in region”] [index=0] [end_key=7480000000000001FF7B5F7282CA33BAADFFBAA48E0000000000FA] [start_key=7480000000000001FF7B5F7282CA33BAADFFB653B70000000000FA] [region_id=417290] [key=7480000000000001FF7B5F7282CA33BAADFFB653B70000000000FA]
[2024/02/05 23:42:35.659 +00:00] [WARN] [split_observer.rs:38] [“skip invalid split key: key is not in region”] [index=1] [end_key=7480000000000001FF7B5F7282CA33BAADFFBAA48E0000000000FA] [start_key=7480000000000001FF7B5F7282CA33BAADFFB653B70000000000FA] [region_id=417290] [key=7480000000000001FF7B5F7282CA33BAADFFB653B70000000000FA]
[2024/02/05 23:42:35.659 +00:00] [WARN] [split_observer.rs:38] [“skip invalid split key: key is not in region”] [index=2] [end_key=7480000000000001FF7B5F7282CA33BAADFFBAA48E0000000000FA] [start_key=7480000000000001FF7B5F7282CA33BAADFFB653B70000000000FA] [region_id=417290] [key=7480000000000001FF7B5F7282CA33BAADFFB653B70000000000FA]
[2024/02/05 23:42:35.659 +00:00] [WARN] [split_observer.rs:38] [“skip invalid split key: key is not in region”] [index=3] [end_key=7480000000000001FF7B5F7282CA33BAADFFBAA48E0000000000FA] [start_key=7480000000000001FF7B5F7282CA33BAADFFB653B70000000000FA] [region_id=417290] [key=7480000000000001FF7B5F7282CA33BAADFFB653B70000000000FA]
[2024/02/05 23:42:35.659 +00:00] [WARN] [split_observer.rs:38] [“skip invalid split key: key is not in region”] [index=4] [end_key=7480000000000001FF7B5F7282CA33BAADFFBAA48E0000000000FA] [start_key=7480000000000001FF7B5F7282CA33BAADFFB653B70000000000FA] [region_id=417290] [key=7480000000000001FF7B5F7282CA33BAADFFB653B70000000000FA]
[2024/02/05 23:42:35.659 +00:00] [WARN] [split_observer.rs:38] [“skip invalid split key: key is not in region”] [index=5] [end_key=7480000000000001FF7B5F7282CA33BAADFFBAA48E0000000000FA] [start_key=7480000000000001FF7B5F7282CA33BAADFFB653B70000000000FA] [region_id=417290] [key=7480000000000001FF7B5F7282CA33BAADFFB653B70000000000FA]
[2024/02/05 23:42:35.659 +00:00] [WARN] [split_observer.rs:38] [“skip invalid split key: key is not in region”] [index=6] [end_key=7480000000000001FF7B5F7282CA33BAADFFBAA48E0000000000FA] [start_key=7480000000000001FF7B5F7282CA33BAADFFB653B70000000000FA] [region_id=417290] [key=7480000000000001FF7B5F7282CA33BAADFFB653B70000000000FA]
[2024/02/05 23:42:35.659 +00:00] [WARN] [split_observer.rs:38] [“skip invalid split key: key is not in region”] [index=7] [end_key=7480000000000001FF7B5F7282CA33BAADFFBAA48E0000000000FA] [start_key=7480000000000001FF7B5F7282CA33BAADFFB653B70000000000FA] [region_id=417290] [key=7480000000000001FF7B5F7282CA33BAADFFB653B70000000000FA]
[2024/02/05 23:42:35.659 +00:00] [WARN] [split_observer.rs:38] [“skip invalid split key: key is not in region”] [index=8] [end_key=7480000000000001FF7B5F7282CA33BAADFFBAA48E0000000000FA] [start_key=7480000000000001FF7B5F7282CA33BAADFFB653B70000000000FA] [region_id=417290] [key=7480000000000001FF7B5F7282CA33BAADFFB653B70000000000FA]
[2024/02/05 23:42:35.659 +00:00] [WARN] [split_observer.rs:38] [“skip invalid split key: key is not in region”] [index=9] [end_key=7480000000000001FF7B5F7282CA33BAADFFBAA48E0000000000FA] [start_key=7480000000000001FF7B5F7282CA33BAADFFB653B70000000000FA] [region_id=417290] [key=7480000000000001FF7B5F7282CA33BAADFFB653B70000000000FA]
[2024/02/05 23:42:35.659 +00:00] [ERROR] [split_observer.rs:142] [“failed to handle split req”] [err=“"no valid key found for split."”] [region_id=417290]
[2024/02/05 23:42:35.659 +00:00] [WARN] [peer.rs:4339] [“skip proposal”] [error_code=KV:Raftstore:Coprocessor] [err=“Coprocessor(Other("[components/raftstore/src/coprocessor/split_observer.rs:147]: no valid key found for split."))”] [peer_id=417293] [region_id=417290]
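To put the numbers in the first log line in perspective, the bracketed fields can be parsed and compared. This is a rough sketch: it assumes threshold and size are byte counts, and the sample line has its quotes normalized to ASCII.

```python
import re

# Matches the bracketed key=value fields that follow the log message.
FIELD = re.compile(r"\[(\w+)=([^\]]*)\]")


def parse_fields(line: str) -> dict:
    """Return the key=value fields of one log line as strings."""
    return dict(FIELD.findall(line))


# First line of the log above (quotes normalized to ASCII).
line = (
    '[2024/02/05 23:42:35.657 +00:00] [INFO] [size.rs:202] ["Run size checker"] '
    "[policy=Approximate] [threshold=150994944] [size=1797417437] [region_id=417290]"
)

fields = parse_fields(line)
threshold = int(fields["threshold"])
size = int(fields["size"])
MIB = 1024 * 1024  # assumption: both values are in bytes

print(f"region {fields['region_id']}: size {size / MIB:.0f} MiB, "
      f"threshold {threshold / MIB:.0f} MiB, ratio {size / threshold:.1f}x")
```

Under that assumption the region is roughly 1714 MiB against a 144 MiB threshold, about 11.9 times over.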

| username: TiDB_C罗 | Original post link

I tried splitting slightly larger regions on other tables, and those splits completed quickly. For the regions on this table, however, the logs show split tasks being issued continuously, and they all fail. Manual splits also time out.

| username: tidb菜鸟一只 | Original post link

How large is region 417290? I don’t see it in your screenshot. This region might have hit a bug that breaks the automatic split process, so that none of the regions can split automatically.

| username: TiDB_C罗 | Original post link

It was 648M when I ran the split operation. After that, its size kept growing and the region did not split.

| username: 有猫万事足 | Original post link

It looks like a bug. The log above shows that every candidate split key is equal to the start key of the region.

The code then checks whether the key lies inside the region:

start_key < key && (key < end_key || end_key.is_empty())

Because key equals start_key, the strict comparison start_key < key is false, so every candidate is treated as outside the region. As a result, no valid split key is ever found.
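The effect of that check can be reproduced with the keys from the log. This is a minimal sketch, not TiKV's code: in_region_exclusive transcribes the expression quoted above, and strip_suffix is a hypothetical helper based on the assumption that the trailing 8 bytes of each candidate are dropped before the check. That assumption comes only from the log, where the candidates in "on split" are 8 bytes longer than the key printed in the warnings.

```python
def in_region_exclusive(key: bytes, start_key: bytes, end_key: bytes) -> bool:
    """Transcription of the quoted check: the start key itself is not 'inside'."""
    return start_key < key and (key < end_key or len(end_key) == 0)


def strip_suffix(key: bytes, n: int = 8) -> bytes:
    """Hypothetical helper: drop the trailing n bytes (assumed version suffix)."""
    return key[:-n]


# Region boundaries and the first and last candidate keys, copied from the log.
START = bytes.fromhex("7480000000000001FF7B5F7282CA33BAADFFB653B70000000000FA")
END = bytes.fromhex("7480000000000001FF7B5F7282CA33BAADFFBAA48E0000000000FA")
candidates = [
    bytes.fromhex("7480000000000001FF7B5F7282CA33BAADFFB653B70000000000FAF9CA1257DCD3FFF0"),
    bytes.fromhex("7480000000000001FF7B5F7282CA33BAADFFB653B70000000000FAF9CA213E3963FFF9"),
]

for raw in candidates:
    key = strip_suffix(raw)
    # Each stripped candidate equals START, so the strict "<" rejects it.
    print(key == START, in_region_exclusive(key, START, END))
```

Both candidates print True False: once the suffix is removed they collapse onto the start key, which matches the repeated "key is not in region" warnings.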

| username: tidb菜鸟一只 | Original post link

My suggestion is to manually split the other regions first to see if only this region has a problem. If so, you can try rebuilding the corresponding table.

| username: TiDB_C罗 | Original post link

None of the regions of this one table can be split.

| username: tidb菜鸟一只 | Original post link

Is it the table with the JSON field? Try creating a new table and transferring some data from the original table to see if you can reproduce the issue.

| username: TiDB_C罗 | Original post link

The table structure is as follows:
CREATE TABLE a (
  account_id bigint(20) NOT NULL DEFAULT '0' COMMENT 'User ID',
  response json DEFAULT NULL COMMENT 'Response result',
  create_time bigint(20) unsigned NOT NULL DEFAULT '0' COMMENT 'create_time',
  type tinyint(4) DEFAULT NULL COMMENT 'Type (0 real-time, 1 offline)',
  PRIMARY KEY (account_id) /*T![clustered_index] CLUSTERED */,
  KEY idx_create_time (create_time)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin COMMENT='Result table'
I will test this to see whether I can reproduce the issue.

| username: kkpeter | Original post link

This looks like a bug: the key and start_key are equal.