Inconsistent index: table handle count 144 isn't equal to value count 140

translator_bot · June 21, 2024, 11:59pm

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: inconsistent index tbl handle count 144 isn’t equal to value count 140

| username: 普罗米修斯

[TiDB Usage Environment] Production Environment
[TiDB Version] v3.0.3
[Encountered Issues: Symptoms and Impact]

Yesterday a TiKV server crashed unexpectedly. After I restarted the server and brought the TiDB service back up, two TiKV instances were reported as down and could not be started. The error log showed [Err] 9005 - Region is unavailable. After I ran the unsafe-recover command on all TiKV instances, they all started normally. A quick check showed that the large tables could be opened normally, so I did no further inspection.
This morning the log showed an error during an insert. Running admin check table xx returned the same [Err] 9005 - Region is unavailable error, and the TiDB log reported an index inconsistency.

The TiDB monitoring still shows many down regions, and the number is not decreasing noticeably. What should I do now?


translator_bot · June 21, 2024, 11:59pm

| username: 像风一样的男子 | Original post link

tikv-ctl has a command that checks for bad SST files. Try running it.

translator_bot · June 21, 2024, 11:59pm

| username: 普罗米修斯 | Original post link

I checked, and it reported that everything is healthy.

translator_bot · June 21, 2024, 11:59pm

| username: 像风一样的男子 | Original post link

Use pd-ctl region check down-peer to see which regions have down peers.
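
A minimal Python sketch of how the output of that check might be summarized per store. It assumes the command prints JSON with a `regions` list whose entries carry a `down_peers` array of `{"peer": {"id", "store_id"}, "down_seconds"}` objects. That shape and the sample data are assumptions, so verify them against the output of your own pd-ctl version.

```python
import json
from collections import Counter


def down_peers_per_store(pd_output):
    """Count down peers per store ID in `region check down-peer` JSON output."""
    data = json.loads(pd_output)
    counts = Counter()
    # "regions" and "down_peers" may be missing or null when nothing is down.
    for region in data.get("regions") or []:
        for entry in region.get("down_peers") or []:
            counts[entry["peer"]["store_id"]] += 1
    return counts


# Hypothetical sample output, not taken from the cluster in this thread.
sample = """
{"count": 2, "regions": [
  {"id": 101,
   "peers": [{"id": 1, "store_id": 1}, {"id": 2, "store_id": 4}, {"id": 3, "store_id": 5}],
   "down_peers": [{"peer": {"id": 2, "store_id": 4}, "down_seconds": 300}]},
  {"id": 102,
   "peers": [{"id": 4, "store_id": 1}, {"id": 5, "store_id": 4}, {"id": 6, "store_id": 5}],
   "down_peers": [{"peer": {"id": 5, "store_id": 4}, "down_seconds": 300},
                  {"peer": {"id": 6, "store_id": 5}, "down_seconds": 280}]}
]}
"""

for store_id, n in sorted(down_peers_per_store(sample).items()):
    print(f"store {store_id}: {n} down peer(s)")
```

If the down peers cluster on the stores that crashed, that points at those nodes rather than at the cluster as a whole.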

translator_bot · June 21, 2024, 11:59pm

| username: 像风一样的男子 | Original post link

Check whether the replicas of the affected regions are all on those two nodes. If both of those TiKV nodes have problems, I recommend scaling in (removing) the problematic nodes.
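
The reason this check matters is that a Raft group needs a majority of its replicas to stay available. Below is a small Python sketch of that check, assuming you already have each region's peer list (for example from pd-ctl output) and that every peer is a voter. The region IDs, store IDs, and the `regions_without_quorum` helper are hypothetical.

```python
def regions_without_quorum(regions, failed_stores):
    """Return IDs of regions whose healthy peers cannot form a Raft majority."""
    lost = []
    for region in regions:
        stores = [peer["store_id"] for peer in region["peers"]]
        healthy = sum(1 for s in stores if s not in failed_stores)
        majority = len(stores) // 2 + 1
        if healthy < majority:
            lost.append(region["id"])
    return lost


# Hypothetical regions with three replicas each; stores 4 and 5 are the failed nodes.
regions = [
    {"id": 101, "peers": [{"store_id": 1}, {"store_id": 2}, {"store_id": 4}]},
    {"id": 102, "peers": [{"store_id": 1}, {"store_id": 4}, {"store_id": 5}]},
    {"id": 103, "peers": [{"store_id": 2}, {"store_id": 3}, {"store_id": 1}]},
]
print(regions_without_quorum(regions, failed_stores={4, 5}))
```

With three replicas, a region that has two of them on the failed stores cannot elect a leader, which is one way to end up with "Region is unavailable".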

translator_bot · June 21, 2024, 11:59pm

| username: tidb狂热爱好者 | Original post link

Your problem is that the data is already corrupted: both the indexes and the tables. Export and back up whatever you can, then rebuild.

translator_bot · June 21, 2024, 11:59pm

| username: 普罗米修斯 | Original post link

I noticed that there is a down-peer on each store.

translator_bot · June 21, 2024, 11:59pm

| username: 普罗米修斯 | Original post link

These down peers don't seem to be recovering on their own.

translator_bot · June 21, 2024, 11:59pm

| username: 像风一样的男子 | Original post link

I once had a small number of damaged SST files and repaired them manually. In your case, though, the damage is too extensive, and the disk most likely has problems. If you can, back up the data first and prepare to rebuild.

translator_bot · June 21, 2024, 11:59pm

| username: tidb狂热爱好者 | Original post link

The issue won’t be resolved. The region you lost here has 90k data, all of which is corrupted.

translator_bot · June 21, 2024, 11:59pm

| username: 普罗米修斯 | Original post link

My earlier reply was wrong. I had used tikv-ctl to check for bad regions, not bad SSTs. I just checked and found that this version of the toolkit does not support the bad-ssts check.

I searched the TiDB logs and did not find any “region is unavailable” errors; all the errors reported are about index inconsistencies.

The TiKV logs report that region 15463620 has no leader, but when I checked this region I found nothing wrong with it.

translator_bot · June 21, 2024, 11:59pm

| username: 普罗米修斯 | Original post link

I see that miss-peer_region_count is decreasing, but very slowly.

translator_bot · June 21, 2024, 11:59pm

| username: tidb菜鸟一只 | Original post link

Find the corresponding table and repair the index:
ADMIN CLEANUP INDEX tbl idx; -- when the index has more entries than the table has rows
ADMIN RECOVER INDEX tbl idx; -- when the index has fewer entries than the table has rows
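
As a rough illustration of that rule of thumb, here is a small Python sketch that picks a statement from two counts. It assumes you have already counted the table's rows and the index's entries separately. The helper name, the table and index names, and the numbers are all hypothetical.

```python
def index_repair_statement(table, index, table_rows, index_entries):
    """Pick the ADMIN statement for a count mismatch, or None if counts agree."""
    if index_entries > table_rows:
        # The index holds entries with no matching row: remove the extras.
        return f"ADMIN CLEANUP INDEX {table} {index};"
    if index_entries < table_rows:
        # Some rows lack index entries: add the missing entries.
        return f"ADMIN RECOVER INDEX {table} {index};"
    return None


# Hypothetical table and index names with made-up counts.
print(index_repair_statement("tbl", "idx", table_rows=100, index_entries=104))
print(index_repair_statement("tbl", "idx", table_rows=100, index_entries=97))
```

Either statement can only succeed once the regions backing the table and the index are available again.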

translator_bot · June 21, 2024, 11:59pm

| username: RenlySir | Original post link

For the index inconsistency error, see "Data-Index Consistency Error" on the PingCAP archived documentation site.

translator_bot · June 21, 2024, 11:59pm

| username: RenlySir | Original post link

The third step there checks data-index consistency. That step is not needed for RawKV.

translator_bot · June 21, 2024, 11:59pm

| username: 普罗米修斯 | Original post link

When executing ADMIN CHECK INDEX, it reported [Err] 9005 - Region is unavailable.

translator_bot · June 21, 2024, 11:59pm

| username: 普罗米修斯 | Original post link

Here are the current TiKV logs.

translator_bot · June 21, 2024, 11:59pm

| username: 普罗米修斯 | Original post link

After switching to a newer version of the toolkit, the bad-sst check reports an error different from the one you described.

translator_bot · June 21, 2024, 11:59pm

| username: 普罗米修斯 | Original post link

The bad-sst check reports the following error:

corruption info:
/TiDBDisk1/deploy/data/db/5623506.sst: IO error: No such file or directory While opening a file for random read: /TiDBDisk1/deploy/data/db/5623506.sst: No such file or directory

sst meta:
sst 5623506 is not found in manifest: Error in processing file /TiDBDisk1/deploy/data/db/MANIFEST-5610025 NotFound: sst 5623506 is not in the live files set of the manifest

translator_bot · June 21, 2024, 11:59pm

| username: h5n1 | Original post link

Check the region with pd-ctl region 15463620. The bad-sst check error is most likely because those SST files no longer exist.
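
One way to confirm that the files are gone is to pull the .sst paths out of the report text and test whether they exist on the TiKV host. The Python sketch below does that. The regular expression assumes paths contain no whitespace, and the helper names are hypothetical. The sample line is copied from the report quoted in this thread.

```python
import os
import re

# Matches absolute paths ending in .sst; assumes paths contain no whitespace.
SST_PATH = re.compile(r"/\S+?\.sst\b")


def sst_paths_in_report(report):
    """Return the distinct .sst paths mentioned in a bad-sst report, sorted."""
    return sorted(set(SST_PATH.findall(report)))


def missing_sst_files(report):
    """Return the reported .sst paths that do not exist on this machine."""
    return [p for p in sst_paths_in_report(report) if not os.path.exists(p)]


report = (
    "/TiDBDisk1/deploy/data/db/5623506.sst: IO error: No such file or directory "
    "While opening a file for random read: "
    "/TiDBDisk1/deploy/data/db/5623506.sst: No such file or directory"
)
for path in missing_sst_files(report):
    print("missing:", path)
```

The result is only meaningful when the script runs on the TiKV node that produced the report.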