After a lossy TiKV recovery, DDL on some tables reports that the table does not exist, while DML works normally

This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: tikv 有损恢复之后个别表DDL显示表不存在,DML正常

| username: porpoiselxj

[TiDB Usage Environment] Testing
[TiDB Version] v7.1.3
[Reproduction Path]
The cluster has 10 TiKV nodes in total, distributed across 5 machines, with each machine hosting 2 TiKV nodes (same label). Yesterday, within a span of 3 hours, the disks of 3 TiKV nodes became unrecognizable, so those instances went offline and could not be recovered.

[Encountered Problem: Phenomenon and Impact]
After the lossy recovery, I ran an admin check on the problematic table. The table's data can be queried and updated, so DML operations work normally, and the table is also listed in information_schema.tables.

However, every DDL operation (truncate, drop, rename, etc.) fails with an error saying the table does not exist:
Error Code: 1146. Table '(Schema ID ***).(Table ID ***)' doesn't exist
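
For anyone hitting the same error, a minimal Python sketch of how the IDs in that message could be pulled out and mapped back to a table name. The sample IDs are hypothetical (the post masks the real ones as ***), the helper names are made up for illustration, and the query assumes the TiDB-specific `tidb_table_id` column of information_schema.tables is available on your version.

```python
import re

# Matches the ID-based name that appears in the 1146 error text.
_ID_NAME = re.compile(r"\(Schema ID (\d+)\)\.\(Table ID (\d+)\)")


def parse_ids(error_text):
    """Return (schema_id, table_id) as ints, or None if no numeric IDs are present."""
    match = _ID_NAME.search(error_text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def lookup_sql(table_id):
    """Build a query that maps a table ID to the name listed in information_schema."""
    return (
        "SELECT table_schema, table_name, tidb_table_id "
        "FROM information_schema.tables "
        f"WHERE tidb_table_id = {int(table_id)}"
    )


if __name__ == "__main__":
    # Hypothetical IDs, used only to show the shape of the message.
    sample = "Error Code: 1146. Table '(Schema ID 2).(Table ID 101)' doesn't exist"
    ids = parse_ids(sample)
    if ids is not None:
        print(lookup_sql(ids[1]))
```

If the query returns a row for the table ID while DDL still fails, that would be consistent with the table metadata being only partly readable, but the sketch itself does not prove that.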

Please help.

| username: tidb狂热爱好者

The table must be corrupted, right?

| username: porpoiselxj

It was a lossy recovery, so quite a few Regions are damaged, but since this is a test environment, that doesn't matter much. The key issue now is that this table cannot be cleaned up.

| username: tidb狂热爱好者

Export the data to CSV and import it into a new table.

| username: porpoiselxj

The problem is not with the data; I have cleared all the data, and that’s fine. The issue now is that I can’t execute DDL commands on this table.

| username: tidb狂热爱好者

Check whether DDL operations work on a newly created table. If they do, just move the data into it.

| username: zhaokede

During the recovery process, the table's metadata may have been lost or left inconsistent (for example, because of an incomplete backup or errors during recovery).

| username: porpoiselxj

The other tables have no problem. The key point is that I need to use this table name.

| username: porpoiselxj

There should be some inconsistency in the metadata. Now I want to know how to manually fix it.

| username: FutureDB

It seems that we can only report the issue to the vendor and ask them for a solution.

| username: WinterLiu

The table’s metadata is probably corrupted, but I don’t know how to fix it…

| username: TiDBer_RjzUpGDL

The metadata is corrupted…

| username: 鱼跃龙门

The metadata is corrupted. Could you delete the metadata directly and then recreate the table?

| username: WalterWj

Try to perform a logical backup and then rebuild the cluster.

| username: 小于同学

This can’t be recovered, right?

| username: 友利奈绪

This issue can’t be manually adjusted, right? Ask the vendor experts.

| username: zhaokede

First, secure the data by logically backing up all of it. Whether you rebuild the cluster or repair the metadata later, the data will be protected.
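
A logical backup of this kind can be taken with Dumpling. The Python sketch below only assembles the command line. The host, port, user, and output directory are placeholder values that do not come from this thread, and `dumpling_command` is a hypothetical helper name.

```python
import shlex


def dumpling_command(host, port, user, out_dir, threads=4, filetype="sql"):
    """Assemble a Dumpling invocation (run through TiUP) as an argument list."""
    return [
        "tiup", "dumpling",
        "-h", host,              # address of a TiDB server
        "-P", str(port),         # TiDB SQL port
        "-u", user,              # database user
        "-o", out_dir,           # directory that receives the exported files
        "--filetype", filetype,  # "sql" or "csv"
        "-t", str(threads),      # number of export threads
    ]


if __name__ == "__main__":
    # Placeholder connection values; replace them with your own.
    cmd = dumpling_command("127.0.0.1", 4000, "root", "/tmp/tidb-export")
    print(shlex.join(cmd))
```

The password is left out on purpose. Since some Regions were lost, the export may fail on damaged tables, so it is worth checking the Dumpling log for each table before relying on the dump.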

| username: 霸王龙的日常

Is it possible that the physical file corresponding to the table is damaged? Back up the data first. If you want to solve the problem quickly, you can create a new table and restore the data into it so that business operations are not affected.

| username: YuchongXU

Restored again.

| username: tidb菜鸟一只

You should logically back up all the tables, then set up a new cluster and import the data. For a cluster that went through a lossy recovery, this is generally the standard procedure. You wouldn't keep using the recovered cluster directly, right? Isn't it enough to retrieve the data?