TiFlash Table Query Reports Error 9012: TiFlash Server Timeout

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: tiflash 表查询提示9012 tiflash server timeout

| username: 济南小老虎

[TiDB Usage Environment] PoC
[TiDB Version] 6.5.3
[Reproduction Path] In a single-replica environment, I scaled in a TiKV node and then used the region re-creation method to perform a lossy data recovery. Afterwards, TiFlash started having problems.
My handling process is mainly as follows:

[Encountered Issue: Problem Phenomenon and Impact] Queries against some TiFlash tables return no results and fail with a timeout, while other tables can be queried normally.
[Resource Configuration] Kunpeng 920 48c × 2, 512 GB memory, four servers, 16 TiKV nodes, 3 TiFlash nodes
[Attachments: Screenshots/Logs/Monitoring]
[root@clickhouse1 log]# pd-ctl store | grep '"id":'
"id": 1,
"id": 226277,
"id": 258382,
"id": 258385,
"id": 258496,
"id": 10973513,
"id": 2,
"id": 226275,
"id": 226276,
"id": 226278,
"id": 258383,
"id": 258498,
"id": 258899,
"id": 258495,
"id": 258898,
"id": 91,
"id": 92,
"id": 417651984,
[root@clickhouse1 log]# tail -f tiflash_tikv.log
[2023/09/18 11:26:20.396 +08:00] [ERROR] [peer.rs:5243] ["failed to send extra message"] [err_code=KV:Raftstore:Transport] [err=Transport(Disconnected)] [target="id: 127411950 store_id: 258384"] [peer_id=376178658] [region_id=127411947] [type=MsgRegionWakeUp]
[2023/09/18 11:26:20.430 +08:00] [ERROR] [peer.rs:5243] ["failed to send extra message"] [err_code=KV:Raftstore:Transport] [err=Transport(Disconnected)] [target="id: 107688236 store_id: 258384"] [peer_id=168415546] [region_id=107688233] [type=MsgRegionWakeUp]
[2023/09/18 11:26:20.430 +08:00] [ERROR] [peer.rs:5243] ["failed to send extra message"] [err_code=KV:Raftstore:Transport] [err=Transport(Disconnected)] [target="id: 127406273 store_id: 258384"] [peer_id=171540724] [region_id=127406270] [type=MsgRegionWakeUp]
[2023/09/18 11:26:21.348 +08:00] [ERROR] [peer.rs:5243] ["failed to send extra message"] [err_code=KV:Raftstore:Transport] [err=Transport(Disconnected)] [target="id: 107690395 store_id: 258384"] [peer_id=376130630] [region_id=107690392] [type=MsgRegionWakeUp]
[2023/09/18 11:26:21.358 +08:00] [ERROR] [peer.rs:5243] ["failed to send extra message"] [err_code=KV:Raftstore:Transport] [err=Transport(Disconnected)] [target="id: 127505822 store_id: 258384"] [peer_id=171705796] [region_id=127505684] [type=MsgRegionWakeUp]
[2023/09/18 11:26:21.396 +08:00] [ERROR] [peer.rs:5243] ["failed to send extra message"] [err_code=KV:Raftstore:Transport] [err=Transport(Disconnected)] [target="id: 127447839 store_id: 258384"] [peer_id=145503922] [region_id=127446955] [type=MsgRegionWakeUp]
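
One detail in the output above: every failing message targets store_id 258384, and that ID does not appear in the store list returned by pd-ctl. That would be consistent with 258384 being the TiKV node that was scaled in, although the post does not confirm it. The following is a minimal shell sketch of that cross-check, assuming the log lines keep the format shown. The function name missing_stores is hypothetical, and the sample log lines are shortened copies of the ones above.

```shell
#!/usr/bin/env bash
# Sketch: list store_ids that appear as log targets but are unknown to PD.
# The IDs below are copied from the `pd-ctl store` output in the post.
known_stores="1 226277 258382 258385 258496 10973513 2 226275 226276 226278 258383 258498 258899 258495 258898 91 92 417651984"

# Two sample lines from tiflash_tikv.log, shortened to the relevant fields.
sample_log='[ERROR] [peer.rs:5243] ["failed to send extra message"] [target="id: 127411950 store_id: 258384"] [region_id=127411947]
[ERROR] [peer.rs:5243] ["failed to send extra message"] [target="id: 107688236 store_id: 258384"] [region_id=107688233]'

# Read log text on stdin and print each distinct store_id
# that is not in the list of stores known to PD.
missing_stores() {
  grep -o 'store_id: [0-9]*' |
    awk '{print $2}' |
    sort -u |
    while read -r id; do
      case " $known_stores " in
        *" $id "*) ;;        # store is still registered in PD
        *) echo "$id" ;;     # store is not registered in PD
      esac
    done
}

printf '%s\n' "$sample_log" | missing_stores   # prints 258384
```

Against the real file, the same function could be fed with `missing_stores < tiflash_tikv.log`.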

Some TiFlash tables are fine, but others are not. For example, the table with the 2023 suffix reports an error:


The error message is:

However, the equivalent table with the 2022 suffix has no issues and returns results quickly.

| username: tidb菜鸟一只

Use tiup cluster display tidb-xxx to check whether all TiFlash nodes are online.

| username: 济南小老虎

They are all online. Some tables can be queried.

| username: 裤衩儿飞上天

You have definitely lost the data of one node… Resynchronize the affected tables to TiFlash.

| username: 数据小黑

Could you set the TiFlash replica count to 0 and then set it back to the original number of replicas?

| username: 济南小老虎

No, it's still the same error. I plan to scale in all the TiFlash nodes and try again.

| username: 济南小老虎

One node keeps reporting errors and cannot synchronize. Nothing has changed from 10 AM until now.

| username: 像风一样的男子

Is this a test environment or a production environment? If it is a test environment, just wipe everything, reinstall, and restore from a backup.

| username: tidb菜鸟一只

Then force the query to read from TiKV and check the 2022 data, to see whether the source data in TiKV is already abnormal, which would explain why TiFlash cannot query it…
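
One way to force a TiKV-only read is the tidb_isolation_read_engines session variable. The sketch below only prints the SQL so it can be reviewed before running. The helper name tikv_check_sql and the table name test.t_2022 are placeholders, not names from this thread.

```shell
#!/usr/bin/env bash
# Sketch: print SQL that reads one table from TiKV only, then restores
# the default engine list. Nothing is executed against the cluster here.
tikv_check_sql() {
  table="$1"   # placeholder such as test.t_2022; substitute the real table
  cat <<EOF
SET SESSION tidb_isolation_read_engines = 'tikv';
SELECT COUNT(*) FROM ${table};
SET SESSION tidb_isolation_read_engines = 'tikv,tiflash,tidb';
EOF
}

# Dry run: show the statements for a hypothetical table.
tikv_check_sql test.t_2022
```

To run the statements, the output could be piped into a MySQL client connected to TiDB, for example `tikv_check_sql test.t_2022 | mysql -h <tidb-host> -P 4000 -u root`, where the host, port, and user are assumptions about the deployment.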

| username: 济南小老虎

Querying through TiKV works.

| username: tidb菜鸟一只

Then, for the "2022" table, run ALTER TABLE ... SET TIFLASH REPLICA 0 directly, then ALTER TABLE ... SET TIFLASH REPLICA 1, and query it again through TiFlash…
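
A rough sketch of that reset, again printing the SQL rather than executing it. The helper name tiflash_resync_sql and the database and table names are placeholders. The final query reads information_schema.tiflash_replica, whose AVAILABLE and PROGRESS columns show whether the new replica has finished syncing.

```shell
#!/usr/bin/env bash
# Sketch: print SQL that drops and re-creates the TiFlash replica of one
# table, then checks replication status. Nothing is executed here.
tiflash_resync_sql() {
  db="$1"              # placeholder schema name
  table="$2"           # placeholder table name
  replicas="${3:-1}"   # replica count to restore, defaults to 1
  cat <<EOF
ALTER TABLE ${db}.${table} SET TIFLASH REPLICA 0;
ALTER TABLE ${db}.${table} SET TIFLASH REPLICA ${replicas};
SELECT TABLE_SCHEMA, TABLE_NAME, REPLICA_COUNT, AVAILABLE, PROGRESS
FROM information_schema.tiflash_replica
WHERE TABLE_SCHEMA = '${db}' AND TABLE_NAME = '${table}';
EOF
}

# Dry run for a hypothetical table test.t_2022 with one replica.
tiflash_resync_sql test t_2022 1
```

The status query can be repeated until AVAILABLE is 1 before querying the table through TiFlash again.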