Why are ordinary aggregate queries on TiFlash so slow that they never return results?

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: 使用tiflash普通聚合查询性能很慢,一直查不出来是什么情况?

| username: TiDBer_ZHcgATCp

[TiDB Usage Environment] Production Environment / Testing / PoC
[TiDB Version] V7.1.1
[Reproduction Path] Ordinary sum(), count() aggregate queries always fail to retrieve data
[Encountered Problem: Phenomenon and Impact] Ordinary sum(), count() aggregate queries always fail to retrieve data. Could it be affected by TiKV?
[Resource Configuration] 12 TiFlash nodes, memory usage is only around 20%
[Attachments: Screenshots / Logs / Monitoring]
[2023/12/12 17:52:27.580 +09:00] [WARN] [CoprocessorHandler.cpp:152] ["RegionException: region 834824106, message: Region error NOT_FOUND: (while creating InputStreams from storage db_2723.t_2816133, table_id: 2816133)"] [source=CoprocessorHandler] [thread_id=1401]
[2023/12/12 17:55:14.300 +09:00] [WARN] [MPPTaskManager.cpp:155] ["Begin to abort query: <query_ts:1702371013846593685, local_query_id:51, server_id:2107318, start_ts:446266347052335112>, abort type: ONCANCELLATION, reason: Receive cancel request from TiDB"] [thread_id=1880]
[2023/12/12 17:55:14.300 +09:00] [WARN] [MPPTaskManager.cpp:165] ["<query_ts:1702371013846593685, local_query_id:51, server_id:2107318, start_ts:446266347052335112> does not found in task manager, skip abort"] [thread_id=1880]
[2023/12/12 17:55:53.685 +09:00] [WARN] [MPPTaskManager.cpp:155] ["Begin to abort query: <query_ts:1702371052955116238, local_query_id:2787, server_id:1097863, start_ts:446266357301903373>, abort type: ONCANCELLATION, reason: Receive cancel request from TiDB"] [thread_id=1872]
[2023/12/12 17:55:53.685 +09:00] [WARN] [MPPTaskManager.cpp:165] ["<query_ts:1702371052955116238, local_query_id:2787, server_id:1097863, start_ts:446266357301903373> does not found in task manager, skip abort"] [thread_id=1872]
[2023/12/12 18:22:08.710 +09:00] [WARN] [] ["region {843088317,28757,517} find error: peer is not leader for region 843088317, leader may Some(id: 846903417 store_id: 7)"] [source=pingcap.tikv] [thread_id=1337]
[2023/12/12 18:22:08.714 +09:00] [WARN] [] ["region {380717924,16445,496} find error: region 380717924 is missing"] [source=pingcap.tikv] [thread_id=1337]
[2023/12/12 18:27:32.047 +09:00] [WARN] [] ["region {843088317,28757,517} find error: region 843088317 is missing"] [source=pingcap.tikv] [thread_id=1335]
[2023/12/12 18:33:41.992 +09:00] [WARN] [] ["region {843088317,28763,517} find error: peer is not leader for region 843088317, leader may Some(id: 846513101 store_id: 4)"] [source=pingcap.tikv] [thread_id=13]
[2023/12/12 18:47:56.956 +09:00] [WARN] [] ["region {834778343,7583,4376} find error: region 834778343 is missing"] [source=pingcap.tikv] [thread_id=1338]
[2023/12/12 18:48:20.521 +09:00] [WARN] [MPPTaskManager.cpp:155] ["Begin to abort query: <query_ts:1702374200389790720, local_query_id:52, server_id:2107318, start_ts:446267182374191109>, abort type: ONCANCELLATION, reason: Receive cancel request from TiDB"] [thread_id=1869]
[2023/12/12 18:48:20.522 +09:00] [WARN] [MPPTaskManager.cpp:165] ["<query_ts:1702374200389790720, local_query_id:52, server_id:2107318, start_ts:446267182374191109> does not found in task manager, skip abort"] [thread_id=1869]
[2023/12/12 18:49:57.834 +09:00] [WARN] [] ["region {843088317,28763,517} find error: peer is not leader for region 843088317, leader may Some(id: 848942155 store_id: 95948980)"] [source=pingcap.tikv] [thread_id=11]
[2023/12/12 19:04:42.311 +09:00] [WARN] [] ["region {843088317,28763,517} find error: region 843088317 is missing"] [source=pingcap.tikv] [thread_id=1335]

| username: TiDBer_ZHcgATCp | Original post link

Are there any experts who know what factors might affect TiFlash?

| username: 随缘天空 | Original post link

It seems that the region your query is trying to retrieve cannot be found. Check if the storage files on your server are damaged or missing. Also, verify whether your table still exists or if there are any errors in your query.

| username: Kongdom | Original post link

Please provide the result of EXPLAIN ANALYZE to confirm whether the aggregate query is using TiFlash.
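
For anyone following along: in the plan output, the task column shows where each operator ran. Labels such as mpp[tiflash], batchCop[tiflash], or cop[tiflash] mean TiFlash, while cop[tikv] means the read went to TiKV. Below is a minimal Python sketch of that check. It assumes the task column has already been pulled out of the plan as a list of strings; the helper name and the sample values are made up for illustration and are not from this cluster.

```python
import re

# Matches the storage engine inside a task label such as "mpp[tiflash]" or "cop[tikv]".
_ENGINE = re.compile(r"\[(tiflash|tikv)\]")


def engines_used(tasks):
    """Return the set of storage engines named in a list of plan task labels."""
    found = set()
    for task in tasks:
        match = _ENGINE.search(task)
        if match:
            found.add(match.group(1))
    return found


# Hypothetical task column copied from an EXPLAIN ANALYZE result.
sample_tasks = ["root", "mpp[tiflash]", "mpp[tiflash]", "mpp[tiflash]"]
print(engines_used(sample_tasks))
```

If the result contains only tikv, the aggregate is not being pushed to TiFlash at all, and the slowness should be investigated on the TiKV side instead.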

| username: dba远航 | Original post link

Region 834778343 is missing; it seems the query failed because of a data anomaly.

| username: andone | Original post link

Find the actual SQL statement and look at its EXPLAIN output.

| username: tidb菜鸟一只 | Original post link

I suggest setting the TiFlash replica of the corresponding table to 0 and then setting it back to 1 to see if it helps.
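
A rough sketch of what that reset could look like, written as a small Python helper that only builds the SQL text. The schema and table names (`test_db`, `orders`) are placeholders, the helper name is invented, and it assumes the names are trusted identifiers (no escaping is done). Note that setting the replica count to 0 drops the TiFlash copy, so the table cannot be read from TiFlash until the new replica is reported as available.

```python
def replica_reset_statements(schema, table, replicas=1):
    """Build the SQL to drop and re-create the TiFlash replica of one table."""
    name = f"`{schema}`.`{table}`"
    return [
        # Step 1: remove the existing TiFlash replica.
        f"ALTER TABLE {name} SET TIFLASH REPLICA 0;",
        # Step 2: request the replica again so it is rebuilt from TiKV.
        f"ALTER TABLE {name} SET TIFLASH REPLICA {int(replicas)};",
        # Step 3: AVAILABLE and PROGRESS show whether the new replica has caught up.
        "SELECT TABLE_SCHEMA, TABLE_NAME, REPLICA_COUNT, AVAILABLE, PROGRESS "
        "FROM information_schema.tiflash_replica "
        f"WHERE TABLE_SCHEMA = '{schema}' AND TABLE_NAME = '{table}';",
    ]


# Placeholder names; substitute the real schema and table.
for stmt in replica_reset_statements("test_db", "orders"):
    print(stmt)
```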

| username: TiDBer_ZHcgATCp | Original post link

Because I took down a few TiKV nodes the day before yesterday, I couldn’t retrieve the regions. After restarting TiFlash, it was resolved.

| username: TiDBer_ZHcgATCp | Original post link

Because I took down a few TiKV nodes the day before yesterday, the regions couldn’t be retrieved. Restarting TiFlash resolved it.

| username: TiDBer_ZHcgATCp | Original post link

Because a few TiKV nodes were taken down the day before yesterday, regions couldn’t be retrieved. It’s likely that the regions being queried were on those TiKV nodes that were taken down. After restarting TiFlash, everything was fine.

| username: TiDBer_ZHcgATCp | Original post link

I tried that, but it didn’t work; the problem was only resolved after restarting TiFlash. Does restarting TiFlash resynchronize the data from TiKV?

| username: 随缘天空 | Original post link

Well, after the restart, the data might have resynchronized to TiFlash.

| username: h5n1 | Original post link

Use pd-ctl region to check the status of those regions.
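
As a sketch of how to get from the warnings to those checks, the Python below pulls the region IDs out of log messages like the ones posted above and prints one `pd-ctl region <id>` invocation per region. The PD address is a placeholder, the helper names are invented, and running pd-ctl through `tiup ctl` is an assumption about how the cluster is managed.

```python
import re

# Region IDs appear as "region 834824106" or "region {843088317,28757,517}".
_REGION = re.compile(r"region \{?(\d+)")


def region_ids(log_lines):
    """Return the distinct region IDs mentioned in the log lines, in first-seen order."""
    seen = []
    for line in log_lines:
        for rid in _REGION.findall(line):
            if rid not in seen:
                seen.append(rid)
    return seen


def pd_ctl_commands(ids, pd_addr="http://127.0.0.1:2379", version="v7.1.1"):
    """Build one pd-ctl `region <id>` command per region (pd_addr is a placeholder)."""
    return [f"tiup ctl:{version} pd -u {pd_addr} region {rid}" for rid in ids]


# Messages shortened from the warnings in the original post.
sample = [
    "RegionException: region 834824106, message: Region error NOT_FOUND",
    "region {843088317,28757,517} find error: peer is not leader for region 843088317",
    "region {380717924,16445,496} find error: region 380717924 is missing",
]
for cmd in pd_ctl_commands(region_ids(sample)):
    print(cmd)
```

The output of each command shows the region's peers and leader, which helps tell whether the region still exists (it may have been merged or split) and which stores hold it.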

| username: 有猫万事足 | Original post link

Is there an execution plan?

"Receive cancel request from TiDB"

It looks like TiDB actively canceled the query after it failed to return results.
