TiFlash queries are sometimes fast and sometimes slow, and the execution plan shows a long rpc_time. How can this be optimized and troubleshot?

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: tiflash查询结果时快时慢,看执行计划是rpc_time 过长, 如何优化排查

| username: foxchan

[TiDB Usage Environment] Production Environment
[TiDB Version]
[Reproduction Path] What operations were performed when the issue occurred
[Encountered Issue: Problem Phenomenon and Impact]
The same SQL statement running on TiFlash sometimes shows a long rpc_time and sometimes a short one. How can I optimize and troubleshoot this?
When it is normal, it completes in a dozen or so seconds, but when it is abnormal, it takes at least a few minutes.

| username: Billmay表妹 | Original post link

Refer to these~

| username: heming | Original post link

TiDB version 6.1.2
Additionally, I’m not sure if this alert message is related.

| username: heming | Original post link

Sometimes a large number of SQL statements suddenly become slow, even very simple ones. When I checked TiDB, TiKV, and TiFlash at 13:50, the CPU usage of each component showed no significant increase.

| username: weixiaobing | Original post link

I feel that there’s no need to use TiFlash for this; creating appropriate indexes should be sufficient.

| username: heming | Original post link

SQL commands are generally very fast, but occasionally they can get stuck. The key issue is not whether TiFlash is being used.

| username: tidb菜鸟一只 | Original post link

Are the execution plans the same when the query is normal and when it is slow?

| username: foxchan | Original post link

They are the same; all of them use TiFlash.

| username: Billmay表妹 | Original post link

How do you determine if your SQL will use TiFlash instead of the TiKV index?

| username: tidb菜鸟一只 | Original post link

Check whether the gRPC monitoring panels under tikv-details show any problems at the time your SQL is slow…

| username: foxchan | Original post link

There is no problem at 13:50.


Also, this is TiKV monitoring, whereas I have been talking about TiFlash all along.

| username: dba-kit | Original post link

Is the slow SQL a simple single-table query? If it is a simple query that slows down only during peak periods, you can try setting the tidb_allow_batch_cop variable to 2 (I haven’t actually tested this, so you will need to verify it yourself…).

If everything uniformly becomes very slow, you can check the tidb_max_tiflash_threads parameter to see what the value is.

I guess it might be due to queuing at a certain stage.
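
As a rough sketch of the two checks suggested above, the helper below only builds the SQL text to paste into a client; it does not connect to a cluster. The function name `tiflash_tuning_statements` is hypothetical, and whether a value of 2 actually helps is untested here, as noted above.

```python
# Hypothetical helper: it only returns SQL strings, nothing is executed.
def tiflash_tuning_statements(batch_cop=2, scope="SESSION"):
    """Return SQL that inspects both variables, then sets tidb_allow_batch_cop."""
    if scope not in ("SESSION", "GLOBAL"):
        raise ValueError("scope must be SESSION or GLOBAL")
    if batch_cop not in (0, 1, 2):
        raise ValueError("tidb_allow_batch_cop accepts 0, 1, or 2")
    return [
        # Look at the current values before changing anything.
        "SHOW VARIABLES LIKE 'tidb_allow_batch_cop';",
        "SHOW VARIABLES LIKE 'tidb_max_tiflash_threads';",
        # Assumption: try the change at session scope first and compare timings.
        f"SET {scope} tidb_allow_batch_cop = {batch_cop};",
    ]


if __name__ == "__main__":
    for statement in tiflash_tuning_statements():
        print(statement)
```

Starting at session scope keeps the experiment limited to one connection, so the slow query can be re-run and compared before anything is changed globally.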

| username: 会飞的土拨鼠 | Original post link

You can check the slow SQL execution records during that period and then analyze them. It is possible that the database memory and CPU load were very high at that time.
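
One possible way to pull those records is to query the slow query table for the window around the slowdown. The sketch below only formats the SQL text; the helper name `slow_query_window_sql` is hypothetical, the selected columns of `information_schema.cluster_slow_query` are an assumption to verify against your TiDB version, and the dates in the usage example are made up.

```python
from datetime import datetime


# Hypothetical helper: it formats a query string and does not run it.
def slow_query_window_sql(start: datetime, end: datetime, limit: int = 20) -> str:
    """Build a query listing the slowest statements between start and end."""
    if end <= start:
        raise ValueError("end must be later than start")
    fmt = "%Y-%m-%d %H:%M:%S"
    return (
        "SELECT instance, time, query_time, query "
        "FROM information_schema.cluster_slow_query "
        f"WHERE time BETWEEN '{start.strftime(fmt)}' AND '{end.strftime(fmt)}' "
        "ORDER BY query_time DESC "
        f"LIMIT {int(limit)};"
    )


if __name__ == "__main__":
    # Illustrative window around 13:50; the date itself is invented.
    print(slow_query_window_sql(datetime(2023, 1, 1, 13, 45),
                                datetime(2023, 1, 1, 13, 55)))
```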

| username: tidb菜鸟一只 | Original post link

Do the peaks in gRPC message duration coincide with the times when the SQL is slow?
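
A minimal sketch of that comparison, assuming the duration series has been exported from the monitoring panel as (timestamp, seconds) pairs and the slow SQL times are known. The function name, the threshold, the 60-second window, and all sample data are illustrative assumptions, not values from this cluster.

```python
from datetime import datetime, timedelta


def slow_queries_near_peaks(samples, slow_query_times, threshold_s, window_s=60):
    """Map each slow-query time to the duration peaks within window_s seconds of it.

    samples: list of (datetime, duration_in_seconds) points from the panel.
    A point counts as a peak when its duration is >= threshold_s.
    """
    peaks = [ts for ts, duration in samples if duration >= threshold_s]
    window = timedelta(seconds=window_s)
    return {
        query_time: [p for p in peaks if abs(p - query_time) <= window]
        for query_time in slow_query_times
    }


if __name__ == "__main__":
    base = datetime(2023, 1, 1, 13, 50)  # invented date, 13:50 as in the thread
    samples = [(base - timedelta(minutes=5), 0.02), (base, 3.5)]
    slow = [base + timedelta(seconds=20)]
    for query_time, peaks in slow_queries_near_peaks(samples, slow, 1.0).items():
        print(query_time, "matching peaks:", peaks)
```

A slow query with an empty list had no duration peak nearby, which would point away from gRPC latency as the cause for that particular execution.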

| username: foxchan | Original post link

There is a scheduled task that frequently creates tables and imports data, which might be the cause. TiFlash synchronizes schemas even for tables that have no TiFlash replica. The table in question is periodically dropped and recreated, but it has no TiFlash replica, and as a result TiFlash frequently reports this exception.

| username: heming | Original post link

This did not happen before; it seems to have started after upgrading to 6.1.2. Could the relevant technical staff look into the issue based on this alert log? Why does periodically rebuilding the table make TiFlash unstable?

| username: system | Original post link

This topic was automatically closed 60 days after the last reply. New replies are no longer allowed.