TiFlash is not used in complex queries

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: tiflash在复杂查询中没有使用到

| username: Jjjjayson_zeng

[TiDB Usage Environment] Production Environment
[TiDB Version] v6.5.1
[Reproduction Path] Operations performed that led to the issue
[Encountered Issue: Phenomenon and Impact] TiFlash is not used in complex queries
[Resource Configuration] 1 TiFlash, 3 PD, 5 TiKV
[Attachments: Screenshots/Logs/Monitoring]


TiFlash can be used when querying a single table

Not used in complex statements

| username: Kongdom | Original post link

The analyzer might consider the TiKV index to be more optimal.

You can use /*+ read_from_storage(tiflash[h1]) */ to compare the efficiency of the two.

| username: xfworld | Original post link

This is automatically optimized based on the execution plan.

If you want to force the use of TiFlash, you can use a hint.

| username: Jjjjayson_zeng | Original post link

Replace h1 with the table name?

| username: Jjjjayson_zeng | Original post link

Does this table get used many times, and does it add up the time from all the places where it is used?

| username: Kongdom | Original post link

Yes, there is a table alias, just a table alias.

| username: Jjjjayson_zeng | Original post link

What is this parser defined by? Do I need to replace all instances where this table is used with this one?

| username: tidb菜鸟一只 | Original post link

First, try using an SQL hint to force it to use TiFlash and see if the speed improves, or collect the statistics for the h1 table and check.

| username: Jjjjayson_zeng | Original post link

Now, even if I write a simple SQL, it will use TiFlash, but our complex SQL with over 1000 lines will use TiKV.

| username: Jjjjayson_zeng | Original post link

So it’s difficult for me to verify, which is why I’m asking if everything needs to be replaced.

| username: Kongdom | Original post link

Coincidentally, I am also comparing the performance of TiKV and TiFlash. I found that letting the optimizer choose by itself yields the highest performance. For some small range data scans, TiKV is more efficient than TiFlash. I think this might be because TiKV can use indexes to quickly retrieve this part of the data, while TiFlash requires a full table scan.

| username: Kongdom | Original post link

I think so, I haven’t verified it, you can check it.

| username: Jjjjayson_zeng | Original post link

I just don’t know how to verify it now, and our SQL often has more than 1000 lines, so it’s quite complicated to verify.

| username: Kongdom | Original post link

It’s better to let the TiDB optimizer choose autonomously for this kind of situation. It depends on the usage scenario, and it’s not necessarily true that TiFlash is always faster than TiKV.

| username: Jjjjayson_zeng | Original post link

Do you have any documentation on optimizing TiDB? I want to learn.
I’m worried every day.

| username: Jjjjayson_zeng | Original post link

Another question, for example, if we have a view in our SQL, does adding TiFlash to the tables in the view help with performance?

| username: dba-kit | Original post link

It might be related to the number of fields in the query. TiFlash uses columnar storage, which is actually suitable for aggregating a small number of columns. Each additional column read increases the cost of using TiFlash. In other words, the more fields in the query, the more likely the optimizer will prefer TiKV. I previously encountered a SQL query that couldn’t use TiFlash no matter what. After asking the business to reduce the number of query fields, it worked.

| username: dba-kit | Original post link

Here are some official documents you can check out:

| username: Kongdom | Original post link

What the person above posted is fine, the rest is basically just based on intuition.

| username: Running | Original post link

Set up a dedicated TiDB server node with high rigidity specifically for complex queries. I encountered a similar issue before and couldn’t resolve it.