Does the SQL optimization strategy for TiKV apply to TiFlash?

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: tikv的sql优化策略,适用于tiflash吗?

| username: Running

Does the SQL optimization strategy for TiKV apply to TiFlash? For example, multi-table join fields should have indexes, rewriting OR to IN, etc. Or how should SQL queries for the TiFlash engine be optimized? (excluding hardware-level expansion and optimization)

| username: xfworld | Original post link

TiFlash is currently used for acceleration, and the optimization methods are limited.

Refer to this article:

| username: TiDBer_UUTlqVvZ | Original post link

To optimize SQL queries for the TiFlash engine, consider the following aspects:

  1. Use TiFlash-supported functions: Not all functions are supported for pushdown in TiFlash. Refer to the official support list and standardize SQL according to the pushdown list.

  2. Bind SQL plans or force index usage: If the query results are inaccurate, you can bind SQL plans or force index usage. Use the following statement to bind:

CREATE [GLOBAL | SESSION] BINDING FOR BindableStmt USING BindableStmt;
  1. Manually re-statistic table information: If automatic analyze time is bypassed, manually re-statistic table information. Use the following statements for manual statistics:
show stats_meta where table_name='xxx';
show stats_healthy where table_name='xxx';
show STATS_HISTOGRAMS where table_name='xxx';
analyze table xxx;
  1. Avoid hotspot issues: Use clustered tables and where filters to retain as few data records as possible, find the primary key first, then join, and avoid full table scans. For uncertain field combinations, consider using TiFlash to accelerate uncertain condition filtering.

  2. Adjust TiFlash concurrency: Increase tidb_distsql_scan_concurrency to increase concurrency. TiFlash nodes should be deployed independently, with replicas preferably greater than 1 and less than the number of TiKV.

Additional reading:

| username: yulei7633 | Original post link

I will also refer to it and learn.

| username: Running | Original post link

Does column storage still have anything to do with indexes? It seems that the execution plans for the TiFlash engine are all full table scans.

| username: Running | Original post link

Looking forward to higher performance improvements in TiFlash and a reduction in the complexity of the technology stack.

| username: TiDBer_UUTlqVvZ | Original post link

This is because not all SQL queries are suitable for TiFlash, so it may be necessary to make a judgment here. For many queries, using the index to access TiKV is actually much faster than using TiFlash.

| username: Running | Original post link

Got it, I have now set up a tidbserver node (with only the tiflash engine) specifically to handle complex queries. It seems to have both advantages and disadvantages.

| username: system | Original post link

This topic was automatically closed 60 days after the last reply. New replies are no longer allowed.