How to Choose Which Tables Need to Be Synchronized from TiKV to TiFlash?

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: 如何选择那些表需要从tikv同步到tiflash?

| username: TiDBer_5GvAkLi0

【TiDB Usage Environment】Production Environment
【TiDB Version】v5.4.1
【Encountered Problem】
I have 130 tables. Should I synchronize all 130 of them to TiFlash, or only a selected subset?



| username: h5n1 | Original post link

It depends on your needs. TiFlash is a columnar storage engine, so only add TiFlash replicas for tables whose SQL queries can actually run faster on columnar storage.
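
In TiDB, a TiFlash replica is added per table with `ALTER TABLE ... SET TIFLASH REPLICA n`, and replication status is visible in `information_schema.tiflash_replica` (`AVAILABLE`, `PROGRESS`). The sketch below only builds those SQL strings for a hand-picked list of tables. The helper name `tiflash_statements` and the schema and table names are hypothetical, and running the statements through a MySQL-compatible driver is left out.

```python
from typing import Iterable, List


def tiflash_statements(schema: str, tables: Iterable[str], replicas: int = 1) -> List[str]:
    """Hypothetical helper: build statements that add TiFlash replicas to chosen tables.

    Assumes plain identifiers (no backticks inside names); no escaping is done.
    """
    if replicas < 0:
        raise ValueError("replica count must be non-negative")
    stmts = [
        f"ALTER TABLE `{schema}`.`{table}` SET TIFLASH REPLICA {replicas};"
        for table in tables
    ]
    # Status query: AVAILABLE = 1 means the replica is usable;
    # PROGRESS goes from 0.0 to 1.0 while data is replicated.
    stmts.append(
        "SELECT TABLE_NAME, REPLICA_COUNT, AVAILABLE, PROGRESS "
        "FROM information_schema.tiflash_replica "
        f"WHERE TABLE_SCHEMA = '{schema}';"
    )
    return stmts


if __name__ == "__main__":
    # Hypothetical analytical tables picked for TiFlash.
    for stmt in tiflash_statements("shop", ["orders", "order_items"]):
        print(stmt)
```

Listing tables explicitly keeps the choice deliberate instead of replicating all 130 tables by default.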

| username: ShawnYan | Original post link

If it is really hard to tell, first add TiFlash replicas for all tables in a test environment, run the workload for a while, check the execution plans, and then keep replicas only for the tables whose queries actually use TiFlash.
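
One way to do the "check the execution plan" step is to collect `EXPLAIN` output for the test workload and look at the `task` column: operators pushed to TiFlash show a task such as `cop[tiflash]`, `batchCop[tiflash]` or `mpp[tiflash]`, and the `access object` column names the table (for example `table:orders`). The sketch assumes each plan row has already been fetched as a 5-tuple in TiDB's column order (id, estRows, task, access object, operator info); the function names and sample data are hypothetical.

```python
from typing import Iterable, List, Set, Tuple

# One EXPLAIN row: (id, estRows, task, access object, operator info).
ExplainRow = Tuple[str, str, str, str, str]


def tables_read_from_tiflash(rows: Iterable[ExplainRow]) -> Set[str]:
    """Hypothetical helper: table names whose operators run on TiFlash in one plan."""
    used: Set[str] = set()
    for _id, _est_rows, task, access_object, _info in rows:
        if "tiflash" not in task:
            continue
        # Access object may look like "table:t" or "table:t, partition:p0".
        for part in access_object.split(","):
            part = part.strip()
            if part.startswith("table:"):
                used.add(part[len("table:"):])
    return used


def replicas_to_drop(replicated: Iterable[str],
                     plans: Iterable[Iterable[ExplainRow]]) -> List[str]:
    """Hypothetical helper: drop statements for replicated tables no plan read from TiFlash.

    Table names are unqualified, so this assumes the session's current database.
    """
    used: Set[str] = set()
    for plan in plans:
        used |= tables_read_from_tiflash(plan)
    return [f"ALTER TABLE `{t}` SET TIFLASH REPLICA 0;"
            for t in sorted(set(replicated) - used)]


if __name__ == "__main__":
    # Made-up plan for an aggregation that ran as MPP on TiFlash.
    plan = [
        ("HashAgg_11", "1.00", "root", "", "funcs:count(1)"),
        ("TableFullScan_12", "10000.00", "mpp[tiflash]", "table:orders", "keep order:false"),
    ]
    print(replicas_to_drop(["orders", "users", "logs"], [plan]))
```

Setting the replica count to 0 removes the TiFlash replica, which matches "only keep the replicas of the tables that use TiFlash".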

| username: TiDBer_5GvAkLi0 | Original post link

Okay, thank you for the explanation.

| username: TiDBer_5GvAkLi0 | Original post link

Thank you for the explanation.

| username: TiDBer_5GvAkLi0 | Original post link

Does the number of tables synchronized to TiFlash affect TiFlash's performance? For example, how does synchronizing 50 tables from TiKV to TiFlash compare with synchronizing 100 tables?

| username: forever | Original post link

Try to identify the statistical (OLAP) queries that only need a few columns.
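
To see why queries touching few columns gain the most, here is a deliberately simplified model, assuming fixed-width columns and ignoring compression and indexes: a row-store full scan reads every column, while a column store reads only the columns the query references. The helper name and the column widths are made up for illustration.

```python
from typing import Dict, Iterable


def columnar_read_fraction(widths: Dict[str, int], used: Iterable[str]) -> float:
    """Hypothetical helper: bytes a column store reads relative to a full-row scan.

    Simplified model: fixed-width columns, no compression.
    """
    total = sum(widths.values())
    if total == 0:
        raise ValueError("table has no column data")
    needed = sum(widths[c] for c in set(used))
    return needed / total


if __name__ == "__main__":
    # Made-up 6-column table, widths in bytes.
    widths = {"id": 8, "user_id": 8, "amount": 8, "status": 1, "note": 200, "created_at": 8}
    # A query like SELECT status, SUM(amount) ... GROUP BY status touches 2 of 6 columns.
    print(columnar_read_fraction(widths, ["status", "amount"]))
```

Under this model the aggregation reads about 4% of the bytes a row-store scan would, while a query selecting every column gains nothing.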

| username: h5n1 | Original post link

TiFlash replicates data from TiKV as a Raft learner. The more tables you replicate, the more resources replication consumes, especially network bandwidth. The actual difference between 50 and 100 tables has to be measured in practice.

| username: BraveChen | Original post link

Tables that are frequently hit by large-scale analytical queries.

| username: ddhe9527 | Original post link

  1. Wide tables (with many columns)
  2. Tables that often require full table scans

| username: TiDBer_5GvAkLi0 | Original post link

Got it! Thanks for the explanation.

| username: TiDBer_5GvAkLi0 | Original post link

Thank you for the explanation.

| username: alfred | Original post link

This really needs to be tested; it is hard for the optimizer to tell OLAP queries from OLTP ones. Usually, a table is suitable for TiFlash when:

  1. Some of its columns are fully scanned
  2. Only a small amount of data is returned to the client
  3. The table holds a large amount of data
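
The three rules of thumb can be turned into a rough filter over statistics gathered while testing. This is only a sketch: the `TableProfile` fields, the helper name and both thresholds (10 million rows, at most 1% of scanned rows returned) are hypothetical choices, not values from TiDB.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TableProfile:
    """Hypothetical per-table workload summary collected during testing."""
    name: str
    row_count: int       # amount of data in the table
    full_scans: int      # analytical queries that fully scanned its columns
    rows_scanned: int    # rows read by those queries
    rows_returned: int   # rows sent back to the client by those queries


def tiflash_candidates(profiles: List[TableProfile],
                       min_rows: int = 10_000_000,
                       max_return_ratio: float = 0.01) -> List[str]:
    """Keep tables that are large, fully scanned, and return little data (illustrative thresholds)."""
    picked = []
    for p in profiles:
        large = p.row_count >= min_rows
        scanned = p.full_scans > 0
        small_result = (p.rows_scanned > 0
                        and p.rows_returned / p.rows_scanned <= max_return_ratio)
        if large and scanned and small_result:
            picked.append(p.name)
    return picked
```

The thresholds only encode the rules of thumb; the final decision still comes from testing the real workload.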

| username: TiDBer_5GvAkLi0 | Original post link

Got it, thank you for the explanation.

| username: 特雷西-迈克-格雷迪 | Original post link

Large tables, for example ones with hundreds of millions of rows, where queries only need a few of the columns: set up TiFlash replicas for those.
