TiDB Index Scan

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: tidb索引扫描

| username: lijin

The official course says that in intelligent selection, index scan for data is suitable for TiKV, but Baidu AI says that in the new version of TiDB, the efficiency of TiKV and TiFlash in handling index queries is similar. Who is right and why?

| username: Jasper | Original post link

To be precise, point queries with smaller data volumes or index range scans are more suitable for TiKV. TiFlash does not have the concept of indexes and can only perform full table scans.

| username: tidb菜鸟一只 | Original post link

Don’t trust Baidu AI. If it’s an index scan of a small amount of data, TiKV is indeed more suitable. If it’s a summary SQL that scans a large amount of data, TiFlash is more appropriate.

| username: 小龙虾爱大龙虾 | Original post link

AI will seriously talk nonsense :joy_cat:

| username: forever | Original post link

For multi-column computational AP queries, TFlash has an advantage, while for point queries and other TP tasks, TiKV has an advantage.

| username: dba远航 | Original post link

This is correct.

| username: 哈喽沃德 | Original post link

You actually believe in Baidu? It can only be used as an auxiliary tool.

| username: Jellybean | Original post link

Baidu AI is talking nonsense seriously, just take a look.

To understand the phrase “Index scan for data is suitable for TiKV,” let me give you an example. Suppose there is a table with 100 million rows, and each user ID (indexed) has a value called money. When performing the following calculations:

  1. Point queries or index filtering within a small range should use TiKV.
    select sum(money) from t where id = 1;
    select sum(money) from t where id > 1 and id < 10;

  2. Index filtering for more than half of the table or full table scans should use TiFlash.
    select sum(money) from t where id > 1 and id < 90000000;
    select sum(money) from t;

The intelligent selection process roughly follows this logic, where the optimizer will execute optimizations based on statistical information.

| username: wangccsy | Original post link

No matter what DB it is, index design is really crucial.

| username: zhanggame1 | Original post link

Don’t take Baidu AI stuff seriously.

| username: FutureDB | Original post link

Correct, generally speaking, if more than half of the index is filtered or a full table scan is performed, and if the table has a TiFlash replica, the optimizer is likely to choose a full table scan on TiFlash rather than index filtering on TiKV or a full table scan on TiKV.