Some Questions About TiDB

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: 关于TIDB一些疑问请教

| username: TiDBer_JlY1JCJ5

Recently, I’ve been learning about TiDB, but I still have some questions that need clarification.

  1. TiFlash can quickly synchronize row-store data from TiDB. I read online that it is fast because it synchronizes by reading the index (idx) value of the Raft log. However, I have also seen people say it is because TiFlash uses DeltaTree. Are these the same reason, or is there some connection between them?
  2. Does TiFlash column storage have indexes? How are the indexes created? Is there any documentation on this? I haven’t been able to find any.
  3. Does TiFlash have data sharding? I have only seen data distribution described for TiKV.

| username: Billmay表妹 | Original post link

Thank you very much for your attention to TiDB! Regarding your questions, I will answer them one by one as follows:

  1. There are indeed two reasons TiFlash can synchronize TiDB row-store data quickly. First, TiFlash replicates data through the Raft protocol, reading the Raft log by its index (idx) value[1]. Second, TiFlash uses DeltaTree, an incremental-update data structure that can quickly apply changes to columnar data[1]. The two are related: both aim to let TiFlash keep up with changes to TiDB row-store data.

  2. The TiFlash columnar engine supports indexing. In TiFlash, indexes are implemented by constructing special data structures on columnar data. Specifically, TiFlash uses two types of indexes: Bitmap Index and Inverted Index[1]. Bitmap Index is suitable for low cardinality columns and can quickly locate rows that meet specific conditions; Inverted Index is suitable for high cardinality columns and can quickly locate rows containing specific values. Through these indexes, TiFlash can provide more efficient query performance.

  3. Currently, TiFlash does not have the concept of data sharding. Data sharding is a feature of TiKV, used to distribute data storage across different nodes to achieve horizontal scaling and load balancing. TiFlash, on the other hand, is an independent component of TiDB used to accelerate queries. It does not directly store data but obtains data through synchronization with TiKV[2]. Therefore, the concept of data sharding mainly applies to TiKV and not to TiFlash.
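
To make point 1 more concrete, here is a toy sketch of how the two ideas could fit together: a replica applies log entries strictly in index order, appends each change to a small delta layer, and periodically merges the delta into a sorted stable layer. The names `LogEntry`, `ToyLearner`, and `merge_threshold` are hypothetical and chosen only for illustration; this is not TiFlash's actual implementation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LogEntry:
    index: int            # position of the entry in the replicated log
    key: int              # primary key of the affected row
    value: Optional[str]  # new row value; None marks a delete


class ToyLearner:
    """Hypothetical replica that applies log entries in index order."""

    def __init__(self, merge_threshold=4):
        self.applied_index = 0   # index of the last applied log entry
        self.delta = []          # append-only list of recent (key, value) writes
        self.stable = {}         # merged data, re-sorted by key on every merge
        self.merge_threshold = merge_threshold

    def apply(self, log):
        for entry in log:
            if entry.index <= self.applied_index:
                continue         # already applied, safe to skip
            if entry.index != self.applied_index + 1:
                break            # gap in the log: wait for the missing entries
            self.delta.append((entry.key, entry.value))  # cheap append
            self.applied_index = entry.index
            if len(self.delta) >= self.merge_threshold:
                self.merge_delta()

    def merge_delta(self):
        merged = dict(self.stable)
        for key, value in self.delta:
            if value is None:
                merged.pop(key, None)
            else:
                merged[key] = value
        self.stable = dict(sorted(merged.items()))
        self.delta.clear()

    def read(self, key):
        # The newest delta write wins; otherwise fall back to the stable layer.
        for k, v in reversed(self.delta):
            if k == key:
                return v
        return self.stable.get(key)


learner = ToyLearner(merge_threshold=2)
learner.apply([LogEntry(1, 10, "a"), LogEntry(2, 20, "b"), LogEntry(3, 10, "c")])
print(learner.applied_index, learner.read(10))  # 3 c
```

In this model the log index tells the replica exactly where to resume, and the delta layer keeps each write cheap because the sorted data is only rewritten at merge time.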

| username: 像风一样的男子 | Original post link

TiFlash does not have indexes; it uses columnar storage, meaning each column acts as an index.
For related questions, you can refer to the following article:

https://zhuanlan.zhihu.com/p/127823207

| username: zhanggame1 | Original post link

TiFlash has a data replica count. You can start multiple TiFlash instances and then configure the number of replicas for the tables that need to use TiFlash. For example, to configure 2 replicas, you can use the following command:

ALTER TABLE test_tb SET TIFLASH REPLICA 2;

| username: 江湖故人 | Original post link

I couldn’t find anything related to TiFlash in the official documentation for CREATE INDEX. As far as I know, columnar databases can have indexes, and when the WHERE condition can use an index, the query can avoid reading the entire column.

| username: zhanggame1 | Original post link

Columnar storage is inherently ordered, so a WHERE condition can certainly locate the data quickly.

| username: TiDBer_JlY1JCJ5 | Original post link

Thank you for your explanation, but it seems that the article you provided does not mention Bitmap Index and Inverted Index. Could you provide a source for this information? Also, I have seen some people say that since it is column storage, each column is an index. How should this be understood?

| username: TiDBer_JlY1JCJ5 | Original post link

Why is each column in a column store an index? I don’t quite understand this statement.

| username: 小龙虾爱大龙虾 | Original post link

Isn’t an index just storing the data of a certain column together? Isn’t column storage also storing the data of a certain column together?
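
A small sketch of the comparison, assuming a made-up three-row table (`rows`, `to_columns`, and `filter_positions` are hypothetical helpers, not TiFlash code). A column layout stores each column's values together, so an aggregate or filter on one column touches only that column. Note that the stored column is not sorted by its own values, so finding matching rows still means scanning that column.

```python
# Row layout: each record keeps all of its columns together.
rows = [
    {"id": 1, "city": "Beijing", "amount": 30},
    {"id": 2, "city": "Shanghai", "amount": 75},
    {"id": 3, "city": "Beijing", "amount": 12},
]


def to_columns(rows):
    """Pivot row-oriented records into one list per column."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


def sum_amount_row_store(rows):
    # Every whole row is visited even though only one field is needed.
    return sum(row["amount"] for row in rows)


def sum_amount_column_store(columns):
    # Only the "amount" column is read.
    return sum(columns["amount"])


def filter_positions(columns, name, predicate):
    # Scan a single column and return the positions of the matching rows.
    return [pos for pos, value in enumerate(columns[name]) if predicate(value)]


columns = to_columns(rows)
print(sum_amount_column_store(columns))                         # 117
print(filter_positions(columns, "city", lambda c: c == "Beijing"))  # [0, 2]
```

So "each column is an index" holds in the sense that a column's data is stored together, but it differs from a B-tree style index, which is additionally sorted by the indexed value.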

| username: 江湖故人 | Original post link

TiKV and TiFlash both use primary key sorting. Value sorting is required to locate the WHERE range.
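
As a rough illustration of why sorted storage helps with a WHERE range, the sketch below uses binary search from Python's standard `bisect` module on a sorted list of primary keys. The `pk_range` helper and the sample keys are hypothetical.

```python
from bisect import bisect_left, bisect_right


def pk_range(sorted_keys, low, high):
    """Return slice bounds [start, stop) covering low <= key <= high."""
    start = bisect_left(sorted_keys, low)    # first position with key >= low
    stop = bisect_right(sorted_keys, high)   # first position with key > high
    return start, stop


keys = [3, 8, 15, 21, 34, 55]          # primary keys, already sorted
start, stop = pk_range(keys, 8, 30)    # WHERE pk BETWEEN 8 AND 30
print(keys[start:stop])                # [8, 15, 21]
```

Because the keys are sorted, both bounds are found in logarithmic time and only the rows between them are read. The same trick does not work for a condition on a column the data is not sorted by.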

| username: 江湖故人 | Original post link

Row storage or column storage can be a heap of disorganized data (ordinary heap table), while an orderly stored table is called an Index Organized Table (IOT), and the index is an ordered doubly linked list. The description might not be precise enough, but I hope it helps you understand :grimacing:

Some competitors already support indexes on column store tables. Since Billmay表妹 mentioned that they are supported, the feature is probably still in internal testing and the official documentation hasn’t been updated yet.

Index types supported by GaussDB column store tables: Psort (default for column store tables), btree, gin. [1]
OpenGauss Column Store Table PSort Index

| username: TiDBer_JlY1JCJ5 | Original post link

According to the current documentation, column storage does not have indexes yet, right?

| username: zhanggame1 | Original post link

Take a look at this, there are 10 chapters, and the first chapter talks about indexes.
TiFlash Source Code Reading (1) TiFlash Storage Layer Overview | PingCAP

| username: 江湖故人 | Original post link

The reference guide does not contain information on manually creating columnar table indexes. The Rough Set Index seems more like a kind of metadata that comes with TiFlash, where each chunk of data carries statistical information that can serve as a filter, similar to the Knowledge Node in Infobright.
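
Here is a minimal sketch of that idea, assuming the per-chunk statistics are simply the minimum and maximum value of each chunk (a common form of such metadata, used here as an assumption). The function names and `chunk_size` are hypothetical.

```python
def build_min_max(values, chunk_size):
    """Record (start offset, min, max) for every chunk of a column."""
    stats = []
    for start in range(0, len(values), chunk_size):
        chunk = values[start:start + chunk_size]
        stats.append((start, min(chunk), max(chunk)))
    return stats


def scan_greater_than(values, stats, chunk_size, threshold):
    """Evaluate `value > threshold`, skipping chunks that cannot match."""
    hits, chunks_read = [], 0
    for start, _low, high in stats:
        if high <= threshold:
            continue  # no value in this chunk can exceed the threshold
        chunks_read += 1
        for value in values[start:start + chunk_size]:
            if value > threshold:
                hits.append(value)
    return hits, chunks_read


column = [1, 4, 2, 9, 12, 10, 3, 5, 20]
stats = build_min_max(column, chunk_size=3)
print(scan_greater_than(column, stats, 3, threshold=8))  # ([9, 12, 10, 20], 2)
```

Nothing is created by the user here: the statistics are written along with the data, and they only rule chunks out. A chunk that passes the filter still has to be scanned.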

| username: Billmay表妹 | Original post link

This content is very suitable for you:

First Issue: TiFlash Storage Engine Design Concept

Author: Huang Junshen

Summary: This will introduce the overall form of TiDB HTAP and provide a detailed analysis of the design concept and sub-modules of the storage layer DeltaTree engine optimization.

Meeting Materials: TiFlash Storage Layer Overview.pdf (877.2 KB)

Video Replay: TiFlash Storage Engine Design Concept_Bilibili

Full Review: Column - TiFlash Source Code Reading (1) TiFlash Storage Layer Overview | TiDB Community

Second Issue: TiFlash Computing Layer Overview

Author: Xu Fei

Summary: This issue provides an overview of the design principles and code implementation of the TiFlash computing layer.

Meeting Materials: TiFlash Computing Layer Overview - Xu Fei.pdf (1.1 MB)

Video Replay: Source Code Interpretation - TiFlash Computing Layer Overview_Bilibili

Full Review: Column - TiFlash Source Code Reading (2) Computing Layer Overview | TiDB Community

Third Issue: TiFlash DeltaTree Engine Design and Implementation Analysis Part 1

Author: Shi Wenxuan

Summary: This issue provides an in-depth understanding of the principles and workflows related to the write path of the TiFlash storage layer DeltaTree engine.

Meeting Materials: TiFlash DeltaTree Storage Engine (Part 1).pdf (2.2 MB)

Video Replay: TiFlash DeltaTree Engine Design and Implementation Analysis_Bilibili

Full Review: Column - TiFlash Source Code Reading (3) DeltaTree Storage Engine Design and Implementation Analysis - Part 1 | TiDB Community

Fourth Issue: TiFlash DeltaTree Engine Design and Implementation Analysis Part 2

Author: Shi Wenxuan

Summary: This issue provides an in-depth understanding of the read and write workflows and code implementation of the TiFlash storage layer DeltaTree engine.

Meeting Materials: TiFlash DeltaTree Storage Engine (Part 2).pdf (1.2 MB)

Video Replay: Source Code Interpretation | TiFlash Storage Layer DeltaTree Engine (Read Path)_Bilibili

Full Review: Column - TiFlash Source Code Reading (5) DeltaTree Storage Engine Design and Implementation Analysis - Part 2 | TiDB Community

Fifth Issue: TiFlash DDL Module Design and Implementation Analysis

Author: Hong Yunyan

Summary: This issue provides an understanding of the design philosophy and code implementation of the TiFlash DDL module.

Meeting Materials: TiFlash Source Code Interpretation - DDL Module(2).pdf (1.5 MB)

Video Replay: Source Code Interpretation | TiFlash DDL Module Design and Implementation Analysis_Bilibili

Full Review: Column - TiFlash Source Code Interpretation (4) | TiFlash DDL Module Design and Implementation Analysis | TiDB Community

Sixth Issue: TiFlash Common Operator Design and Implementation

Author: Qi Zhi

Summary: This issue provides an understanding of the various stages of TiFlash operators, enabling you to understand the design logic of the operator code and further independently read the code or handle simple issues.

Meeting Materials: TiFlash Common Operator Design and Implementation.pdf (2.9 MB)

Video Replay: Source Code Interpretation | TiFlash Common Operator Design and Implementation_Bilibili

Seventh Issue: TiFlash DeltaTree Index Design and Implementation

Author: Li Dezhu

Summary: This issue provides an understanding of the role and implementation principles of the core data structure DeltaTree Index in the TiFlash storage layer.

Meeting Materials: TiFlash DeltaTree Index Design and Implementation Analysis.pdf (1.2 MB)

Video Replay: TiFlash DeltaTree Index_Bilibili

Full Review: Column - TiFlash DeltaTree Index Design and Implementation Analysis | TiDB Community

Eighth Issue: TiFlash Proxy Module Introduction

Author: Luo Rongzhen

Summary: This issue helps you understand the principles of the TiFlash Proxy module, how it helps TiFlash obtain data, how it interacts with TiFlash, and the adjustments and optimizations made for TiFlash’s write mode compared to TiKV.

Meeting Materials: TiFlash Source Code Interpretation - Proxy Module.pdf (1.6 MB)

Video Replay: TiFlash Proxy Module Introduction_Bilibili

Full Review: Column - TiFlash Proxy Module Introduction | TiDB Community

Ninth Issue: TiFlash Expression Design and Implementation

Author: Huang Haisheng

Summary: This issue provides an understanding of the design and source code implementation of TiFlash expressions, helping you contribute to TiFlash in the future.

Meeting Materials: TiFlash Expression Design and Implementation.pdf (1.8 MB)

Video Replay: TiFlash Expression Design and Implementation_Bilibili

Full Review: Column - TiFlash Expression Design and Implementation | TiDB Community

| username: TiDBer_JlY1JCJ5 | Original post link

Thank you for your explanation. I also saw that there is a DeltaTree index data structure in DeltaTree. Since this DeltaTree index is a data structure, isn’t it an index for TiFlash?
