Some Questions about TiFlash

translator_bot · June 21, 2024, 11:44am

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: Tiflash一些问题

| username: TiDBer_JlY1JCJ5

Does TiFlash synchronize data through the Raft protocol, by reading the index (idx) value of the Raft log? Is there a specific document that describes this?
Is the DeltaTree index the index used by TiFlash? And why does columnar storage generally not need an index?

| username: dba远航

You can find the documentation by searching on Baidu. Columnar storage generally does not need indexes because it is queried mainly for column-wise statistical analysis. Such queries usually read large amounts of data, so index-based filtering is rarely needed.

| username: xfworld

There is no documentation, but you can read the source code directly…

The official blog has some posts you can refer to:

The biggest difference between columnar storage and row storage is the execution model used to retrieve data. Row storage uses the volcano (row-at-a-time iterator) model, while columnar storage uses a vectorized engine that processes values in batches.
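To make the two models concrete, here is a minimal sketch in Python. It is a toy illustration, not TiDB or TiFlash code: the function names, the `amount` field, and the batch size are all assumptions chosen for the example.

```python
# Toy comparison of a volcano-style (row-at-a-time) pipeline and a
# vectorized (batch-at-a-time) pipeline. All names here are hypothetical.

def scan_rows(rows):
    """Volcano-style leaf operator: hands one row to its parent per step."""
    for row in rows:
        yield row

def filter_rows(child, min_amount):
    """Volcano-style filter: pulls one row at a time from its child."""
    for row in child:
        if row["amount"] >= min_amount:
            yield row

def volcano_sum(rows, min_amount):
    """Sum `amount` by pulling rows one by one through the operator chain."""
    total = 0
    for row in filter_rows(scan_rows(rows), min_amount):
        total += row["amount"]
    return total

def vectorized_sum(amount_column, min_amount, batch_size=4):
    """Sum `amount` by processing the column in fixed-size batches."""
    total = 0
    for start in range(0, len(amount_column), batch_size):
        batch = amount_column[start:start + batch_size]
        # The whole batch is filtered and summed in one tight loop.
        total += sum(v for v in batch if v >= min_amount)
    return total

rows = [{"id": i, "amount": a} for i, a in enumerate([5, 12, 7, 30, 18])]
amounts = [r["amount"] for r in rows]  # the same data laid out as one column

print(volcano_sum(rows, 10))        # 60
print(vectorized_sum(amounts, 10))  # 60
```

Both functions return the same result. The difference is how much work is done per call: the volcano version crosses an operator boundary for every row, while the vectorized version amortizes that overhead over a batch of values from a single column.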

| username: TiDBer_小阿飞

[The original reply was a link to a Zhihu article; the URL was not preserved in this translation.]

| username: tidb菜鸟一只

I drew a quick sketch to help with understanding.

When a row-store table is read by rowid, the rowid locates the row directly and the required columns are returned. When it is read through an index, the index field is first used to find the rowid, then a table lookup fetches the corresponding row, and the required columns are returned. (If only the indexed columns are needed, the table lookup is skipped.)
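The two access paths above can be sketched as follows. This is a simplified model using plain dictionaries, not how TiKV actually stores data; the table contents and the helper names `get_by_rowid` and `get_by_city` are made up for illustration.

```python
# Toy row store: rowid -> full row. Hypothetical data for illustration.
table = {
    1: {"id": 1, "name": "alice", "city": "Beijing"},
    2: {"id": 2, "name": "bob", "city": "Shanghai"},
    3: {"id": 3, "name": "carol", "city": "Beijing"},
}

# Toy secondary index on `city`: indexed value -> list of rowids.
city_index = {}
for rowid, row in table.items():
    city_index.setdefault(row["city"], []).append(rowid)

def get_by_rowid(rowid, columns):
    """Direct access: the rowid locates the row, then columns are projected."""
    row = table[rowid]
    return {c: row[c] for c in columns}

def get_by_city(city, columns):
    """Index access: find rowids in the index, then look up the table."""
    rowids = city_index.get(city, [])
    if set(columns) <= {"city"}:
        # Only the indexed column is requested, so the index alone
        # answers the query and the table lookup is skipped.
        return [{c: city for c in columns} for _ in rowids]
    return [get_by_rowid(rowid, columns) for rowid in rowids]

print(get_by_rowid(2, ["name"]))        # {'name': 'bob'}
print(get_by_city("Beijing", ["name"])) # [{'name': 'alice'}, {'name': 'carol'}]
print(get_by_city("Beijing", ["city"])) # [{'city': 'Beijing'}, {'city': 'Beijing'}]
```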

Columnar reads usually aggregate one or a few columns, so the query reads only those columns and computes the summary directly. For a single-value query, the value is fetched from the corresponding column based on the rowkey.

You can think of column storage as a row-store table in which every field has its own index… Each column in column storage is therefore roughly equivalent to an index, and there is no need to create separate indexes.
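A minimal sketch of the columnar side of this analogy, assuming each column is simply stored as its own list. The column names and helper functions are hypothetical, and a real engine such as TiFlash adds compression, sorting, and its DeltaTree structure on top. Note also that scanning an unsorted column is a linear pass rather than a tree lookup, so "each column is an index" is an analogy, not a literal equivalence.

```python
# Toy column store: each column is stored contiguously as its own list.
# Position i across all lists belongs to the same logical row.
columns = {
    "rowkey": [1, 2, 3, 4],
    "city": ["Beijing", "Shanghai", "Beijing", "Shenzhen"],
    "amount": [10, 25, 40, 5],
}

def column_sum(name):
    """Aggregate a single column without touching any other column."""
    return sum(columns[name])

def point_lookup(rowkey, name):
    """Single-value query: find the position by rowkey, read one column."""
    pos = columns["rowkey"].index(rowkey)
    return columns[name][pos]

def filter_positions(name, value):
    """Scan one column and return the matching row positions."""
    return [i for i, v in enumerate(columns[name]) if v == value]

def sum_where(filter_col, value, agg_col):
    """Filter on any column, then aggregate another, reading only those two."""
    return sum(columns[agg_col][i] for i in filter_positions(filter_col, value))

print(column_sum("amount"))                    # 80
print(point_lookup(3, "amount"))               # 40
print(sum_where("city", "Beijing", "amount"))  # 50
```

Any column can serve as the filter here without a separate index being created, which is the point of the analogy.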
