Why doesn't TiFlash Minor Compaction sort before merging?

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: TiFlash Minor Compaction 为什么不排序后再合并?

| username: TiAmo

When data is flushed from the MemTableSet to disk, the data in each Block has to be sorted. However, the subsequent Minor Compaction merges several small ColumnFileTiny files into a larger ColumnFileTiny by simple concatenation, without sorting. Why is it done this way? Also, since Minor Compaction leaves the flushed data unordered, why does the data need to be sorted at flush time in the first place?
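To make the question concrete, here is a minimal sketch of the behaviour described above. It is a toy model, not TiFlash code: rows are plain `(handle, version)` tuples, and `flush_block`, `minor_compact`, and `is_sorted` are hypothetical names made up for illustration.

```python
# Toy model only: a "file" is a list of (handle, version) rows.
# flush_block / minor_compact / is_sorted are hypothetical names, not TiFlash APIs.

def flush_block(mem_rows):
    """Sort rows by (handle, version) before persisting them as one small file."""
    return sorted(mem_rows)

def minor_compact(small_files):
    """Merge small files by plain concatenation, without re-sorting."""
    merged = []
    for f in small_files:
        merged.extend(f)
    return merged

def is_sorted(rows):
    """True if every row is <= the row that follows it."""
    return all(a <= b for a, b in zip(rows, rows[1:]))

file_a = flush_block([(7, 1), (2, 1), (5, 1)])
file_b = flush_block([(3, 2), (9, 2), (1, 2)])
merged = minor_compact([file_a, file_b])

print(is_sorted(file_a), is_sorted(file_b))  # each flushed file is sorted on its own
print(is_sorted(merged))                     # the concatenated file is not globally sorted
```

In this model the merged file still consists of sorted runs (one per flushed file); it is only the file as a whole that loses its order.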

| username: flow-PingCAP

In fact, Minor Compaction could sort the data, but after sorting, the DeltaIndex would have to be reorganized the same way it is during a Flush, and some corner cases would need to be handled. The developer was lazy…
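A rough way to see why sorting would force an index update: the sketch below assumes, purely for illustration, an index that refers to delta rows by their position. This is a simplification and not the actual DeltaIndex structure; `build_index` and `read_via_index` are hypothetical helpers.

```python
# Illustration only. Assumption: the index stores row positions in key order.
# This is a stand-in, not the real DeltaIndex layout.

def build_index(rows):
    """Return row positions ordered by key."""
    return sorted(range(len(rows)), key=lambda i: rows[i])

def read_via_index(rows, index):
    """Read rows in the order dictated by the index."""
    return [rows[i] for i in index]

small_files = [[2, 5, 7], [1, 3, 9]]            # each file sorted at flush time
delta = small_files[0] + small_files[1]          # row layout before compaction
index = build_index(delta)                       # index built on that layout

concatenated = [k for f in small_files for k in f]  # compaction by concatenation
resorted = sorted(concatenated)                     # compaction with sorting

# Concatenation keeps every row at the same position, so the old index still works.
print(read_via_index(concatenated, index))
# Sorting moves rows, so the old index now points at the wrong rows.
print(read_via_index(resorted, index))
# After rebuilding the index, reads are correct again.
print(read_via_index(resorted, build_index(resorted)))
```

Under this assumption, concatenation is the cheap option because nothing that points into the delta has to change.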

| username: TiAmo

So when data is read through the DeltaIndex, does it read the first layer (i.e., the data in the smaller ColumnFileTiny files)? If so, what is the purpose of the newly merged, larger ColumnFileTiny?

| username: flow-PingCAP

After merging, the large ColumnFileTiny replaces the original smaller ones. The main purpose of merging into larger files is to reduce the number of IO operations. Of course, sorting during the merge could also improve data locality, but the benefit doesn’t seem to be very significant. A long time ago, I hacked together a version that did this; it had a bug, and I found no obvious benefit. The bug was also hard to track down, so I gave up Orz.
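As a back-of-the-envelope illustration of the IO argument, the sketch below assumes a deliberately crude cost model in which reading a file costs one IO regardless of its size. `read_ios` and `compact_into_one` are hypothetical names, and the real cost depends on the storage layer.

```python
# Crude cost model (an assumption): one IO per file read, whatever its size.

def read_ios(files):
    """Number of IOs needed to read every file once."""
    return len(files)

def compact_into_one(files):
    """Replace the small files with a single concatenated file."""
    return [[row for f in files for row in f]]

before = [[2, 5], [1, 9], [4], [3, 8]]   # four small files
after = compact_into_one(before)          # one larger file with the same rows

print(read_ios(before), read_ios(after))
```

The rows are identical before and after; only the number of files, and therefore the number of reads in this model, goes down.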
