What is the logic behind TiFlash COMPACT?

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: Tiflash COMPACT 背后的逻辑是什么?

| username: Running

I upgraded from 6.0 to 6.1 and then ran ALTER TABLE net_online COMPACT TIFLASH REPLICA; Afterwards, performance improved sharply: queries over 900 million records went from more than ten seconds to returning results in 2 seconds. Could you explain the logic behind COMPACT?

| username: ddhe9527

TiFlash major compaction merges the data in the Delta layer with the data in the Stable layer on disk; structurally, this resembles a two-layer LSM-tree. There is also minor compaction, in which DeltaTree continuously merges small, fragmented ColumnFileTiny files in the background into larger ColumnFileTiny files.
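To make the two-layer idea concrete, here is a minimal toy sketch in Python. It is not TiFlash's actual implementation: the class name ToySegment and its methods are hypothetical, and real DeltaTree segments store columnar files with multiple versions per row. The sketch only shows why reads get cheaper once the Delta layer has been merged into the Stable layer.

```python
from dataclasses import dataclass, field


@dataclass
class ToySegment:
    """Hypothetical toy model of a Delta layer on top of a Stable layer."""
    stable: dict = field(default_factory=dict)  # key -> value, one entry per key
    delta: list = field(default_factory=list)   # append-only (key, value); None marks a delete

    def write(self, key, value):
        # New writes only append to the delta layer.
        self.delta.append((key, value))

    def read(self, key):
        # A read must check the delta layer (newest first) before the stable layer,
        # so a large delta layer makes every read more expensive.
        for k, v in reversed(self.delta):
            if k == key:
                return v
        return self.stable.get(key)

    def major_compact(self):
        # Fold all delta entries into the stable layer, oldest first, then clear the delta.
        for k, v in self.delta:
            if v is None:
                self.stable.pop(k, None)
            else:
                self.stable[k] = v
        self.delta.clear()


seg = ToySegment()
seg.write("a", 1)
seg.write("a", 2)     # newer version of the same key
seg.write("b", 3)
seg.write("b", None)  # delete
seg.major_compact()   # stable now holds only the latest live data; delta is empty
```

After major_compact, reads in this toy model go straight to the stable layer and stale versions are gone, which is the effect the manual COMPACT is described as having.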

| username: Skyworker

May I ask, besides improving speed, have you noticed any instability or side effects? Can similar effects be achieved on other tables as well?

| username: flow-PingCAP

The principle is as follows:
TiFlash runs compaction automatically in the background. Compaction reorganizes the data files and reclaims expired data versions, which improves query performance. However, automatic compaction only starts once certain thresholds are met, for example when the delta data reaches a certain size; triggering it more eagerly would cause significant write amplification. In other words, if no data is being written to a table, TiFlash will not compact it automatically, even if its data layout is not optimal.
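The trade-off behind the threshold can be shown with a rough cost model. This is a simplified sketch under stated assumptions, not TiFlash's real trigger logic: it assumes a compaction rewrites the whole stable layer of a segment together with its delta, and the function names and threshold parameter are hypothetical.

```python
def rewrite_cost(stable_rows: int, delta_rows: int) -> float:
    """Rows rewritten per row of new delta data if the two layers were merged now."""
    if delta_rows <= 0:
        raise ValueError("nothing to compact")
    return (stable_rows + delta_rows) / delta_rows


def should_auto_compact(delta_rows: int, threshold_rows: int) -> bool:
    """Hypothetical trigger: compact only once enough delta data has accumulated."""
    return delta_rows >= threshold_rows


# A segment with 1,000,000 stable rows (illustrative numbers only):
small_delta_cost = rewrite_cost(1_000_000, 1_000)    # 1001.0 rows rewritten per new row
large_delta_cost = rewrite_cost(1_000_000, 500_000)  # 3.0 rows rewritten per new row
```

Under this model, compacting after every small write is very expensive per row, which is why a table that receives few writes can stay below the threshold indefinitely.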

The manual compact command provided in this release does the same thing as the automatic background compaction, except that it can be triggered on demand via SQL. It therefore has no side effects; the only impact is that it consumes some system resources while it runs. Currently, the default parallelism is only 1.

Manual compaction is typically used for tables that are not updated frequently, for example a table that is loaded once a day and then compacted. For frequently updated tables, manual compaction makes little difference to performance (since TiFlash compacts them automatically), but running it won’t hurt either.
