A large number of small, KB-sized SST files under a single region

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: 一个region下有大量kb大小的sst小文件

| username: h5n1

[Version] v5.2.3 arm
[Issue]
Running tikv-ctl region-properties against a region of a certain table shows that the region has 456 SST files. All of them are small files of a few KB, with the largest just over 100 KB. I checked the DDL history of this table and no truncate operations have been performed. What could cause so many small files to be generated, and how can they be merged?


| username: Billmay表妹 | Original post link

Should we raise this at the moderator meeting?

| username: jansu-dev | Original post link

  1. The region-properties command uses RocksDB's GetPropertiesOfTablesInRange method to dump the table properties for a given range.

  2. When it is invoked through tikv-ctl, the key range of the region is passed in as that range, and the corresponding RocksDB CF information is output.

  3. What causes the generation of many small files?
    It is probably related to the rules of RocksDB itself.

  4. How to merge small files?
    I think compaction should reduce them to some extent.

However, points 3 and 4 do not really answer your question. You still need to consult with a knowledgeable RocksDB expert in the moderator exchange meeting.

BTW: Newer versions also display the CF name.

| username: Aunt-Shirly | Original post link

Hello, here is one scenario in which this can occur:

  1. There is a very large table, let’s assume it spans 10 regions.
  2. A large amount of data is gradually deleted with DELETE statements.
  3. PD notices that region 1 and region 2 are very small and initiates a merge of region 1 and region 2.
  4. During the merge process, the replicas of region 1 and region 2 need to be migrated together. During the migration, small SST files are generated because SST files are directly written to the target node.
  5. The merge of region 1 and region 2 succeeds. At this point, small SST files appear in the mvcc properties of the merged region.
  6. Similarly, if region 3 is found to be very small, the merge process can start again, repeating steps 3-5.
  7. Finally, these 10 regions are merged into one region. When looking at the mvcc properties of the final region, you can see many of these small SST files.

If this is what happened, you can manually compact these small SST files by specifying the key range. Relevant documentation: TiKV Control User Guide | PingCAP Docs
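
As a rough illustration of a range-limited manual compaction, the sketch below only assembles a tikv-ctl compact command line and prints it; it does not run anything. It assumes the `--host`, `-d`, `-c`, `--from` and `--to` options described in the TiKV Control documentation, so please check them against the guide for your version. The helper name `build_compact_command`, the address, and the key placeholders are hypothetical; the real keys must be the region's start and end keys in the escaped raw key format.

```python
import shlex


def build_compact_command(host, cf="write", start_key=None, end_key=None):
    """Assemble (but do not run) a tikv-ctl compact command as an argv list.

    host      -- address of the TiKV instance, e.g. "127.0.0.1:20160" (assumed)
    cf        -- column family to compact: "default", "write" or "lock"
    start_key -- range start in escaped raw key format (optional)
    end_key   -- range end in escaped raw key format (optional)
    """
    # "-d kv" selects the kv RocksDB instance rather than the raft one.
    argv = ["tikv-ctl", "--host", host, "compact", "-d", "kv", "-c", cf]
    if start_key is not None:
        argv += ["--from", start_key]
    if end_key is not None:
        argv += ["--to", end_key]
    return argv


if __name__ == "__main__":
    # Placeholders only: substitute the actual start and end keys of the region.
    cmd = build_compact_command(
        "127.0.0.1:20160",
        cf="write",
        start_key="<escaped-start-key>",
        end_key="<escaped-end-key>",
    )
    # Print a shell-quoted command so it can be reviewed before being run by hand.
    print(shlex.join(cmd))
```

Since the data and the MVCC records live in different column families, the same range would probably need to be compacted once per CF of interest, and region-properties can be run again afterwards to see whether the file count dropped.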

Additionally, it is necessary to confirm from the monitoring and PD logs whether this region was generated from continuous merges.

| username: h5n1 | Original post link

In this situation, should there be a mechanism to delete the original SST after merging?

| username: Aunt-Shirly | Original post link

Yes, after the compaction is completed, the old SST files will be deleted.
