Performance Issues of Scan Operations After Deleting a Large Number of Keys in TiKV

translator_bot · June 23, 2024, 11:12am

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: TiKV 在删除大量 key 后做 scan 操作性能的问题

| username: Wine93

Hi, everyone~ If we delete a large number of keys under the same prefix in TiKV and then run a prefix scan, will performance drop significantly?
My understanding is that TiKV’s storage layer uses RocksDB, and a RocksDB delete only writes a Delete record (a tombstone) rather than physically removing the key. So even if every key under a prefix has been deleted, a scan on that prefix still has to iterate over all of the deleted keys. For more details, see RocksDB issue #5265.
If TiKV does not have this issue, what optimizations have been made? I am not very familiar with TiKV’s implementation, so any insights or suggestions would be appreciated. Thank you very much~
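
To make the concern concrete, here is a minimal Python sketch of a sorted store in which a delete only writes a marker. It is a toy model, not RocksDB or TiKV code: the `ToyLSM` class, its methods, and the key layout are all hypothetical names chosen for illustration, and real RocksDB spreads entries across memtables and SST files.

```python
import bisect


class ToyLSM:
    """Toy sorted key-value store where delete() writes a tombstone (assumption: single sorted run)."""

    TOMBSTONE = object()  # stand-in for a RocksDB Delete record

    def __init__(self):
        self.keys = []    # sorted list of every key ever written
        self.values = {}  # key -> value, or TOMBSTONE if deleted

    def put(self, key, value):
        if key not in self.values:
            bisect.insort(self.keys, key)
        self.values[key] = value

    def delete(self, key):
        # The key is not removed; a marker is written in its place.
        self.put(key, self.TOMBSTONE)

    def prefix_scan(self, prefix):
        """Return (live key-value pairs, number of entries the scan had to visit)."""
        start = bisect.bisect_left(self.keys, prefix)
        live, visited = [], 0
        for key in self.keys[start:]:
            if not key.startswith(prefix):
                break
            visited += 1  # tombstones are visited too
            if self.values[key] is not self.TOMBSTONE:
                live.append((key, self.values[key]))
        return live, visited


store = ToyLSM()
for i in range(1000):
    store.put(f"a/{i:04d}", i)
store.put("b/0000", "x")
for i in range(1000):
    store.delete(f"a/{i:04d}")

live, visited = store.prefix_scan("a/")
print(live, visited)  # [] 1000
```

The scan returns nothing, yet it still steps over all 1000 deleted entries, which is the cost the question is asking about.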

| username: xfworld

There are optimizations, but they are limited. Refer to the case study below; it should answer most of your questions.

One thing to note, however, is that raw TiKV and TiDB still differ significantly in how they handle data.

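TiDB Q&A Community – 6 Jun 22

Slow queries after executing delete

🪐 TiDB Technical Questions TiFlash

I’d like to understand how deleted data is synchronized to TiFlash. TiFlash receives data from TiKV through the Raft Learner protocol. Internally it is divided into a Stable Layer and a Delta Layer; deleting data increases the amount of data in the Delta Layer, which in turn increases the total Stable Layer + Delta Layer data volume. Beyond that, use a controlled-variable approach: check whether the execution plan that goes through TiFlash, and its elapsed time, were the same before the data was deleted. The actual details still need to be confirmed step by step with the TiFlash dashboard to see exactly which step is slow. from →...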

| username: Wine93

Thank you very much for your answer~ I still have some questions. Specifically at the RocksDB layer, are there any optimizations besides manually triggering CompactRange? (I couldn’t find anything in the related documentation.)
In particular, deleted keys that are still in memtables and haven’t been flushed to SST files also increase scan time. Is anything done at the upper layer to optimize this?
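
As a rough illustration of why compaction helps, the Python sketch below models a sorted run of entries before and after a compaction that is allowed to drop tombstones. It is only a toy: `scan_prefix`, `compact`, and `TOMBSTONE` are hypothetical names, and it assumes the simplest case, in which the compaction reaches the bottommost level and no snapshot still needs the deleted versions. It says nothing about what TiKV itself does for tombstones that are still in memtables.

```python
TOMBSTONE = object()  # stand-in for a RocksDB delete marker


def scan_prefix(entries, prefix):
    """Scan a sorted list of (key, value); return (live pairs, entries visited in the prefix range)."""
    live, visited = [], 0
    for key, value in entries:
        if key < prefix:
            continue  # a real iterator would seek here instead of stepping
        if not key.startswith(prefix):
            break     # past the end of the prefix range
        visited += 1
        if value is not TOMBSTONE:
            live.append((key, value))
    return live, visited


def compact(entries):
    """Toy full compaction: tombstones are dropped (assumes bottommost level, no snapshots)."""
    return [(k, v) for k, v in entries if v is not TOMBSTONE]


# 500 deleted keys followed by one live key, all under the same prefix.
entries = [(f"a/{i:03d}", TOMBSTONE) for i in range(500)] + [("a/999", "live")]

before = scan_prefix(entries, "a/")
after = scan_prefix(compact(entries), "a/")
print(before[1], after[1])  # 501 1
```

Both scans return the same single live pair; only the number of entries visited changes, which is why the scan cost depends on whether the tombstones have been compacted away yet.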

| username: xfworld

As I recall, full table scans previously did not skip these deleted keys. The key-seek strategy was later adjusted, and I believe that change introduced an issue. It was eventually fixed in a later release, but I don’t remember which version.

I would still recommend following the data deletion methods described in the best practices to avoid pitfalls.

| username: tidb狂热爱好者

Set the GC time to 10 minutes.
