Performance Issues of Scan Operations After Deleting a Large Number of Keys in TiKV

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: TiKV 在删除大量 key 后做 scan 操作性能的问题

| username: Wine93

Hi, everyone~ If we delete a large number of keys under the same prefix in TiKV and then run a prefix scan, will performance drop significantly?
My understanding is that TiKV’s storage layer uses RocksDB, and a RocksDB delete only writes a Delete record (a tombstone) rather than physically removing the key. So even if every key under a prefix has been deleted, a scan on that prefix still has to step over all of the deleted keys. For more details, see RocksDB issue #5265.
If TiKV does not have this problem, what optimizations does it apply? I am not very familiar with TiKV’s implementation, so any insights or suggestions would be appreciated. Thank you very much~
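
To make the concern concrete, here is a minimal sketch of the behavior described above. It is a toy model, not RocksDB or TiKV code: `ToyLSM`, `TOMBSTONE`, and `prefix_scan` are hypothetical names, and a single sorted key list stands in for the real memtable/SST structure. It only shows that a delete is recorded as a marker and that a prefix scan still has to visit every marked entry.

```python
import bisect

TOMBSTONE = object()  # stands in for a RocksDB Delete record (assumption of this sketch)


class ToyLSM:
    """Hypothetical single-level model: one sorted key list plus a value map."""

    def __init__(self):
        self._keys = []    # sorted keys, including deleted ones
        self._values = {}  # key -> value or TOMBSTONE

    def put(self, key, value):
        if key not in self._values:
            bisect.insort(self._keys, key)
        self._values[key] = value

    def delete(self, key):
        # A delete is just another write: the entry stays in place as a tombstone.
        self.put(key, TOMBSTONE)

    def prefix_scan(self, prefix):
        """Return (live_pairs, entries_visited) for keys starting with prefix."""
        live, visited = [], 0
        start = bisect.bisect_left(self._keys, prefix)
        for key in self._keys[start:]:
            if not key.startswith(prefix):
                break
            visited += 1  # tombstones are visited too, then skipped
            value = self._values[key]
            if value is not TOMBSTONE:
                live.append((key, value))
        return live, visited


if __name__ == "__main__":
    store = ToyLSM()
    for i in range(1000):
        store.put(f"user/{i:04d}", i)
    for i in range(1000):
        store.delete(f"user/{i:04d}")
    live, visited = store.prefix_scan("user/")
    print(len(live), visited)  # prints "0 1000"
```

The scan returns nothing, yet it still walks all 1000 entries, which is the cost being asked about.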

| username: xfworld | Original post link

There are optimizations, but they are limited. Refer to the case study; it should answer most of your questions.

One thing to note, however: raw TiKV and TiDB still differ significantly in how their architectures handle data.

| username: Wine93 | Original post link

Thank you very much for your answer~ I still have some questions. At the RocksDB layer specifically, are there any optimizations besides manually triggering CompactRange? (I couldn’t find anything in the related documentation.)
In particular, deleted keys that are still in memtables and have not yet been flushed to SST files also lengthen the scan. Is anything done at the upper layer to address this?
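
As a rough illustration of the two cases in this question, the sketch below models a memtable sitting on top of one flushed run. Everything here is hypothetical (`TwoTierStore`, `flush`, `compact_range`, `scan`) and heavily simplified: a real flush writes a new SST file instead of merging into an existing one, and real compaction can only drop a tombstone when no older version of the key and no snapshot still needs it. The point is only that flushing keeps tombstones, while compacting the range removes them.

```python
TOMBSTONE = object()  # stands in for a Delete record (assumption of this sketch)


class TwoTierStore:
    """Hypothetical model: a mutable memtable over one flushed sorted run."""

    def __init__(self):
        self.memtable = {}  # newest writes, including tombstones
        self.sst = {}       # flushed data, including tombstones

    def put(self, key, value):
        self.memtable[key] = value

    def delete(self, key):
        self.memtable[key] = TOMBSTONE

    def flush(self):
        # Tombstones are persisted as well; they must keep hiding older data.
        self.sst.update(self.memtable)
        self.memtable = {}

    def compact_range(self):
        # Nothing older lies underneath in this model, so tombstones can go.
        self.sst = {k: v for k, v in self.sst.items() if v is not TOMBSTONE}

    def scan(self, prefix):
        """Return (live_keys, entries_visited) across both tiers."""
        mem = {k: v for k, v in self.memtable.items() if k.startswith(prefix)}
        sst = {k: v for k, v in self.sst.items() if k.startswith(prefix)}
        visited = len(mem) + len(sst)  # a merging iterator touches both tiers
        merged = {**sst, **mem}        # memtable entries shadow flushed ones
        live = sorted(k for k, v in merged.items() if v is not TOMBSTONE)
        return live, visited


if __name__ == "__main__":
    store = TwoTierStore()
    for i in range(100):
        store.put(f"p/{i:03d}", i)
    store.flush()
    for i in range(100):
        store.delete(f"p/{i:03d}")
    print(store.scan("p/"))  # tombstones still in the memtable
    store.flush()
    print(store.scan("p/"))  # tombstones flushed, still visited
    store.compact_range()
    print(store.scan("p/"))  # tombstones dropped
```

In this model the number of visited entries only falls to zero after `compact_range`, which is why the question about memtable-resident tombstones matters: neither a flush nor the passage of time removes them.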

| username: xfworld | Original post link

As I recall, a full table scan previously did not skip these deleted keys. The seek-key strategy was later adjusted, and I think that change had an issue of its own. A fix was eventually released, but I forget which version it landed in. :joy:

I would still recommend following the methods described in the best practices for deleting data, to avoid pitfalls.

| username: tidb狂热爱好者 | Original post link

Set the GC time to 10 minutes.
