TiKV Garbage Collection

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: tikv gc

| username: Holland

【TiDB Usage Environment】Production, Testing, Research
【TiDB Version】
【Encountered Problem】
There are 200,000 inserts and 200,000 updates per day, the total data volume is 1 TB, and the query load is 2,000 QPS. If the TiKV GC retention is set to 24 hours versus 30 days, how much additional storage is needed in each case? Assume each write is 1 KB.


| username: xfworld

You can set up a test environment and measure it yourself. The physical storage size will definitely differ from the logical storage size.

| username: Holland

Could you give an approximate estimate?

| username: TiDBer_jYQINSnf

If GC does not run, every update is effectively stored as an insert: the old version is kept and a new one is written. The resulting data volume is hard to estimate because the underlying storage engine is RocksDB, which compresses the data. Moreover, changing the GC retention to 30 days is not recommended. If a key is updated once a day, it will accumulate 30 versions in 30 days, and every scan has to step over those old versions one KV at a time, which will be very slow.
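
For a rough upper bound on the numbers in the question, the reasoning above can be turned into simple arithmetic. This is only a sketch under stated assumptions: every update leaves exactly one old 1 KB version behind until GC removes it, inserts are not counted because they add to the base data regardless of the GC setting, and RocksDB compression, MVCC key overhead, and compaction space are ignored. The function name `extra_gc_bytes` and its `replicas` and `compression_ratio` parameters are hypothetical, introduced only for illustration.

```python
# Back-of-envelope estimate of the extra space held by old MVCC versions.
# Assumptions (not from the thread): one retained old version per update,
# no compression unless a ratio is supplied, no key or compaction overhead.

KB = 1024


def extra_gc_bytes(updates_per_day, bytes_per_write, retention_days,
                   replicas=1, compression_ratio=1.0):
    """Return the estimated bytes occupied by versions awaiting GC."""
    logical = updates_per_day * bytes_per_write * retention_days
    # Each replica stores its own copy; compression shrinks every copy.
    return logical * replicas / compression_ratio


if __name__ == "__main__":
    for days in (1, 30):
        one_copy = extra_gc_bytes(200_000, 1 * KB, days)
        three_copies = extra_gc_bytes(200_000, 1 * KB, days, replicas=3)
        print(f"retention {days:>2} day(s): "
              f"{one_copy / 1e9:.2f} GB per copy, "
              f"{three_copies / 1e9:.2f} GB with 3 replicas")
```

Under these assumptions the old versions take roughly 0.2 GB per copy with 24-hour retention and roughly 6.1 GB per copy with 30-day retention, both small next to 1 TB of base data. The real figure depends on compression and on how many distinct keys are updated, so measuring on a test cluster is still the reliable way to confirm it. The storage cost is therefore less of a concern than the scan slowdown caused by the accumulated versions.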