Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.
Original topic: Is there no way for TiDB to accurately determine the size of data before compression? (tidb 是不是没有办法比较准确的求出数据压缩前的数据大小)
Dear experts, may I ask: is it true that TiDB cannot accurately report the size of data before compression? The value obtained from information_schema.tables should be the size before compression, but it is only an estimate. As with MySQL, there seems to be no way to get an exact data size.
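For reference, here is a minimal Python sketch of the kind of lookup being discussed. It assumes the standard data_length and index_length columns of information_schema.tables and a MySQL-compatible driver that uses %s placeholders; the names SIZE_QUERY and total_estimated_bytes are hypothetical, and the result is still only the statistics-based estimate mentioned above.

```python
# Query to run through any MySQL-compatible driver (assumption: %s placeholders).
SIZE_QUERY = """
SELECT table_schema, table_name, data_length, index_length
FROM information_schema.tables
WHERE table_schema = %s
"""


def total_estimated_bytes(rows):
    """Sum data_length + index_length over the fetched rows.

    rows: iterable of (table_schema, table_name, data_length, index_length).
    NULL lengths (None in Python) are counted as 0.
    """
    return sum((data or 0) + (index or 0) for _, _, data, index in rows)
```

With a driver cursor this would be used roughly as cursor.execute(SIZE_QUERY, ("mydb",)) followed by total_estimated_bytes(cursor.fetchall()), where "mydb" is a placeholder schema name.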
You can assume a default compression ratio of about 3.5 and back-calculate from the on-disk size, but the result is only an approximation.
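The back-calculation can be sketched in a few lines of Python. This is only an illustration: the function name and parameters are hypothetical, the 3.5 ratio is the rough default suggested in this thread rather than a measured value, and the replicas parameter is an assumption for cases where the on-disk figure covers all copies of the data.

```python
def estimate_logical_size(on_disk_bytes, compression_ratio=3.5, replicas=1):
    """Estimate the pre-compression size of one replica.

    on_disk_bytes: compressed size on disk (all replicas included if replicas > 1).
    compression_ratio: assumed ratio of logical size to compressed size.
    replicas: number of copies included in on_disk_bytes.
    """
    if compression_ratio <= 0:
        raise ValueError("compression_ratio must be positive")
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    return on_disk_bytes * compression_ratio / replicas
```

The real ratio depends on the data and the compression settings, so the output should be treated as a ballpark figure.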
In the PD panel under Statistics - balance, you can see the compression ratio.
I don't think this is the compression ratio. My cluster shows 9 here, and a compression ratio of 9x seems impossible.
I have seen a 9x difference before. The same panel has two more charts, Store used and Store Region size. Try dividing one by the other and see what you get.
The Store Region size should be the sum of the estimated logical sizes of all Regions on one TiKV store, so it probably carries some error.
There is definitely an error. According to the monitoring, the size before compression is the Store Region size, and the size after compression is the Store used.
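Under that reading of the two charts, the ratio could be computed as in this small Python sketch. The function name is hypothetical, both inputs must be in the same unit, and since Store Region size is itself an estimate the result is approximate.

```python
def observed_compression_ratio(store_region_size, store_used):
    """Ratio of estimated logical size to on-disk size for one TiKV store.

    store_region_size: value read from the Store Region size chart.
    store_used: value read from the Store used chart (same unit).
    """
    if store_used <= 0:
        raise ValueError("store_used must be positive")
    return store_region_size / store_used
```

A result around 9 would match the figure reported earlier in the thread, but it reflects the estimation error in both charts as much as the actual compression.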
So, may I ask, if we query the size of a table or database through information_schema.tables, is this size for 1 TiKV replica or 3 TiKV replicas?
The result of this kind of query depends on how accurate the statistics are, and it appears to be for 1 replica.