Note:
This topic has been translated from a Chinese forum by GPT and might contain errors. Original topic: 如何测试tidb的数据压缩比

How to test the data compression ratio of TiDB
Refer to the official documentation, or write the same data into MySQL and TiDB respectively, and check the disk usage.
In reality, because TiDB uses three replicas, although it can be compressed, the final physical size of the compressed three replicas is roughly 1:1 compared to the physical size of a single MySQL database. If the only goal is to save space, there is no need to migrate from MySQL to TiDB.
For the same database and table data, after migrating from MySQL to TiDB, even with 3 replicas, it only takes up about 70% of the original MySQL space, which still saves a lot of space.
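The two replies above give different totals (about 1:1 versus about 70% of the MySQL size), but both imply that a single replica is compressed well below the MySQL size. Here is a rough sketch of that arithmetic. The function name and the sizes are made up for illustration, and it assumes the replicas are equal in size and that the TiDB total contains nothing but those replicas.

```python
def per_replica_compression(mysql_bytes: float,
                            tidb_total_bytes: float,
                            replicas: int = 3) -> float:
    """Return how many times smaller one TiDB replica is than the MySQL data.

    Assumes all replicas are the same size (a simplification).
    """
    if mysql_bytes <= 0 or tidb_total_bytes <= 0 or replicas <= 0:
        raise ValueError("sizes and replica count must be positive")
    per_replica_bytes = tidb_total_bytes / replicas
    return mysql_bytes / per_replica_bytes


# Illustrative sizes only (arbitrary units, not measurements).
# Three replicas together about as large as MySQL (1:1):
print(round(per_replica_compression(100, 100), 2))  # 3.0
# Three replicas together about 70% of the MySQL size:
print(round(per_replica_compression(100, 70), 2))   # 4.29
```

So "1:1 overall" would mean each replica is roughly 3 times smaller than the MySQL data, and "70% overall" roughly 4.3 times smaller.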
A "compression ratio" is a comparison, so you first need to clarify what TiDB is being compared against: the raw data size on disk, or other databases?
If it is being compared to other databases, the best approach is to deploy the databases in the same environment (testing default parameters and optimized parameters separately), write tables of the same data scale into each, and compare the disk usage.
When importing data, check the monitoring in Grafana under tikv-details → rocksdb-kv → compression ratio.
Since 90% of the data in the LSM tree is in the final layer, level-6, just check if the compression ratio of level-6 meets expectations.
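To see why the last level dominates, here is a small sketch that computes an overall ratio from per-level numbers. The level sizes and ratios below are hypothetical, not read from any cluster; in practice you would take them from the Grafana panel mentioned above.

```python
def overall_ratio(levels):
    """Overall compression ratio from (compressed_bytes, ratio) pairs per level.

    Uncompressed bytes for a level are estimated as compressed_bytes * ratio.
    """
    compressed = sum(size for size, _ in levels)
    if compressed <= 0:
        raise ValueError("total compressed size must be positive")
    uncompressed = sum(size * ratio for size, ratio in levels)
    return uncompressed / compressed


# Hypothetical example: 10% of on-disk bytes sit in the upper levels
# with ratio 1.0, and 90% sit in level-6 with ratio 4.0.
levels = [(10, 1.0), (90, 4.0)]
print(overall_ratio(levels))  # 3.7
```

With these made-up numbers the overall ratio (3.7) is close to the level-6 ratio (4.0), which is why checking level-6 alone is usually a good enough approximation.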
This is related to specific data, so it’s impossible to determine a single value. I’ve seen compression ratios over 20, and some around 1.x. Grafana has a chart that shows the overall KV compression ratio: PD > Statistics Balance > Size Amplification.
The data compression ratio should be compared with other databases, right?
Special note: The image is sourced from the internet, and the authenticity and accuracy of the data content cannot be guaranteed. Please consider it as informational only.
Single instance with 3 replicas, what about the KV nodes? Should the compression ratio still take into account the amplification effect of KV?
It’s best to actually check the physical disk usage; the data retrieved from the views is not accurate.
Comparing space usage, it should be viewed from the overall cluster perspective, considering the total disk space occupied by the three data replicas.
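If you want to measure physical usage yourself, something like the following sketch adds up the on-disk size of a data directory, similar in spirit to `du`. The function names are made up for illustration. It assumes a POSIX system where `st_blocks` is reported in 512-byte units, and it counts hard-linked files once per link.

```python
import os


def disk_usage_bytes(path: str, apparent: bool = False) -> int:
    """Sum the size of all regular files under path.

    apparent=False uses allocated blocks (closer to real disk usage);
    apparent=True uses the logical file length instead.
    """
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            try:
                st = os.lstat(full)
            except OSError:
                # The file may disappear while walking a live data directory.
                continue
            blocks = getattr(st, "st_blocks", None)
            if apparent or blocks is None:
                total += st.st_size
            else:
                total += blocks * 512
    return total


def cluster_usage_bytes(per_node_bytes) -> int:
    """Total usage across all nodes, i.e. all replicas together."""
    return sum(per_node_bytes)
```

You would run `disk_usage_bytes` against the data directory on each TiKV node (the path depends on your deployment), then pass the per-node results to `cluster_usage_bytes` and compare that total with the MySQL data directory measured the same way.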
This is really nice, you can also see the overall compression rate of TiFlash.