How to Test TiDB's Data Compression Ratio

This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: 如何测试tidb的数据压缩比

| username: swino

How to test the data compression ratio of TiDB

| username: Fly-bird | Original post link

Refer to the official documentation, or write the same data into MySQL and TiDB respectively, and check the disk usage.

| username: tidb菜鸟一只 | Original post link

In reality, because TiDB uses three replicas, although it can be compressed, the final physical size of the compressed three replicas is roughly 1:1 compared to the physical size of a single MySQL database. If the only goal is to save space, there is no need to migrate from MySQL to TiDB.

| username: heiwandou | Original post link

Load a 300GB database and see how much TiKV occupies.

| username: Jellybean | Original post link

For the same database and table data, after migrating from MySQL to TiDB, even with 3 replicas, it only takes up about 70% of the original MySQL space, which still saves a lot of space.

| username: Jellybean | Original post link

"The ‘compression ratio’ mentions a comparison, so you need to clarify what TiDB is being compared to—is it being compared to disk or to other databases?

If it is being compared to other databases, the best approach is to deploy the databases to be compared in the same environment (distinguishing between default parameters and optimized parameters), write tables of the same data scale into each, and compare the disk usage."

| username: 有猫万事足 | Original post link

When importing data, check the monitoring in Grafana under tikv-details → rocksdb-kv → compression ratio.

Since 90% of the data in the LSM tree is in the final layer, level-6, just check if the compression ratio of level-6 meets expectations.

| username: Soysauce520 | Original post link

This is related to specific data, so it’s impossible to determine a single value. I’ve seen compression ratios over 20, and some around 1.x. Grafana has a chart that shows the overall KV compression ratio: PD > Statistics Balance > Size Amplification.

| username: TiDBer_小阿飞 | Original post link

The data compression ratio should be compared with other databases, right?

| username: Jellybean | Original post link

This is an image I saw in the community before, and I am posting it here for reference only.

Special note: The image is sourced from the internet, and the authenticity and accuracy of the data content cannot be guaranteed. Please consider it as informational only.

| username: TiDBer_小阿飞 | Original post link

Single instance with 3 replicas, what about the KV nodes? Should the compression ratio still take into account the amplification effect of KV?

| username: zhanggame1 | Original post link

It’s best to actually check the physical disk usage; the data retrieved from the views is not accurate.

| username: Jellybean | Original post link

Comparing space usage, it should be viewed from the overall cluster perspective, considering the total disk space occupied by the three data replicas.

| username: 有猫万事足 | Original post link

This is really nice, you can also see the overall compression rate of TiFlash. :+1:

| username: TiDBer_小阿飞 | Original post link

Yes, all replicas should be considered comprehensively.

| username: system | Original post link

This topic was automatically closed 60 days after the last reply. New replies are no longer allowed.