Does TiDB currently support table-level compression?

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: TiDB 现在表级别压缩是否支持?

| username: 麻烦是朋友

[TiDB Usage Environment] Production Environment / Testing / PoC
[TiDB Version]
I would like to ask whether TiDB supports table-level compression. There is a COMPRESSION option in the SQL syntax.


Found on github.com:

> Hi @shenlijungg,
> Compression is not supported now. If the create table statement contains compression options, TiDB will ignore them.

Does tidb support column compression or table compression? · Issue #8847 · pingcap/tidb · GitHub

May I ask in which version the compression option will be supported?

| username: Kongdom | Original post link

Currently, ignoring these options appears to be the expected behavior.

TiDB supports the following table_option. TiDB will parse and ignore other table_option parameters, such as AVG_ROW_LENGTH, CHECKSUM, COMPRESSION, CONNECTION, DELAY_KEY_WRITE, ENGINE, KEY_BLOCK_SIZE, MAX_ROWS, MIN_ROWS, ROW_FORMAT, and STATS_PERSISTENT.
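This "parse and ignore" behavior is easy to confirm against a local TiDB instance. The sketch below is illustrative (the table name and option value are placeholders, not from the original thread):

```sql
-- TiDB parses the COMPRESSION table option but ignores it, so this
-- statement succeeds without enabling any table-level compression.
CREATE TABLE t_compress (
    id INT PRIMARY KEY,
    name VARCHAR(50)
) COMPRESSION = 'lz4';

-- Data is still compressed transparently at the TiKV/RocksDB layer,
-- not per table.
```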

| username: forever | Original post link

It should not be supported; the official documentation contains neither a description of compression syntax nor any examples.

| username: 小龙虾爱大龙虾 | Original post link

The underlying storage engine of TiDB is TiKV, a transactional storage engine with built-in compression.

| username: TIDB-Learner | Original post link

Sourced from the internet:
TiDB is a distributed relational database system that uses column storage to store data and provides data compression functionality. When using TiDB for data compression, you need to specify the COLUMN_FORMAT parameter as COMPRESSED when creating the table.
Here is an example code for using TiDB for data compression:

CREATE TABLE mytable (
id int,
name varchar(50)
) COLUMN_FORMAT=COMPRESSED;

In TiDB, data is compressed using the Snappy compression algorithm, which can effectively reduce storage space usage. Unlike MySQL, TiDB automatically decompresses data during queries and compresses the query results again before returning them to reduce data transmission overhead.
MySQL uses dictionary encoding algorithms for compression, while TiDB uses the Snappy algorithm. In terms of read and write performance, MySQL can improve performance through optimized parameter configuration and index design, whereas TiDB can improve performance by increasing the number of nodes. Developers can choose the appropriate database system and optimization methods based on specific needs.

| username: zhanggame1 | Original post link

All table data in TiDB is compressed, and the SST files you see are the compressed versions.

| username: dba远航 | Original post link

This syntax is not supported at the moment.

| username: Kongdom | Original post link

That's not the same thing; they are asking about the options in the CREATE TABLE statement. :yum:

| username: 春风十里 | Original post link

This syntax is invalid; I tested it and it immediately returned an error. I couldn't find any usage of COLUMN_FORMAT in the TiDB documentation, nor any COMPRESSED parameter.

I checked the official MySQL documentation, and in MySQL, COLUMN_FORMAT corresponds to the parameters FIXED, DYNAMIC, and DEFAULT. There is no COMPRESSED parameter.

In NDB Cluster, it is also possible to specify a data storage format for individual columns of NDB tables using COLUMN_FORMAT. Permissible column formats are FIXED, DYNAMIC, and DEFAULT. FIXED is used to specify fixed-width storage, DYNAMIC permits the column to be variable-width, and DEFAULT causes the column to use fixed-width or variable-width storage as determined by the column’s data type (possibly overridden by a ROW_FORMAT specifier).

This seems to be an AI-generated answer, but it is incorrect.

TiDB compression is implemented in TiKV's RocksDB and is enabled by default; you can configure the compression algorithm in TiKV.

Reference:
TiKV Memory Parameter Performance Tuning | PingCAP Documentation Center

[rocksdb.defaultcf]
# Data block size. RocksDB compresses data in blocks, and blocks are also the
# smallest unit cached in the block-cache (similar to the page concept in
# other databases).
block-size = "64KB"

# The compression method for each level of RocksDB data can be one of: no,
# snappy, zlib, bzip2, lz4, lz4hc, zstd. Note that Snappy compressed files
# must follow the official Snappy format; other non-official compression
# formats are not supported.
# "no:no:lz4:lz4:lz4:zstd:zstd" means no compression for level0 and level1,
# lz4 compression for level2 to level4, and zstd compression for level5 and
# level6.
compression-per-level = ["no", "no", "lz4", "lz4", "lz4", "zstd", "zstd"]

Here, "no" means no compression; lz4 balances compression speed and ratio; zlib has a high compression ratio and is storage-friendly, but compresses slowly and uses more CPU. Each machine should choose its compression methods based on its CPU and I/O resources.

For example, with "no:no:lz4:lz4:lz4:zstd:zstd", if heavy data writes (such as a data import) put high pressure on I/O (iostat shows %util continuously at 100%, or top shows high iowait) while CPU resources are still sufficient, you can consider enabling compression for level0 and level1 to trade CPU resources for I/O resources. Conversely, if I/O pressure is low during heavy writes but CPU resources are exhausted, and top -H shows many bg-prefixed threads (RocksDB compaction threads) running, you can trade I/O resources for CPU resources by changing the compression method to "no:no:no:lz4:lz4:zstd:zstd". The goal is to make the most of the available system resources so that TiKV performs as well as possible under the current resource conditions.
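For a TiUP-managed cluster, a setting like this is applied through the cluster configuration rather than by editing files on the TiKV nodes directly. A rough sketch ("mycluster" is a placeholder cluster name, not from the thread); this requires a live cluster, so it is not runnable standalone:

```shell
# Open the cluster topology/config for editing, then under the tikv
# server_configs section set:
#   rocksdb.defaultcf.compression-per-level:
#     ["no", "no", "lz4", "lz4", "lz4", "zstd", "zstd"]
tiup cluster edit-config mycluster

# Roll the new configuration out to the TiKV nodes only.
tiup cluster reload mycluster -R tikv
```

Note that changing compression-per-level only affects data written or compacted after the change; existing SST files keep their original compression until they are rewritten by compaction.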

| username: system | Original post link

This topic was automatically closed 60 days after the last reply. New replies are no longer allowed.