Using TiDB Data Partitioning

translator_bot · June 23, 2024, 4:57am

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: tidb数据分区使用

| username: Jolyne

【TiDB Usage Environment】Production, Testing, Research
Production
【TiDB Version】
5.2
【Encountered Problem】
Does TiDB support automatic partition creation and deletion of historical partitions? For example, if partitions are created based on time, and there are 4 partitions with the last partition’s time being today, can it automatically create a partition for tomorrow (changing the partition mode to dynamic)? Is there a parameter setting to automatically create or delete partitions when creating a table?
For example, settings in other distributed databases:

(
“dynamic_partition.enable” = “true”,
“dynamic_partition.create_history_partition” = “true”, – Enable historical partitions
“dynamic_partition.history_partition_num” = “7”, – Number of historical partitions
“dynamic_partition.time_unit” = “DAY”,
“dynamic_partition.end” = “7”,
“dynamic_partition.prefix” = “p”,
“dynamic_partition.buckets” = “20”,
“replication_num” = “3”
);

【Reproduction Path】What operations were performed that led to the problem
【Problem Phenomenon and Impact】

【Attachments】

Please provide the version information of each component, such as cdc/tikv, which can be obtained by executing cdc version/tikv-server --version.

translator_bot · June 23, 2024, 4:57am

| username: h5n1 | Original post link

This feature is not supported yet, but it will be available in future versions.

github.com/pingcap/tidb

Support RANGE INTERVAL partitioning

opened 09:28AM - 23 Jun 22 UTC

closed 11:06AM - 09 Aug 22 UTC

mjonss

type/enhancement type/new-feature

## Enhancement RANGE table partitioning can be very verbose when having many pa…rtitions with the same range interval, so TiDB should extend RANGE with INTERVAL which would take this thousand row CREATE TABLE: ```sql CREATE TABLE t (id bigint unsigned, data varchar(255)) PARTITION BY RANGE (id) (PARTITION pNull VALUES LESS THAN (0), PARTITION p0 VALUES LESS THAN (1000000), PARTITION p1 VALUES LESS THAN (2000000), ... PARTITION p998 VALUES LESS THAN (999000000), PARTITION pMax VALUES LESS THAN (MAXVALUE)) ``` and allow it to be written as something like ```sql CREATE TABLE t (id bigint unsigned, data varchar(255)) PARTITION BY RANGE (id) INTERVAL (1000000) FIRST PARTITION LESS THAN (1000000) LAST PARTITION LESS THAN (999000000) NULL PARTITION MAXVALUE PARTITION ``` Also maintenance/ALTER commands needs to be extended, like ```sql -- No change to existing ADD/DROP PARTITION, but simplified creation of new partitions, extending the range partitioning scheme, -- this would add partitions for every interval from current LAST PARTITION expr to the new expr ALTER TABLE t LAST PARTITION LESS THAN (expr) -- And simplified DROP PARTITION, which will drop all partitions less than the matched FIRST partition (not touching the NULL partition if exists) ALTER TABLE t FIRST PARTITION LESS THAN (expr) -- REORGANIZE PARTITION (only syntax supported, not yet implemented in TiDB) -- For the FIRST PARTITION, reorganize all partitions before the new FIRST PARTITION expr into the new FIRST PARTITION ALTER TABLE MERGE FIRST PARTITION LESS THAN (expr) -- For the MAXVALUE PARTITION, create new partitions up to the new expr (which will become the new LAST PARTITION expr) and reorganize the data in the old MAXVALUE PARTITION into the new partitions, including a new MAXVALUE PARTITION ALTER TABLE SPLIT MAXVALUE PARTITION LESS THAN (expr) ``` This can be implemented without changing any stored meta data or impact on backwards compatibility, by translating the new syntax (together with existing table metadata for ALTER commands) into normal CREATE/ALTER TABLE commands. ### Suggested syntax #### CREATE TABLE ```sql CREATE TABLE ... PARTITION BY RANGE [COLUMNS] (partition_col [, partition_col...]) INTERVAL (<interval expr>) FIRST PARTITION LESS THAN (expr) LAST PARTITION LESS THAN (expr) [NULL PARTITION] [MAXVALUE PARTITION] ``` Where `interval expr` for temporal column types are defined [as for MySQL](https://dev.mysql.com/doc/refman/8.0/en/expressions.html#temporal-intervals) but without the INTERVAL token. Supported time units are YEAR, QUARTER, MONTH, WEEK, DAY, HOUR, MINUTE and SECOND. Examples: ``` expr unit 1 YEAR 1 QUARTER 1 MONTH 1 DAY 30 MINUTE ``` Full example for daily partitions for year 2021 and 2022: ``` CREATE TABLE t (id bigint, dt datetime, data varchar(255)) PARTITION BY RANGE COLUMNS (dt) INTERVAL (1 DAY) FIRST PARTITION LESS THAN ('2021-02-01') LAST PARTITION LESS THAN ('2023-01-01') ``` Since this is a very compact syntax, we will not support per partition options like Placement Rules (table level options will still be ok) #### Adding / splitting partitions In addition to ADD PARTITION, we also support adding more partitions by simply alter the LAST PARTITION expr: ``` ALTER TABLE <table_name> LAST PARTITION LESS THAN (expr) ``` Where `expr` is the same as for CREATE TABLE above. Since this is a very compact syntax, we will not support per partition options like Placement Rules (table level options will still be ok) And if there is a MAXVALUE partition, we need to split it (new command instead of REORGANIZE to avoid needing the generated partitioning name and make automation easier): ``` ALTER TABLE <table_name> SPLIT MAXVALUE PARTITION LESS THAN (expr) ``` Since REORGANIZE PARTITION is not yet supported, this new SPLIT command will only be parsed, but needs REORGANIZE PARTITION to be implemented) #### Drop partition / merge partitions `ALTER TABLE <table_name> DROP PARTITION <list of partitions>` will still be extended to drop all partitions before a specified partition ```sql ALTER TABLE table_name FIRST PARTITION LESS THAN (expr) ``` Which will remove all partitions before the matching partition (except for a possible NULL partition) Merge partitions would two or more partitions into a new FIRST PARTITION without deleting any data: ```sql ALTER TABLE table_name MERGE FIRST PARTITION LESS THAN (expr) ``` Only to be implemented in parser until REORGANIZE PARTITION is implemented, same as SPLIT PARTITION ### Limitations - Only for integer columns/partition expressions and date/datetime column types in RANGE COLUMNS. - Automatically add or drop partitions is not a part of this enhancement (internally it may mix DML and DDL). - Since we will not change any metadata, for the new ALTER commands, we need to rediscover the interval, null partition, maxvalue partition, first partition and last partition, including verifying that all partitions would be the same as if they were generated with a CREATE TABLE statement with those interval arguments. If such validation fail, we will not execute the ALTER command and return an error. Partition names and partition level options are not taken into consideration when comparing with the discovered/generated table and the actual table. - Only changes to CREATE TABLE and ALTER TABLE syntax - No change in internal metadata, DDL functionality, just accepting syntactic sugar which will generate the same operations as its more verbose syntax. - No impact on SHOW CREATE TABLE (will still show full partition definition). [Simplification of SHOW CREATE TABLE is a separate issue](https://github.com/pingcap/tidb/issues/36137). - No impact on other tools like TiCDC etc. i.e. the new INTERVAL syntax will be converted to existing DDL jobs, including an updated statement string replacing the syntactic sugar part with the generated equivalent of the compatible syntax. - No support for per partition options like Placement Rules (still OK to have on Table level).

translator_bot · June 23, 2024, 4:57am

| username: xfworld | Original post link

Dynamic partitioning is not supported, but you can set partitions according to your own needs by referring to the official documentation:

Subsequently, you can define new partitions within this range

translator_bot · June 23, 2024, 4:57am

| username: Jolyne | Original post link

Okay, thank you. I’ve looked into this. Since we have a task that requires partitioning by time, it would be quite troublesome if manual addition cannot be automatically extended.

translator_bot · June 23, 2024, 4:57am

| username: alfred | Original post link

“Range partitioning, List partitioning, and List COLUMNS partitioning can be used to address performance issues caused by large-scale deletions in business scenarios, supporting quick partition deletions. Hash partitioning can be used to distribute data in scenarios with heavy write operations.”
Can distributed tables in TiDB improve query performance?

translator_bot · June 23, 2024, 4:57am

| username: system | Original post link

This topic was automatically closed 60 days after the last reply. New replies are no longer allowed.