Is the query efficiency of TiDB partitioned tables related to the number of partitions?

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: TiDB分区表查询效率和分区数多少有关吗?

| username: jingyesi3401

[TiDB Usage Environment] Production Environment, Testing

[TiDB Version] v6.1.5, v6.5.1

[Encountered Problem: Problem Phenomenon and Impact] Currently, we are creating partition tables in the testing environment, planning to create a new partition table every month, such as: p202212, p202212…, but does the query efficiency of TiDB partition tables relate to the number of partitions? How many partitions are appropriate to create at once? Previously, in MySQL databases, we created partitions for 30 years at once, but the execution plan in TiDB is too long and inconvenient to view. We are considering whether to create partitions for 5 years or 10 years.
Note: Partition table partition information is as follows:
PARTITION p202212 VALUES LESS THAN (TO_DAYS(‘2023-01-01’)),

PARTITION p202712 VALUES LESS THAN (TO_DAYS(‘2028-01-01’))

| username: Kongdom | Original post link

I think if the query condition includes a partition condition, the query efficiency should not be related to the total number of partitions. You can try it out to see.

| username: Defined2014 | Original post link

In the case of dynamic pruning, the execution plan will not be too long.

If there are query conditions, the unused partition tables will be pruned.

| username: zhanggame1 | Original post link

What does it have to do with the total number of partitions? It has nothing to do with it. I think the query performance of TiDB is not even significantly related to whether it is partitioned or not.

| username: TiDB_C罗 | Original post link

I think using regions for storage in TiDB is equivalent to partitioning. Using partitions in TiDB, apart from making data deletion easier (for business needs with TTL), doesn’t have many other advantages!

| username: Jellybean | Original post link

If each of your query conditions includes a partition column filter and you enable the partition dynamic pruning feature, it can automatically filter out partitions that do not need to be accessed, scanning only the partitions that meet the conditions, resulting in higher efficiency.

However, if you need to scan the entire table or many partitions, the efficiency will be relatively low due to the lack of a global index across partitions. In this case, having too many partitions may not be beneficial.

| username: xfworld | Original post link

There is a conflict in the documentation here, but I understand that partitioned tables have a maximum limit, so it is still something to be aware of.

Additionally, the support for secondary indexes in partitioned tables has some issues. You need to carefully consider the scenarios when using them, otherwise, you might run into problems…

| username: zhanggame1 | Original post link

The document above is not the latest version, it feels very outdated.

| username: xfworld | Original post link

Well, 1024 is from the official blog, and 8192 is the latest.

| username: zhanggame1 | Original post link

If you select version 4.0 in the official documentation, you can see 1024.

| username: dockerfile | Original post link

I also have this question, which brings up an issue: what is the difference in the distribution and composition of regions in TiDB with and without partitions?

| username: zhanggame1 | Original post link

A partitioned table can be seen as multiple tables, each partition has a tableID, and a region is created simultaneously when the partition is created.

| username: redgame | Original post link

Look at the data volume and test the speed.

| username: tidb菜鸟一只 | Original post link

  1. A partitioned table in TiDB is equivalent to multiple physical tables combined into a single logical table. However, you can specify which physical table’s data to query through the partition field. If the partition field is not specified in the SQL query, it will query all physical tables. Compared to a regular table, this adds a merge operation, which slightly impacts performance.
  2. You can directly create partitions for 30 years at once, and empty partitions do not affect efficiency.
  3. If the execution plan is too long, you can enable set tidb_partition_prune_mode = 'dynamic';. If the query statement specifies the partition field, it will only return the execution plan corresponding to the specified partition field.
| username: jingyesi3401 | Original post link

Okay, thank you. We are currently preparing to create 10 years of partitions, which means 120 partition tables.

| username: jingyesi3401 | Original post link

Initially tested in the testing environment, later preparing for production testing.

| username: jingyesi3401 | Original post link

Okay, thank you for the guidance.

| username: Sean007 | Original post link

TiDB’s Region is automatically sharded, so as long as you follow TiDB’s best practices (such as using AUTO_RANDOM or non-clustered primary keys), you don’t need to pay much attention to it. Partitioned tables are equivalent to manual sharding, allowing users to choose an appropriate sharding strategy based on their business needs. Combining both can better distribute data, avoid hotspots, and improve performance.

| username: zhanggame1 | Original post link

I tested that the write performance of non-clustered primary keys is much worse than that of clustered primary keys. Generally speaking, there is no need to use them.

| username: Sean007 | Original post link

Yes, when the business does not require ordered primary keys, it is more recommended to use AUTO_RANDOM.