What data volume requires the use of partitioned tables?

This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: 什么数据量需要用分区表

| username: TiDBer_MjVuP1ZB

In what situations should partitioned tables be used?

| username: forever | Original post link

The business frequently deletes data, and after partitioning, directly deletes partitions to unload historical data.
Data is distributed according to time and historical data needs to be retained.
To solve the table read hotspot issue, the table is split into smaller tables to address the read hotspot problem.

| username: 随缘天空 | Original post link

Use partitioned tables when dealing with large amounts of data and potential hotspot issues.

| username: residentevil | Original post link

If business data can be periodically saved according to a certain field, such as time, and the data retention policy is within one month, then you can create a partition for each day based on the date field. This is generally how it’s used, making subsequent maintenance easier (including creating new partitions, deleting partitions, etc.).

| username: zhanggame1 | Original post link

If there is no large-scale data deletion requirement, you don’t need it. If you use version 7.5.1 or later, it can be used passably. There are some issues with earlier versions.

| username: redgame | Original post link

Commonly used when resolving table read hotspots

| username: zhaokede | Original post link

In other databases, partition tables are generally used only for very large storage volumes.

| username: 小于同学 | Original post link

Query hotspot data.

| username: porpoiselxj | Original post link

TiDB, if there is no need to regularly clean up historical data, there is no need for partitioning.

| username: Kongdom | Original post link

:joy: The original purpose of choosing TiDB was to avoid using partitioned tables.

| username: zhanggame1 | Original post link

It can’t be solved, right? The partition hotspot in KV won’t change.

| username: DBAER | Original post link

Separation of hot and cold data

| username: 哈喽沃德 | Original post link

For businesses that regularly delete data, with tens of millions of partitions, there’s no need for partitioning if there’s no deletion business. It’s best to separate hot and cold data if possible.

| username: FutureDB | Original post link

For scenarios that require saving historical data and managing its lifecycle, partitioned tables are quite suitable. For example, transaction flow data and log data, which do not need to be stored permanently and require periodic cleanup of historical data, are suitable for using partitioned tables.

| username: kelvin | Original post link

Use partitioned tables when dealing with large amounts of data and potential hotspot issues.

| username: tidb菜鸟一只 | Original post link

In TiDB, the use of partition tables is not related to the amount of data. Partition tables are only recommended if you have a regular need to clean up data; otherwise, no matter how large the data volume is, it is not recommended to use partition tables.

| username: TiDBer_5cwU0ltE | Original post link

First, the data volume must be large, and secondly, there must be a clear partition key. It is best if queries are executed within the partition. It feels like partitioned tables, to some extent, can also be considered a form of hot and cold data separation.

| username: TiDBer_RjzUpGDL | Original post link

Hot data queries can include partition conditions.

| username: Kongdom | Original post link

The following is sourced from the internet:

  1. The table is so large that it cannot be entirely stored in memory, or only the last part of the table has hot data, while the rest is historical data.
  2. Partitioned table data is easier to maintain. For example, to delete a large amount of data in bulk, you can use the method of clearing an entire partition. You can also optimize, check, repair, etc., an individual partition.
  3. Partitioned table data can be distributed across different physical devices, thereby efficiently utilizing multiple hardware devices.
  4. Partitioned tables can be used to avoid certain specific bottlenecks, such as InnoDB’s single index mutex access.
  5. If needed, individual partitions can also be backed up and restored, which is highly effective in scenarios with very large datasets.
| username: TiDBer_rvITcue9 | Original post link

Large data volume and historical data retention