Note:
This topic has been translated from a Chinese forum by GPT and might contain errors. Original topic: 一张表两亿数据,用tidb,底层还要做分区么

For a table with 200 million rows in TiDB, do we still need to partition it at the underlying level?
The main purpose of partitioning is to make future data deletion easier: you can simply drop the partition. Removing large amounts of data with DELETE in TiDB is very troublesome.
It depends on the hardware environment. If the hardware is standard or over-provisioned, I don't think partitioning is needed.
If the data is date-related, for example growing by 20 million rows per month and needing to be cleaned up every year, you can use partitioning to make cleanup easier. Likewise, if each query only needs the latest month's data and the SQL filters on the partition key, partitioning also helps: the query will scan only the latest partition.
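A minimal sketch of the monthly layout described above, written as Python that only builds SQL strings. The table name `events`, the columns `id`, `created_at`, and `payload`, and all helper names are hypothetical. The generated statements use MySQL-compatible `PARTITION BY RANGE COLUMNS` and `ALTER TABLE ... DROP PARTITION` syntax, which should be checked against your TiDB version before use.

```python
from datetime import date


def next_month(d: date) -> date:
    """Return the first day of the month after d."""
    return date(d.year + 1, 1, 1) if d.month == 12 else date(d.year, d.month + 1, 1)


def monthly_partitions(start: date, months: int) -> list[str]:
    """Build one RANGE partition clause per month, starting at start's month."""
    clauses = []
    current = date(start.year, start.month, 1)
    for _ in range(months):
        upper = next_month(current)
        # Each partition holds rows with created_at < first day of the next month.
        clauses.append(
            f"PARTITION p{current:%Y%m} VALUES LESS THAN ('{upper.isoformat()}')"
        )
        current = upper
    return clauses


def create_table_sql(table: str, start: date, months: int) -> str:
    """Assemble a CREATE TABLE statement for a hypothetical date-partitioned table."""
    parts = ",\n  ".join(monthly_partitions(start, months))
    return (
        f"CREATE TABLE {table} (\n"
        "  id BIGINT NOT NULL,\n"
        "  created_at DATE NOT NULL,\n"
        "  payload VARCHAR(255),\n"
        # The partitioning column is included in the primary key.
        "  PRIMARY KEY (id, created_at)\n"
        ")\n"
        "PARTITION BY RANGE COLUMNS (created_at) (\n"
        f"  {parts}\n"
        ");"
    )


def drop_partition_sql(table: str, month: date) -> str:
    """Build the statement that removes one month of data by dropping its partition."""
    return f"ALTER TABLE {table} DROP PARTITION p{month:%Y%m};"
```

For example, `drop_partition_sql("events", date(2023, 11, 1))` yields `ALTER TABLE events DROP PARTITION p202311;`, which removes that month's rows without a row-by-row DELETE.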
Partitioning can improve both maintainability and efficiency. For example, dropping a partition directly removes the corresponding data. With range or list partitioning, the query optimizer generates an execution plan that scans only the relevant partitions, avoiding a costly full table scan for non-point queries. Furthermore, if you need to separate cold, warm, and hot data, or online and offline data, partitioned tables are still necessary.
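The pruning idea can be illustrated with a toy model in Python. This is not TiDB's optimizer, only a sketch of the principle: given range partitions defined by exclusive upper bounds, a date-range filter only needs the partitions whose ranges overlap it. The partition names and bounds below are hypothetical.

```python
from bisect import bisect_right
from datetime import date

# Hypothetical partitions: (name, exclusive upper bound), sorted by bound,
# mirroring VALUES LESS THAN in a RANGE-partitioned table.
PARTITIONS = [
    ("p202401", date(2024, 2, 1)),
    ("p202402", date(2024, 3, 1)),
    ("p202403", date(2024, 4, 1)),
]


def prune(lo: date, hi: date) -> list[str]:
    """Return partitions that can hold rows with lo <= created_at <= hi."""
    bounds = [upper for _, upper in PARTITIONS]
    # A value v lives in the first partition whose upper bound is > v.
    first = bisect_right(bounds, lo)
    last = bisect_right(bounds, hi)
    # Values at or beyond the last bound fit no partition, so clamp the end.
    last = min(last, len(PARTITIONS) - 1)
    return [name for name, _ in PARTITIONS[first:last + 1]]
```

A filter covering 5 to 20 March 2024 touches only `p202403` in this model, while a filter spanning mid-January to mid-February touches `p202401` and `p202402`. In a real cluster, the partitions actually accessed can be checked in the `EXPLAIN` output for the query.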
Well, for a workload with a table this large, our only option is to make it a wide table, since joining it with other tables is not suitable for OLTP. We are currently building wide tables by adding various redundant columns, but keeping the data consistent will be an issue.