Consultation on a question: TiDB's distributed database uses partitioned tables. How to use them to be faster than regular tables?

translator_bot · June 22, 2024, 9:50pm

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: 咨询一个问题 tidb的分布式数据库用分区表。怎么使用才会比普通表快？

| username: tidb狂热爱好者

【TiDB Usage Environment】Production Environment / Testing / PoC
【TiDB Version】5.4.3
【Reproduction Path】What operations were performed when the issue occurred
【Encountered Issue: Issue Phenomenon and Impact】
Can you provide specific examples of quick usage methods?
【Resource Configuration】
【Attachments: Screenshots / Logs / Monitoring】

translator_bot · June 22, 2024, 9:50pm

| username: dba-kit | Original post link

In MySQL, which is based on B+ trees, the primary function of partitioned tables is to reduce the size of a single tree, lower the total number of levels, improve query efficiency, or avoid MySQL’s single table size limit (which might be around 20TB, though I don’t remember the exact figure). However, in distributed databases like TiDB, which are based on LSM trees, data is typically scattered by dividing regions. For normal OLTP queries, adding an index theoretically would be faster and more convenient than converting to partitioned tables. The scenario I can think of might be for OLAP purposes. For example, if each processing involves all data for a month or a day, then directly reading the entire partition based on time might reduce the time for secondary table lookups compared to querying based on an index, potentially making it faster. (But I haven’t experimented with this; it’s just a guess.)

translator_bot · June 22, 2024, 9:50pm

| username: tidb狂热爱好者 | Original post link

Since TiDB itself is a distributed table, can I understand it as a single partition? Then what is the significance of partitioned tables within TiDB?

translator_bot · June 22, 2024, 9:50pm

| username: GreenGuan | Original post link

Partitioned tables in TiDB are similar to those in traditional relational databases, helping to avoid performance jitter caused by large transactions such as archiving.

translator_bot · June 22, 2024, 9:50pm

| username: dba-kit | Original post link

Additionally, it can be used in conjunction with placement rules for hot and cold data separation.

translator_bot · June 22, 2024, 9:50pm

| username: alfred | Original post link

Partition types in TiDB. The currently supported types include Range Partition, List Partition, List COLUMNS Partition, and Hash Partition. Range Partition, List Partition, and List COLUMNS Partition can be used to address performance issues caused by large-scale deletions in business scenarios, supporting quick partition deletion. Hash Partition can be used to distribute data in scenarios with high write volumes.

translator_bot · June 22, 2024, 9:50pm

| username: 张雨齐0720 | Original post link

All SQL operations that include partition keys generally perform better than those on regular tables.

translator_bot · June 22, 2024, 9:50pm

| username: 我是咖啡哥 | Original post link

I think the biggest advantage of partitioned tables is cleaning up historical data (time partition)

translator_bot · June 22, 2024, 9:50pm

| username: tidb菜鸟一只 | Original post link

But in fact, for OLAP scenarios, isn’t TiDB more recommended to use TiFlash directly? TiFlash is columnar storage, and partitions don’t apply to TiFlash, right?

translator_bot · June 22, 2024, 9:50pm

| username: h5n1 | Original post link

Partition tables facilitate historical data management.
Partition pruning is more efficient than full table scans.
Hash partitioning also inherently provides data distribution.

translator_bot · June 22, 2024, 9:50pm

| username: du拉松 | Original post link

I have conducted performance tests on partitioned and non-partitioned tables in version v6. In actual performance, partitioned tables are slower than non-partitioned tables. During execution, there are many more cop operations. The advantage of adding partitions is the ability to quickly delete related data. Since we have a lot of table data, deleting with the delete command is very painful and cannot reclaim the occupied space. Using partitions, we can directly drop the partition. I consulted relevant technical personnel, and there was no better way to handle it. I am using the range partitioning method.