Issues with Primary Keys in Partitioned Tables

translator_bot · June 21, 2024, 8:49am

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: 关于分区表的主键问题

| username: zhanggame1

[TiDB Usage Environment] Production Environment / Testing / Poc
[TiDB Version]
[Reproduction Path] What operations were performed to encounter the issue
[Encountered Issue: Issue Phenomenon and Impact]
Test Case: There are two partitioned tables, both using list partitioning with the partition key draw_id. The difference is in the primary key. The primary key for ticket2 is (draw_id, ticket_no), while for ticket1 it is (ticket_no, draw_id). The data volume is 27 million, and the data is exactly the same. The table creation statements are as follows:

The execution plans for the same SQL are different

The issue is:
Partitioned tables require the partition key to be part of the primary key, making the primary key a composite key. Does the partition key have to be the first column in the primary key index? In my test, if the partition key is the second column, the SQL performs a full table scan.

translator_bot · June 21, 2024, 8:49am

| username: zhanggame1 | Original post link

From the test, when partitioning by draw_id, as long as draw_id is not in the first position of the primary key, the execution plan will perform a full table scan.

translator_bot · June 21, 2024, 8:49am

| username: 小龙虾爱大龙虾 | Original post link

It’s the leftmost prefix principle of the index (Huh? What? You said the join key draw_id is the partition key of the list partition table, so it can’t use NL_JOIN and then partition pruning to do a full partition scan? Maybe it just hasn’t been implemented yet )

translator_bot · June 21, 2024, 8:49am

| username: zhanggame1 | Original post link

Theoretically, having the partition key in the where condition should directly query these partitions instead of performing a full table scan, so this is quite puzzling.

translator_bot · June 21, 2024, 8:49am

| username: 小龙虾爱大龙虾 | Original post link

You’re talking about the simple query on the second floor, right? Both of them went through partition pruning. You can see the partition keyword in the access object. Actually, TiDB doesn’t have an operator for full partition scan; a full partition scan is displayed as a full table scan.

translator_bot · June 21, 2024, 8:49am

| username: tidb菜鸟一只 | Original post link

Is there a difference in execution time between the two SQL queries?

translator_bot · June 21, 2024, 8:49am

| username: dba远航 | Original post link

Whether the primary key index should be the first or second depends on the filtering conditions used.

translator_bot · June 21, 2024, 8:49am

| username: forever | Original post link

The actrows of the two SQLs are the same. The upper SQL follows the leftmost index principle and uses index join, while the lower SQL does not have a suitable index and uses hash join. If you create an index for the table in the lower SQL, both will be the same.

translator_bot · June 21, 2024, 8:49am

| username: wangccsy | Original post link

The training video for TiDB explains the basics well. Not bad. I don’t use partitioned tables much either. But it’s covered in the video, though the video can’t consider all scenarios.

translator_bot · June 21, 2024, 8:49am

| username: zhanggame1 | Original post link

The query result speed of the second floor’s SQL query is the same. I think the estimated number of rows in the execution plan is incorrect.

translator_bot · June 21, 2024, 8:49am

| username: tidb菜鸟一只 | Original post link

It seems that there is an issue with the execution plan display, but the actual execution is fine.