Clustered Index Not Effective on Partitioned Table

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: 聚簇索引在分区表上未生效

| username: EricSong

【TiDB Usage Environment】Production
【TiDB Version】v6.5.0
【Reproduction Path】
Test using the following two tables

CREATE TABLE `test_nonclustered` (
  `day` date NOT NULL,
  `partner_org_id` varchar(60) NOT NULL,
  `partner_name` text DEFAULT NULL,
  `customer_org_id` varchar(60) NOT NULL,
  `license_id` varchar(255) NOT NULL,
  PRIMARY KEY (`day`,`partner_org_id`,`customer_org_id`,`license_id`) /*T![clustered_index] NONCLUSTERED */
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin
PARTITION BY RANGE (WEEKDAY(`day`))
(PARTITION `p0` VALUES LESS THAN (1),
 PARTITION `p1` VALUES LESS THAN (2),
 PARTITION `p2` VALUES LESS THAN (3),
 PARTITION `p3` VALUES LESS THAN (4),
 PARTITION `p4` VALUES LESS THAN (5),
 PARTITION `p5` VALUES LESS THAN (6),
 PARTITION `p6` VALUES LESS THAN (7));

CREATE TABLE `test_clustered` (
  `day` date NOT NULL,
  `partner_org_id` varchar(60) NOT NULL,
  `partner_name` text DEFAULT NULL,
  `customer_org_id` varchar(60) NOT NULL,
  `license_id` varchar(255) NOT NULL,
  PRIMARY KEY (`day`,`partner_org_id`,`customer_org_id`,`license_id`) /*T![clustered_index] CLUSTERED */
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin
PARTITION BY RANGE (WEEKDAY(`day`))
(PARTITION `p0` VALUES LESS THAN (1),
 PARTITION `p1` VALUES LESS THAN (2),
 PARTITION `p2` VALUES LESS THAN (3),
 PARTITION `p3` VALUES LESS THAN (4),
 PARTITION `p4` VALUES LESS THAN (5),
 PARTITION `p5` VALUES LESS THAN (6),
 PARTITION `p6` VALUES LESS THAN (7));

INSERT into test_nonclustered (day,partner_org_id,partner_name,customer_org_id,license_id) VALUES
("2020-01-01","1","partner1","customer1","license1")
...
INSERT into test_nonclustered (day,partner_org_id,partner_name,customer_org_id,license_id) VALUES
("2020-01-01","1","partner1","customer1","license1")
...
(See attachment for data insertion statements)

Data (31.8 KB)
【Encountered Problem: Problem Phenomenon and Impact】
On partitioned tables, if /*+ USE_INDEX(tn, PRIMARY) */ is not used to specify the index, the non-clustered index table will not use the index either. For the clustered index table, regardless of whether it is specified, it cannot use the index and directly scans the entire table.
This is very troublesome when the data volume is large (Problem Source), so I would like to ask if there are any optimization methods?

explain select /*+ USE_INDEX(tn, PRIMARY) */ * from test_nonclustered tn order by day limit 10;
TopN_8	10.00	root		test.test_nonclustered.day, offset:0, count:10
└─IndexLookUp_15	10.00	root	partition:all	
  ├─Limit_14(Build)	10.00	cop[tikv]		offset:0, count:10
  │ └─IndexFullScan_12	1000.00	cop[tikv]	table:tn, index:PRIMARY(day, partner_org_id, customer_org_id, license_id)	keep order:false
  └─TableRowIDScan_13(Probe)	10.00	cop[tikv]	table:tn	keep order:false
explain select /*+ USE_INDEX(tn, PRIMARY) */ * from test_clustered order by day limit 10;
TopN_7	10.00	root		test.test_clustered.day, offset:0, count:10
└─TableReader_14	10.00	root	partition:all	data:Limit_13
  └─Limit_13	10.00	cop[tikv]		offset:0, count:10
    └─TableFullScan_12	1000.00	cop[tikv]	table:test_clustered	keep order:false
| username: dba远航 | Original post link

After the table is created, use SHOW CREATE TABLE table_name to check the specifics of the table you created, and then make a judgment. So, please provide the SHOW statements for these two tables so that everyone can make a judgment.

| username: EricSong | Original post link

The show create table statement shows the same as above, it looks like no changes occurred during creation.

CREATE TABLE `test_clustered` (
  `day` date NOT NULL,
  `partner_org_id` varchar(60) NOT NULL,
  `partner_name` text DEFAULT NULL,
  `customer_org_id` varchar(60) NOT NULL,
  `license_id` varchar(255) NOT NULL,
  PRIMARY KEY (`day`,`partner_org_id`,`customer_org_id`,`license_id`) /*T![clustered_index] CLUSTERED */
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin
PARTITION BY RANGE (WEEKDAY(`day`))
(PARTITION `p0` VALUES LESS THAN (1),
 PARTITION `p1` VALUES LESS THAN (2),
 PARTITION `p2` VALUES LESS THAN (3),
 PARTITION `p3` VALUES LESS THAN (4),
 PARTITION `p4` VALUES LESS THAN (5),
 PARTITION `p5` VALUES LESS THAN (6),
 PARTITION `p6` VALUES LESS THAN (7))

CREATE TABLE `test_nonclustered` (
  `day` date NOT NULL,
  `partner_org_id` varchar(60) NOT NULL,
  `partner_name` text DEFAULT NULL,
  `customer_org_id` varchar(60) NOT NULL,
  `license_id` varchar(255) NOT NULL,
  PRIMARY KEY (`day`,`partner_org_id`,`customer_org_id`,`license_id`) /*T![clustered_index] NONCLUSTERED */
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin
PARTITION BY RANGE (WEEKDAY(`day`))
(PARTITION `p0` VALUES LESS THAN (1),
 PARTITION `p1` VALUES LESS THAN (2),
 PARTITION `p2` VALUES LESS THAN (3),
 PARTITION `p3` VALUES LESS THAN (4),
 PARTITION `p4` VALUES LESS THAN (5),
 PARTITION `p5` VALUES LESS THAN (6),
 PARTITION `p6` VALUES LESS THAN (7))
| username: 小龙虾爱大龙虾 | Original post link

  1. When the optimizer chooses not to use an index for a non-clustered table, if you think that using the primary key to access the table is optimal, you can post both execution plans for review.
  2. The reason a clustered table does not use the primary key index is that a clustered table is organized by the primary key itself. The table is the primary key index, and the primary key index is the table. As the saying goes, “Hu Wan is the bandit, and the bandit is Hu Wan,” so a table scan on a clustered table is essentially a primary key index scan.
| username: forever | Original post link

How much production data is there? From the current test data, the execution plan is correct. The table has a total of 5 columns, 4 of which are in the primary key. If you still need to go back to the table to fetch all column data after looking up the data, it becomes meaningless. A full table scan of a clustered table is also based on the primary key. You can query partial columns that can use the index. Also, I remember that TiDB’s partitioned tables use local indexes (I couldn’t find the information, I forgot where I saw it, hope someone can correct me). Your date is in a daily range, so the time within each partition is actually unordered, making the index meaningless.

| username: EricSong | Original post link

Thank you for the explanation. The production environment has around 300 million rows of data, and the specific situation is described in this question. The table is extremely slow when performing an order by query, making it almost impossible to query. The execution plan shows a full table scan. I understand that due to the characteristics of the clustered index, TableFullScan indicates IndexFullScan, so a full table scan actually means using the index.

However, what I don’t quite understand is: the index should indicate that the data is ordered, so in the query plan of v4.0.11, we can see that it takes the top five from each of the seven partitions and then compares them in TiDB.

Your date is in a daily range, so the time within each partition is actually unordered, making the index meaningless.

In v6.5.0, it directly scans all the records. Does this mean that the clustered index is no longer ordered within the partition? I understand that even if it is partitioned by WEEKDAY, it can still maintain order within the partition. If it is not ordered, it should be specifically mentioned, but I did not see this point mentioned in the documentation of the clustered index.

TopN_24 5.00  root    analytics.day:desc, offset:0, count:5
└─PartitionUnion_28 35.00 root    
  ├─Projection_44 5.00  root    analytics.day, analytics.partner_org_id, analytics.partner_name, analytics.customer_org_id, analytics.customer_name, analytics.offer_code, analytics.external_subscription_id, analytics.subscription_start_date, analytics.subscription_end_date, analytics.is_trial, analytics.is_auto_renew, analytics.available_licenses, analytics.assigned_licenses, analytics.actively_used_licenses, analytics.capacity, analytics.trial_duration, analytics.license_status, analytics.license_id, analytics.workload, analytics.daily_actively_used_licenses, analytics.weekly_actively_used_licenses, analytics.is_sa_display, analytics.decrypted_customer_name
  │ └─IndexLookUp_43  5.00  root    limit embedded(offset:0, count:5)
  │   ├─Limit_42(Build) 5.00  cop[tikv]   offset:0, count:5
  │   │ └─IndexFullScan_40  5.00  cop[tikv] table:subscription_analytics, partition:p0, index:PRIMARY(day, partner_org_id, customer_org_id, offer_code, external_subscription_id, license_id) keep order:true, desc
  │   └─TableRowIDScan_41(Probe)  5.00  cop[tikv] table:subscription_analytics, partition:p0  keep order:false, stats:pseudo
  ├─Projection_64 5.00  root    analytics.day, analytics.partner_org_id, analytics.partner_name, analytics.customer_org_id, analytics.customer_name, analytics.offer_code, analytics.external_subscription_id, analytics.subscription_start_date, analytics.subscription_end_date, analytics.is_trial, analytics.is_auto_renew, analytics.available_licenses, analytics.assigned_licenses, analytics.actively_used_licenses, analytics.capacity, analytics.trial_duration, analytics.license_status, analytics.license_id, analytics.workload, analytics.daily_actively_used_licenses, analytics.weekly_actively_used_licenses, analytics.is_sa_display, analytics.decrypted_customer_name
  │ └─IndexLookUp_63  5.00  root    limit embedded(offset:0, count:5)
  │   ├─Limit_62(Build) 5.00  cop[tikv]   offset:0, count:5
  │   │ └─IndexFullScan_60  5.00  cop[tikv] table:subscription_analytics, partition:p1, index:PRIMARY(day, partner_org_id, customer_org_id, offer_code, external_subscription_id, license_id) keep order:true, desc
  │   └─TableRowIDScan_61(Probe)  5.00  cop[tikv] table:subscription_analytics, partition:p1  keep order:false, stats:pseudo
  ├─Projection_84 5.00  root    analytics.day, analytics.partner_org_id, analytics.partner_name, analytics.customer_org_id, analytics.customer_name, analytics.offer_code, analytics.external_subscription_id, analytics.subscription_start_date, analytics.subscription_end_date, analytics.is_trial, analytics.is_auto_renew, analytics.available_licenses, analytics.assigned_licenses, analytics.actively_used_licenses, analytics.capacity, analytics.trial_duration, analytics.license_status, analytics.license_id, analytics.workload, analytics.daily_actively_used_licenses, analytics.weekly_actively_used_licenses, analytics.is_sa_display, analytics.decrypted_customer_name
  │ └─IndexLookUp_83  5.00  root    limit embedded(offset:0, count:5)
  │   ├─Limit_82(Build) 5.00  cop[tikv]   offset:0, count:5
  │   │ └─IndexFullScan_80  5.00  cop[tikv] table:subscription_analytics, partition:p2, index:PRIMARY(day, partner_org_id, customer_org_id, offer_code, external_subscription_id, license_id) keep order:true, desc
  │   └─TableRowIDScan_81(Probe)  5.00  cop[tikv] table:subscription_analytics, partition:p2  keep order:false, stats:pseudo
  ├─Projection_104  5.00  root    analytics.day, analytics.partner_org_id, analytics.partner_name, analytics.customer_org_id, analytics.customer_name, analytics.offer_code, analytics.external_subscription_id, analytics.subscription_start_date, analytics.subscription_end_date, analytics.is_trial, analytics.is_auto_renew, analytics.available_licenses, analytics.assigned_licenses, analytics.actively_used_licenses, analytics.capacity, analytics.trial_duration, analytics.license_status, analytics.license_id, analytics.workload, analytics.daily_actively_used_licenses, analytics.weekly_actively_used_licenses, analytics.is_sa_display, analytics.decrypted_customer_name
  │ └─IndexLookUp_103 5.00  root    limit embedded(offset:0, count:5)
  │   ├─Limit_102(Build)  5.00  cop[tikv]   offset:0, count:5
  │   │ └─IndexFullScan_100 5.00  cop[tikv] table:subscription_analytics, partition:p3, index:PRIMARY(day, partner_org_id, customer_org_id, offer_code, external_subscription_id, license_id) keep order:true, desc
  │   └─TableRowIDScan_101(Probe) 5.00  cop[tikv] table:subscription_analytics, partition:p3  keep order:false, stats:pseudo
  ├─Projection_124  5.00  root    analytics.day, analytics.partner_org_id, analytics.partner_name, analytics.customer_org_id, analytics.customer_name, analytics.offer_code, analytics.external_subscription_id, analytics.subscription_start_date, analytics.subscription_end_date, analytics.is_trial, analytics.is_auto_renew, analytics.available_licenses, analytics.assigned_licenses, analytics.actively_used_licenses, analytics.capacity, analytics.trial_duration, analytics.license_status, analytics.license_id, analytics.workload, analytics.daily_actively_used_licenses, analytics.weekly_actively_used_licenses, analytics.is_sa_display, analytics.decrypted_customer_name
  │ └─IndexLookUp_123 5.00  root    limit embedded(offset:0, count:5)
  │   ├─Limit_122(Build)  5.00  cop[tikv]   offset:0, count:5
  │   │ └─IndexFullScan_120 5.00  cop[tikv] table:subscription_analytics, partition:p4, index:PRIMARY(day, partner_org_id, customer_org_id, offer_code, external_subscription_id, license_id) keep order:true, desc
  │   └─TableRowIDScan_121(Probe) 5.00  cop[tikv] table:subscription_analytics, partition:p4  keep order:false, stats:pseudo
  ├─Projection_144  5.00  root    analytics.day, analytics.partner_org_id, analytics.partner_name, analytics.customer_org_id, analytics.customer_name, analytics.offer_code, analytics.external_subscription_id, analytics.subscription_start_date, analytics.subscription_end_date, analytics.is_trial, analytics.is_auto_renew, analytics.available_licenses, analytics.assigned_licenses, analytics.actively_used_licenses, analytics.capacity, analytics.trial_duration, analytics.license_status, analytics.license_id, analytics.workload, analytics.daily_actively_used_licenses, analytics.weekly_actively_used_licenses, analytics.is_sa_display, analytics.decrypted_customer_name
  │ └─IndexLookUp_143 5.00  root    limit embedded(offset:0, count:5)
  │   ├─Limit_142(Build)  5.00  cop[tikv]   offset:0, count:5
  │   │ └─IndexFullScan_140 5.00  cop[tikv] table:subscription_analytics, partition:p5, index:PRIMARY(day, partner_org_id, customer_org_id, offer_code, external_subscription_id, license_id) keep order:true, desc
  │   └─TableRowIDScan_141(Probe) 5.00  cop[tikv] table:subscription_analytics, partition:p5  keep order:false, stats:pseudo
  └─Projection_164  5.00  root    analytics.day, analytics.partner_org_id, analytics.partner_name, analytics.customer_org_id, analytics.customer_name, analytics.offer_code, analytics.external_subscription_id, analytics.subscription_start_date, analytics.subscription_end_date, analytics.is_trial, analytics.is_auto_renew, analytics.available_licenses, analytics.assigned_licenses, analytics.actively_used_licenses, analytics.capacity, analytics.trial_duration, analytics.license_status, analytics.license_id, analytics.workload, analytics.daily_actively_used_licenses, analytics.weekly_actively_used_licenses, analytics.is_sa_display, analytics.decrypted_customer_name
    └─IndexLookUp_163 5.00  root    limit embedded(offset:0, count:5)
      ├─Limit_162(Build)  5.00  cop[tikv]   offset:0, count:5
      │ └─IndexFullScan_160 5.00  cop[tikv] table:subscription_analytics, partition:p6, index:PRIMARY(day, partner_org_id, customer_org_id, offer_code, external_subscription_id, license_id) keep order:true, desc
      └─TableRowIDScan_161(Probe) 5.00  cop[tikv] table:subscription_analytics, partition:p6  keep order:false, stats:pseudo
| username: EricSong | Original post link

Yes, I now understand that a TableFullScan with a clustered index is equivalent to an IndexFullScan. However, what I don’t understand is that in this issue, under the same partition conditions, v4.0.11 can retrieve the largest 5 from 7 partitions to quickly output the result, but v6.5.0 scans all the records.

I understand that a clustered index should also be ordered, so it should be possible to retrieve the largest 5 from 7 partitions for comparison. But it seems this is not working currently. Is it because the clustered index is not ordered within the partitions?

| username: 小龙虾爱大龙虾 | Original post link

A clustered index is ordered. Since it takes more than half an hour to run on your new cluster, you can run the SQL and then use “explain for connection” to check the execution plan and see if there is any useful information.

| username: EricSong | Original post link

Thank you for the reply. I conducted a test at 索引失效 - TiDB 的问答社区, and it seems that the clustered index is indeed invalid within the partition. The actRows and the full amount are consistent, indicating that the entire table was scanned rather than fetching the first few rows to be merged upstream.

Does this mean that the clustered index is invalid within the partition? Otherwise, if the clustered index is ordered within the partition, then the actRows should be consistent with the non-clustered index.

| username: 芮芮是产品 | Original post link

To use the index, you must include the condition for the week, such as week=3.

| username: forever | Original post link

Comparing it this way, I feel that there is indeed a problem with the execution plan. I tested it with your test table and data. When the query column only includes (partner_name) and is not in the primary key column, it performs a full table scan. The estrow is the data of all partitions, which is not very scientific. The primary key is sorted by day, so it should be able to use the primary key index.