Error or No Return When Executing max(index column) on Partitioned Large Table

translator_bot · June 23, 2024, 5:43am

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: 分区大表执行 max(索引列) 报错或无返回

| username: Jellybean

To improve efficiency, please provide the following information. A clear problem description can help solve the issue faster:

【TiDB Usage Environment】
Production environment, version 5.7.25-TiDB-v6.1.0

【Overview】 Scenario + Problem Overview
Background: The table has a total of about 50 billion rows, partitioned by day, with each partition having more than 100 million rows.

Partitioned Table Structure
Primary Key Index: PRIMARY KEY (dt, doc_id) /*T![clustered_index] NONCLUSTERED */,
PRIMARY KEY (dt, doc_id) /*T![clustered_index] NONCLUSTERED */,
KEY updatetime (updatetime),
KEY newdate (newdate)
ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin /*T![placement] PLACEMENT POLICY=storeonssd */
PARTITION BY RANGE (UNIX_TIMESTAMP(dt))
(PARTITION p20210601 VALUES LESS THAN (1622563200),
…)

【Background】 Operations performed
Executed a max query on the table, where dt is a time field and the first field of the primary key index:
select max(dt) from logoutrole;

【Phenomenon】 Business and database phenomena
【Problem】 Current issues encountered

Executing the query results in an error or no return for a long time. The error message is as follows:
mysql> select max(dt) from logoutrole;
ERROR 1105 (HY000): other error: Coprocessor task terminated due to exceeding the deadline
mysql>
Execution plan

image1372×289 12.2 KB

【Business Impact】
Unable to correctly retrieve content

【TiDB Version】
v6.1.0

【Application Software and Version】
5.7.25-TiDB-v6.1.0

translator_bot · June 23, 2024, 5:43am

| username: 北京大爷 | Original post link

From your previous post, it seems that TOP was pushed down to all partitions.

From the execution plan, the longest task takes 45 seconds with a total of 11228 tasks. Also, your Placement rule shows a distinction between hot and cold data. Therefore, the overall query task will be very slow, which is logically expected.

The most direct and effective optimization suggestion is to force the use of partition pruning.
For example, add where dt > '2022-06-01 00:00:00'.

translator_bot · June 23, 2024, 5:43am

| username: Jellybean | Original post link

Forcing the use of partition pruning means forcing the business side to add partition field condition filtering in the where clause, which would require the business side to modify the code. Is there any other way to optimize on the server side without changing the business code?

translator_bot · June 23, 2024, 5:43am

| username: jansu-dev | Original post link

There is a PR merged into master at planner: support prune range partition by min/max by Yisaer · Pull Request #36610 · pingcap/tidb · GitHub, officially defined as a BUG.

translator_bot · June 23, 2024, 5:43am

| username: Jellybean | Original post link

Well, when is the optimization of this issue expected to be released? Is there a plan?

translator_bot · June 23, 2024, 5:43am

| username: 北京大爷 | Original post link

Before the optimization of partition pruning rules, it was not available.
Normally, it will be released in version 6.3, but we will strive to include it in version 6.1.1. It is expected to be released by mid-September.

translator_bot · June 23, 2024, 5:43am

| username: Jellybean | Original post link

Sure, thanks for the explanation.

translator_bot · June 23, 2024, 5:43am

| username: h5n1 | Original post link

Is there a query timeout limit for the coprocessor?

translator_bot · June 23, 2024, 5:43am

| username: Jellybean | Original post link

Another similar question: range分区表order by 索引列limit 1 长时间未返回 - TiDB 的问答社区

translator_bot · June 23, 2024, 5:43am

| username: 北京大爷 | Original post link

cop does not have a timeout, but there are execute time out session variables and hints that can control SQL timeout.

translator_bot · June 23, 2024, 5:43am

| username: h5n1 | Original post link

What does this error mean? What is the reason? Is it because the data volume is too large and the coprocessor thread crashed?

translator_bot · June 23, 2024, 5:43am

| username: Jellybean | Original post link

It should be that there are too many regions, causing a timeout when loading regions.

translator_bot · June 23, 2024, 5:43am

| username: system | Original post link

This topic was automatically closed 60 days after the last reply. New replies are no longer allowed.