Slow log shows build_task_duration of 2 seconds

translator_bot · June 20, 2024, 2:17pm

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: 慢日志 build_task_duration 2秒

| username: TiDBer_i3pbMJ65

[TiDB Usage Environment] Production Environment / Testing / PoC
[TiDB Version] 7.5.1
[Encountered Issue: Problem Phenomenon and Impact]
The cluster is under almost no load. There are 3 TiKV machines, each with 4 TiKV processes, 4 SSDs, 500 GB of memory, and 56 physical cores. The cluster holds 1.5 TB of data. A single scan query for 1000 rows took 2 seconds on the first run and 70 ms on the second.

translator_bot · June 20, 2024, 2:17pm

| username: TiDBer_i3pbMJ65 | Original post link

What does build_task_duration mean?
The slow log shows that PD_total accounts for a large share of the time.

translator_bot · June 20, 2024, 2:17pm

| username: 鱼跃龙门 | Original post link

PD_total: the total time this statement spent on RPC requests to PD. On the second run, some of that information is already cached in TiDB, so the query is faster.

translator_bot · June 20, 2024, 2:17pm

| username: FutureDB | Original post link

PD_total: the total time this statement spent on RPC requests to PD.

If PD_total is high, check whether the network between TiDB and PD is slow.
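
As a rough way to see how much of a statement's time went to PD, one could pull the numeric fields out of a slow log entry and compare PD_total with Query_time. This is only a sketch: the sample entry below is made up for illustration (it is not the log from this thread), and it assumes each field appears on a `# Name: value` line with values in seconds. The helper names `parse_numeric_fields` and `pd_share` are hypothetical.

```python
import re

# Hypothetical slow-log excerpt. The values and the exact line layout are
# assumptions for illustration, not data from this thread.
SAMPLE_ENTRY = """\
# Query_time: 2.05
# PD_total: 1.9
# Backoff_total: 0
SELECT * FROM t LIMIT 1000;
"""

# Matches "Name: 1.23" pairs on a comment line.
FIELD_RE = re.compile(r"\b([A-Za-z_]+): ([0-9]+(?:\.[0-9]+)?)\b")


def parse_numeric_fields(entry):
    """Collect numeric 'Name: value' pairs from lines starting with '# '."""
    fields = {}
    for line in entry.splitlines():
        if not line.startswith("# "):
            continue  # skip the SQL text itself
        for name, value in FIELD_RE.findall(line):
            fields[name] = float(value)
    return fields


def pd_share(entry):
    """Return PD_total / Query_time, or None if either field is unusable."""
    fields = parse_numeric_fields(entry)
    query_time = fields.get("Query_time")
    pd_total = fields.get("PD_total")
    if not query_time or pd_total is None:
        return None
    return pd_total / query_time


if __name__ == "__main__":
    share = pd_share(SAMPLE_ENTRY)
    print(f"PD_total share of Query_time: {share:.0%}")
```

A share close to 1 would suggest that most of the latency comes from the PD requests rather than from TiKV, which is when looking at the TiDB-to-PD network and PD load makes sense.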

translator_bot · June 20, 2024, 2:17pm

| username: FutureDB | Original post link

build_task_duration: the time taken to execute the build task. This usually covers the operations needed to carry out the build plan, such as scanning tables, building indexes, and preparing data.
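
To compare the first and second runs, it may help to extract build_task_duration from each plan's execution info and convert it to seconds. The sketch below assumes the value is printed in Go duration style (for example `2.01s` or `70ms`) after the key `build_task_duration:`; the two sample strings are invented for illustration and are not taken from this thread, and the function names are hypothetical.

```python
import re

# Seconds per unit for Go-style duration strings such as "2.01s" or "70ms".
UNIT_SECONDS = {
    "ns": 1e-9, "us": 1e-6, "µs": 1e-6, "ms": 1e-3,
    "s": 1.0, "m": 60.0, "h": 3600.0,
}
# One "<number><unit>" component; longer units are listed before shorter ones.
PART = r"([0-9]+(?:\.[0-9]+)?)(ns|us|µs|ms|s|m|h)"


def duration_to_seconds(text):
    """Convert a duration like '1m30s' or '70ms' to seconds."""
    if not re.fullmatch(f"(?:{PART})+", text):
        raise ValueError(f"unrecognized duration: {text!r}")
    return sum(float(num) * UNIT_SECONDS[unit]
               for num, unit in re.findall(PART, text))


def build_task_seconds(execution_info):
    """Return build_task_duration in seconds, or None if the key is absent."""
    match = re.search(r"build_task_duration:\s*([^\s,}]+)", execution_info)
    return duration_to_seconds(match.group(1)) if match else None


# Hypothetical execution-info fragments for a first and a second run.
FIRST_RUN = "cop_task: {num: 1, max: 65ms, build_task_duration: 2.01s}"
SECOND_RUN = "cop_task: {num: 1, max: 60ms, build_task_duration: 12ms}"

if __name__ == "__main__":
    for label, info in (("first", FIRST_RUN), ("second", SECOND_RUN)):
        print(label, build_task_seconds(info))
```

If the first run shows a large value and the second a small one while the TiKV-side numbers stay similar, the extra time was spent before the requests reached TiKV.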

translator_bot · June 20, 2024, 2:17pm

| username: TiDBer_i3pbMJ65 | Original post link

You mean the TiKV cache, right? PD_total should not differ between the first and second queries; the only difference should be that the second query hits TiKV's memory. Yet judging from the execution plans, the block hit counts of the first and second queries do not differ much.

translator_bot · June 20, 2024, 2:17pm

| username: Jasper | Original post link

Can you still reproduce this with the same SQL? It would also help to check PD's resource usage at the time of the slow query.

translator_bot · June 20, 2024, 2:17pm

| username: ziptoam | Original post link

The second query should hit the cache, so it’s relatively fast.

translator_bot · June 20, 2024, 2:17pm

| username: zhaokede | Original post link

Either PD is very busy, or the bandwidth between TiDB and PD is insufficient; otherwise building or processing the tasks would not take this long.

translator_bot · June 20, 2024, 2:17pm

| username: h5n1 | Original post link

How many regions does this table have?

translator_bot · June 20, 2024, 2:17pm

| username: Billmay表妹 | Original post link

[Resource Allocation] Go to TiDB Dashboard - Cluster Info - Hosts and take a screenshot of this page

Let’s take a look at how your cluster is deployed.

translator_bot · August 26, 2024, 6:04am

| username: Hacker_zuGnSsfP | Original post link

If it’s hot data, try adding a cache to see if it solves the problem.