How to Troubleshoot Significant Differences in SQL Execution Between TiDB Dashboard and Database Management Tools

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: tidb dashboard 和数据库管理断工具执行SQL相差太大如何排查

| username: TiDBer_E3pRgGAy

[TiDB Usage Environment] Production Environment / Testing / PoC
[TiDB Version] 6.6.0
[Encountered Problem: Problem Phenomenon and Impact]
The same SQL executes quickly in the management client, but it takes a long time in TiDB Dashboard’s slow SQL.



[Resource Configuration] Go to TiDB Dashboard - Cluster Info - Hosts and take a screenshot of this page
[Attachments: Screenshots/Logs/Monitoring]

| username: tidb菜鸟一只 | Original post link

Go to the dashboard page, click on SQL, and check which step is slow.

| username: Kongdom | Original post link

Looking at the timeline, is there a significant concurrency or resource shortage?

| username: redgame | Original post link

A bit of latency.

| username: 像风一样的男子 | Original post link

Was the cluster highly concurrent at that time? What was the overall system latency?

| username: 啦啦啦啦啦 | Original post link

The slow logs in the dashboard are also taken from the slow log files, so they are accurate. If the client execution is fast, it only means that it is not slow now. You can check if there were any resource bottlenecks at that time by looking at the monitoring data, and you can also click into the details to see where the time was spent.

| username: TiDBer_E3pRgGAy | Original post link

No, this SQL is executed very frequently, and almost every moment it is displayed as slow SQL in the console. However, when I execute it on the client at the same time, it is not slow. Here we are using TiFlash, a cloud-based system with ESSD disks.

| username: TiDBer_E3pRgGAy | Original post link

It’s also slow when the cluster concurrency is low.

| username: 像风一样的男子 | Original post link

The performance of ESSD in the cloud is not very good, you need to use local disks.

| username: TiDBer_E3pRgGAy | Original post link

However, it’s a bit strange. The number of this table is not that large, and I have used TiFlash before without encountering this situation. I observed a phenomenon where a port in TiFlash has a consistently high CPU usage.

| username: 像风一样的男子 | Original post link

TiFlash is used for complex queries in OLAP scenarios, not for handling such high concurrency queries. For your high-frequency queries, it’s better to use TiKV.

| username: TiDBer_E3pRgGAy | Original post link

Sure
Do you know why the CPU usage is so high here?


Also, I found that the limit usage rate inside almost every one is super high.

| username: zhanggame1 | Original post link

I guess it’s fluctuating, hard to measure.

| username: RenlySir | Original post link

The red line represents the limit, not the actual CPU usage. The 220% below indicates the actual usage.

| username: RenlySir | Original post link

As the teacher mentioned above, go to the dashboard and check the SQL to see which step is slow.

| username: TiDBer_E3pRgGAy | Original post link

If I remove this table from TiFlash and directly use TiKV, it won’t have this issue. I guess there’s something wrong with my TiFlash component.

| username: knull | Original post link

Is it possible that the execution plan went wrong, and it went to TiFlash when it should have gone to TiKV?