TopSQL and SQL Analysis Discrepancies in the Dashboard

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: Dashboard中TopSQL与SQL分析不一致

| username: EricSong

【TiDB Usage Environment】Testing
【TiDB Version】v6.1.0
【Reproduction Path】None
【Encountered Problem: Problem Phenomenon and Impact】
The current TiDB node has high CPU and memory usage. During troubleshooting, it was found that multiple SQLs in TopSQL consumed a large amount of CPU. However, when searching for these SQLs using the SQL template ID as per the official documentation, they could not be found, and both call/s and latency/call in TopSQL are 0.
【Attachments: Screenshots/Logs/Monitoring】


| username: tidb狂热爱好者 | Original post link

First, reduce the largest SQL. When TiDB is completely stuck, there are situations where you can’t see the SQL details.

| username: tidb狂热爱好者 | Original post link

View slow query statements through the SQL command window

Search for Top N slow queries

select query_time, query
from information_schema.slow_query
where is_internal = false -- Exclude internal TiDB slow query SQL
order by query_time desc
limit 2;

Time unit is seconds

This is the insert statement executed during previous test data, judged as a slow query by TiDB because there is no index.

Search for Top N slow queries of a specific user

select query_time, query, user
from information_schema.slow_query
where is_internal = false -- Exclude internal TiDB slow query SQL
and user = "test" -- The username to search for
order by query_time desc
limit 2;

Search for similar slow queries based on SQL fingerprint

First, get the digest value from the top N query, then query the fingerprint value.

select query_time, query, digest
from information_schema.slow_query
where is_internal = false
order by query_time desc
limit 1;

select query, query_time
from information_schema.slow_query
where digest = "31a61be6ba3e5a51b26372ba82e6cdc7b517e73630af5937ad7b607500375aac";

Search for slow query SQL statements with pseudo statistics

select query, query_time, stats
from information_schema.slow_query
where is_internal = false
and stats like %pseudo%;

These SQL statements can also be used for troubleshooting.

| username: redgame | Original post link

The situation where everything is 0 means it’s stuck.

| username: EricSong | Original post link

Thank you for your reply, but I found that the SQL template IDs for these SQLs cannot be found in SlowQuery, and I cannot obtain valid information from information_schema.slow_query.
Is this because the execution speed of this SQL is not slow and therefore not recorded in SlowQuery, or is it stuck for some reason and not recorded in SlowQuery?

| username: Kongdom | Original post link

Is the time range incorrect? Slow queries are recorded based on the time when the SQL execution ends. SQL statements that are still executing will not appear in the slow query log.

| username: h5n1 | Original post link

What is the execution frequency of the top SQL you are looking for? If it executes quickly, slow SQL will not be recorded. SQL analysis comes from those tables in statement_summary, and the content inside will be removed over time.

| username: EricSong | Original post link

I feel that these SQL statements seem to be executing continuously. Because they are always executing, they do not enter SlowQuery, and both call/s and latency/call are 0. However, looking at the Grafana panel, this situation has been ongoing for more than a month. I think it is impossible for SQL to execute for a month because there are maxTime-like restrictions. Currently, I have restarted 3 TiDB nodes, and all these TopSQLs have disappeared. It indeed looks like several SQL statements were continuously executing.

| username: EricSong | Original post link

I can’t see the execution frequency information because the template IDs of these SQLs cannot be found in SQL statement analysis, SlowQuery, and logs. Therefore, I only know that these TopSQLs consumed a lot of CPU, but I can’t find the frequency, time, or user.

| username: h5n1 | Original post link

How about checking these two tables: cluster_statements_summary and cluster_statements_summary_history?

| username: EricSong | Original post link

No results found when querying both tables

| username: h5n1 | Original post link

How about trying to match with PLAN_DIGEST?

| username: EricSong | Original post link

I still can’t find it, it’s as if these SQL queries with high CPU usage never existed.

| username: h5n1 | Original post link

It is probably a bug.