How to Configure Slow SQL Email Alerts in TiDB's Grafana?

This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: tidb的granfana如何配置慢SQL邮件告警?

| username: hacker_77powerful

[TiDB Usage Environment] Production Environment
[TiDB Version]
[Reproduction Path] What operations were performed when the issue occurred
[Encountered Issue: Issue Phenomenon and Impact]
[Resource Configuration] Go to TiDB Dashboard - Cluster Info - Hosts and take a screenshot of this page
[Attachment: Screenshot/Logs/Monitoring]

The production system now needs to send email alerts for slow SQL queries in TiDB that exceed 10 seconds. How can this be achieved?

| username: DBAER | Original post link

It seems that for top SQL, you can only write your own program logic, or write an export to cover it or use ELK. There doesn’t seem to be a ready-to-use solution.

| username: Jellybean | Original post link

One approach is to increase the slow query threshold of the cluster to 10 seconds, then retrieve the slow queries from Prometheus, and configure Alertmanager to send the corresponding alerts. This is a relatively simple method that maximizes the use of existing component functionalities.

Another approach is to retain the original slow query threshold, collect all slow query logs from the cluster (either by real-time collection of all log files or by reading the system tables that have already been summarized by the system), filter out the SQL queries that exceed 10 seconds, and then import them into the alert platform to send out the notifications. This solution requires a higher level of technical stack and capability.

| username: 小龙虾爱大龙虾 | Original post link

If you don’t want SQL text, the existing logic should be fine. For slow SQL situations, there should be histograms in Prometheus. After an alert, you can find the slow SQL yourself. Another way is to use your own monitoring platform to query slow SQL through SQL statements, but this is not recommended because this kind of monitoring SQL itself consumes a lot of resources. PS: You can’t put everything into monitoring, right?

| username: TiDBer_QYr0vohO | Original post link

I feel like I can implement it myself through a script, the logic is relatively clear and simple.

| username: zhaokede | Original post link

Indeed, write a scheduled task with code.

| username: stephanie | Original post link

You can modify the configuration in the alertmanager configuration file. I see that the default alertmanager installed with tiup is all commented out.

| username: 像风一样的男子 | Original post link

You can write an SQL query yourself, similar to select * from INFORMATION_SCHEMA.CLUSTER_SLOW_QUERY where Query_time > 10, and then periodically execute a shell script through a scheduled task to send email alerts.

| username: yulei7633 | Original post link

I wrote my own monitoring program using Python, storing the necessary data into the database, and then displaying it on a webpage. The built-in tools are mainly for troubleshooting issues.

| username: xiaoqiao | Original post link

Custom alter manager

| username: QH琉璃 | Original post link

alter manager

| username: 我是人间不清醒 | Original post link
This open-source tool can kill slow queries and send alerts.