Issue in v7.1.1: Upgrading to v7.1.2 did not fix the TiDB Dashboard SQL page errors

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: v 7.1.1 的问题,升级 7.1.2 未能解决 TiDB Dashboard SQL 页面异常问题

| username: rebelsre

[TiDB Usage Environment] Production Environment
[TiDB Version] v7.1.2
[Reproduction Path]
[Encountered Problem: Problem Phenomenon and Impact]
The SQL Statements (statement analysis) and Slow Queries pages in TiDB Dashboard both report errors:

  • Error 1105 (HY000): runtime error: invalid memory address or nil pointer dereference
  • common.bad_request

Note: This does not seem to be a version bug, because the error only appeared after the cluster had been in use for a long time. Restarting did not solve it, so I upgraded to v7.1.2, but the error persists.
[Resource Configuration]
[Attachments: Screenshots/Logs/Monitoring]


| username: Fly-bird | Original post link

Take a look at the logs.

| username: rebelsre | Original post link

Every time I refresh the page, PD writes log entries like these.

| username: Miracle | Original post link

I remember someone solving this before by restarting PD…

| username: rebelsre | Original post link

Restarting and migrating PD had no effect.
Upgrading to v7.1.2 did not help either.

| username: 芮芮是产品 | Original post link

After you delete the slow logs, restart TiDB.

| username: tidb菜鸟一只 | Original post link

Check the slow query log files in the log directory of your TiDB deployment on each host. If there are many of them, delete some of the slow query logs.

| username: rebelsre | Original post link

Can I just delete these log files directly?

| username: rebelsre | Original post link

Retention is configured as 30 days, and there are currently more than 70 of these files on each machine:
tidb-deploy/tidb-4000/log/tidb_slow_query*.log
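Before deleting anything, a quick inventory shows how much slow-log history each host actually holds. The Python sketch below is one way to get it. It assumes the files live under tidb-deploy/tidb-4000/log (the path quoted above, relative to where you run it) and that every slow-log entry starts with a "# Time:" header line, as in TiDB's slow query log format. Both are assumptions to verify against your own deployment.

```python
import glob
import os
import sys

# Assumed location, matching the path quoted in the thread; adjust as needed.
DEFAULT_LOG_DIR = "tidb-deploy/tidb-4000/log"
PATTERN = "tidb_slow_query*.log"
ENTRY_HEADER = "# Time:"  # assumed first line of every slow-log entry


def count_entries(path):
    """Count slow-log entries by counting '# Time:' header lines."""
    count = 0
    with open(path, "r", encoding="utf-8", errors="replace") as f:
        for line in f:
            if line.startswith(ENTRY_HEADER):
                count += 1
    return count


def summarize(log_dir):
    """Return (file_count, total_bytes, total_entries) for the slow-log files."""
    files = sorted(glob.glob(os.path.join(log_dir, PATTERN)))
    total_bytes = sum(os.path.getsize(p) for p in files)
    total_entries = sum(count_entries(p) for p in files)
    return len(files), total_bytes, total_entries


if __name__ == "__main__":
    log_dir = sys.argv[1] if len(sys.argv) > 1 else DEFAULT_LOG_DIR
    n_files, n_bytes, n_entries = summarize(log_dir)
    print(f"{n_files} files, {n_bytes / 1024**2:.1f} MiB, {n_entries} entries")
```

Run it once per TiDB host, for example with python slow_log_summary.py /path/to/tidb-deploy/tidb-4000/log (the script name is arbitrary).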

| username: tidb菜鸟一只 | Original post link

That is a lot of slow logs… Consider raising the slow log threshold. Or are the slow SQL statements not being optimized at all? Delete all the historical slow log files first, and the Dashboard should stop reporting errors.
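If you go ahead with deleting the historical files, a dry-run-first script avoids removing the file that tidb-server is still writing. The Python sketch below assumes the active file is named tidb_slow_query.log and that rotated files carry a suffix such as tidb_slow_query-<timestamp>.log, so they match the same glob; these naming assumptions are not confirmed in the thread. It only prints what it would delete unless you pass --apply.

```python
import glob
import os
import sys
import time

# Assumed active file name; rotated files are assumed to carry a suffix
# (e.g. tidb_slow_query-<timestamp>.log), so they still match PATTERN.
ACTIVE_NAME = "tidb_slow_query.log"
PATTERN = "tidb_slow_query*.log"


def historical_slow_logs(log_dir, older_than_days=0.0, now=None):
    """Return rotated slow-log files whose mtime is at or before the cutoff.

    The active file is always skipped because tidb-server is still writing it.
    """
    now = time.time() if now is None else now
    cutoff = now - older_than_days * 86400
    selected = []
    for path in sorted(glob.glob(os.path.join(log_dir, PATTERN))):
        if os.path.basename(path) == ACTIVE_NAME:
            continue
        if os.path.getmtime(path) <= cutoff:
            selected.append(path)
    return selected


def prune(log_dir, older_than_days=0.0, dry_run=True):
    """Print the selected historical files and delete them unless dry_run."""
    victims = historical_slow_logs(log_dir, older_than_days)
    for path in victims:
        print(("would delete " if dry_run else "deleting ") + path)
        if not dry_run:
            os.remove(path)
    return victims


if __name__ == "__main__":
    # Usage: python prune_slow_logs.py <log_dir> [days] [--apply]
    args = [a for a in sys.argv[1:] if a != "--apply"]
    log_dir = args[0] if args else "tidb-deploy/tidb-4000/log"
    days = float(args[1]) if len(args) > 1 else 0.0
    prune(log_dir, days, dry_run="--apply" not in sys.argv)
```

For example, python prune_slow_logs.py tidb-deploy/tidb-4000/log 7 lists rotated files older than seven days; adding --apply deletes them.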

| username: rebelsre | Original post link

I deleted those slow query log files and restarted tidb-server, but the same error still occurs.

| username: 有猫万事足 | Original post link

Based on the stack trace you provided, I looked through the code, and it is quite confusing.
In my limited experience, the frame at the top of the stack is where the nil pointer is dereferenced.

The error at line 1098 makes me think that the worker object is nil for some reason, but I am not confident in this conclusion at all. I'm not very good at this.

If the following SQL statement reports the error every time, I suggest filing a bug directly on GitHub:
select * from INFORMATION_SCHEMA.CLUSTER_STATEMENTS_SUMMARY_HISTORY limit 1;

Besides the error stack, please also provide the execution context of the SQL to make troubleshooting easier.

Execute the following statement:
PLAN REPLAYER DUMP EXPLAIN select * from INFORMATION_SCHEMA.CLUSTER_STATEMENTS_SUMMARY_HISTORY limit 1;
The statement returns the name of a zip file. Check each TiDB instance for that file and upload it to the issue as well; this will make it easier for PingCAP developers to troubleshoot the problem.
The zip file does not contain user data. If you are concerned, you can open it and check.
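Finding the returned file by hand on every TiDB host is tedious. The Python sketch below walks a directory tree and prints every path whose file name equals the name returned by PLAN REPLAYER DUMP. The thread does not say where TiDB stores the zip, so the search root is an assumption: point it at the TiDB deploy directory, or a wider directory, on each host.

```python
import os
import sys


def find_files(root, file_name):
    """Return all paths under root whose basename equals file_name."""
    matches = []
    # os.walk ignores directories it cannot read by default (onerror=None).
    for dirpath, _dirnames, filenames in os.walk(root):
        if file_name in filenames:
            matches.append(os.path.join(dirpath, file_name))
    return sorted(matches)


if __name__ == "__main__":
    # Usage: python find_replayer.py <file name returned by PLAN REPLAYER> [root]
    if len(sys.argv) >= 2:
        name = sys.argv[1]
        root = sys.argv[2] if len(sys.argv) > 2 else "."
        for path in find_files(root, name):
            print(path)
    else:
        print("usage: find_replayer.py <file_name> [root]")
```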

| username: rebelsre | Original post link

replayer_lpj0ZXzLWwOSvnGNp9b-mQ==_1698649890339077634.zip (attachment, 19.6 KB)

| username: rebelsre | Original post link

It looks like someone has already filed an issue:
pingcap/tidb Issue #47531, "panic:runtime error: invalid memory address or nil pointer" (github.com)

| username: 像风一样的男子 | Original post link

It might be a problem with the data source. Try uninstalling and reinstalling the monitoring components (Prometheus and Grafana) to see if that helps.

| username: TiDBer_小阿飞 | Original post link

Since the error message says "invalid memory address," maybe it is worth looking at this from the memory side. Check the system memory first, then check the tidb_mem_quota_query setting. Also check tidb_enable_prepared_plan_cache to see whether the prepared plan cache is enabled; it is enabled by default, so try disabling it and restarting to rule it out.
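To put numbers on the system memory check, the Python sketch below reads /proc/meminfo (so it assumes a Linux host) and prints MemTotal and MemAvailable next to 1 GiB, the default value of tidb_mem_quota_query. It does not query TiDB; the variable's actual value still has to be checked in SQL, for example with SHOW VARIABLES LIKE 'tidb_mem_quota_query';

```python
import os

# Default value of tidb_mem_quota_query (1 GiB); the real setting may differ.
DEFAULT_MEM_QUOTA_QUERY = 1 << 30


def parse_meminfo(text):
    """Parse /proc/meminfo content into {field: value}.

    Fields reported in kB are converted to bytes; fields without a unit
    (for example HugePages_Total) are kept as plain counts.
    """
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        parts = rest.split()
        if not sep or not parts:
            continue
        value = int(parts[0])
        if len(parts) > 1 and parts[1] == "kB":
            value *= 1024
        info[key.strip()] = value
    return info


def gib(n_bytes):
    """Convert bytes to GiB."""
    return n_bytes / (1 << 30)


if __name__ == "__main__":
    if os.path.exists("/proc/meminfo"):  # Linux only
        with open("/proc/meminfo") as f:
            info = parse_meminfo(f.read())
        total = info.get("MemTotal", 0)
        avail = info.get("MemAvailable", 0)
        print(f"MemTotal {gib(total):.1f} GiB, MemAvailable {gib(avail):.1f} GiB")
        print(f"default per-query quota {gib(DEFAULT_MEM_QUOTA_QUERY):.1f} GiB")
```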

| username: rebelsre | Original post link

Tried it, didn’t work…

| username: rebelsre | Original post link

tidb_mem_quota_query is currently at its default value of 1 GB. I changed it to 32 GB and still got the same error.
Disabling tidb_enable_prepared_plan_cache also gave the same error, and after tidb-server was restarted, the value of tidb_enable_prepared_plan_cache had been reset.

| username: TiDBer_小阿飞 | Original post link

Shouldn’t you also restart PD?