Note:
This topic has been translated from a Chinese forum by GPT and might contain errors. Original topic: execute sql panic tidb load mysql.user fail out of memory

[TiDB Usage Environment] Production Environment
[TiDB Version] 6.1.2
[Reproduction Path] What operations were performed that caused the issue.
Last night, we changed some parameters. We performed two operations and are not sure which one caused the issue. First, we executed the following in the MySQL client: set GLOBAL tidb_mem_quota_query = 10240000;
This value is about 9.8 MiB. We had initially set a larger value, but query behavior did not change, so we lowered it several times and eventually removed a few zeros.
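To make the size concrete, the short sketch below converts the quota from the statement above into binary and decimal megabytes and compares it with the default. The 1 GiB default for tidb_mem_quota_query is taken from the TiDB v6.1 documentation as we understand it; treat that constant as an assumption and check it against your version.

```python
# Convert the tidb_mem_quota_query value from the SET GLOBAL statement
# (10240000 bytes) into MiB and MB, and compare it with the default.
QUOTA_BYTES = 10_240_000

# Assumption: default tidb_mem_quota_query in TiDB v6.1 is 1 GiB.
DEFAULT_QUOTA_BYTES = 1 << 30


def to_mib(n_bytes: int) -> float:
    """Bytes to mebibytes (1 MiB = 1024 * 1024 bytes)."""
    return n_bytes / (1024 * 1024)


def to_mb(n_bytes: int) -> float:
    """Bytes to decimal megabytes (1 MB = 1,000,000 bytes)."""
    return n_bytes / 1_000_000


if __name__ == "__main__":
    print(f"{to_mib(QUOTA_BYTES):.2f} MiB")  # 9.77 MiB
    print(f"{to_mb(QUOTA_BYTES):.2f} MB")    # 10.24 MB
    # Fraction of the assumed default quota, as a percentage.
    print(f"{100 * QUOTA_BYTES / DEFAULT_QUOTA_BYTES:.2f}% of default")
```

In other words, the new per-query limit is under 1% of the assumed default, which leaves very little headroom for any single statement.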
When that had no effect, we edited the configuration and changed two parameters: tidb_mem_quota_query and performance.txn-total-size-limit. After a rolling update, TiDB could not start and kept reporting errors.
On further investigation, we found that performance.txn-total-size-limit had been set smaller than tidb_mem_quota_query.
During these operations, the cluster had two PD nodes, two TiDB nodes, and three TiKV nodes. The server at IP 71 hosted both a PD and a TiDB instance, and the rolling update did not complete successfully on it. By then, the TiDB server at IP 70 had already crashed, but we could still connect to TiDB through IP 71, although it returned out-of-memory errors.
After restarting the TiDB cluster, PD could not elect a leader. We tried many approaches but could not resolve it. We then restored all disks in the cluster except the TiKV data disks, including the TiUP machine, to the snapshot taken at 2 AM on the 25th.
[Encountered Issue: Symptoms and Impact] Currently, the TiDB node cannot start, while the other nodes start normally. We suspect that either a system table is corrupted or the modified parameters were persisted in TiKV. We do not know how to fix this, and the production environment is currently down. Please help.
[Resource Configuration]
[Attachments: Screenshots/Logs/Monitoring]