Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.
Original topic: 【TiDBer 唠嗑茶话会 65 】赢取 Ti 红露营五件套,分享常见 TiDB 错误 & 解决不费脑!
Ti Red Camping Gear Five-Piece Set Limited Time Drop in This Chat Session!! 
All TiDBers must be asking: How can we get the Ti Red Camping Gear Five-Piece Set???
Everyone, don't rush! Let the robot introduce this session's theme first: the TiDB Database Common Errors and Solutions Sharing Conference.
(PS: If you are really in a hurry, feel free to jump to the [Ti Red Camping Gear Limited Prize] section at the end to check the conditions for winning it.)

When developing or operating with TiDB, we often encounter some errors or issues. Understanding these errors and their solutions can help us better manage the TiDB database and improve work efficiency. So, this session will be a TiDB Database Common Errors & Solutions Sharing Conference! Let’s share the paths we’ve walked with more friends~
For example:
- “PD server timeout” error: This error usually indicates a timeout response from the PD Server (Placement Driver), which may be caused by network failures or high load on the PD cluster.
- Solutions include adding PD cluster nodes, optimizing PD cluster performance, checking the status of the PD nodes, and fixing any downtime or network issues (a quick status check is sketched below).
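For instance, before digging into the network or load, it can help to confirm from SQL that every PD node is up and reachable. A minimal sketch, assuming a TiDB version (v4.0 or later) that exposes the INFORMATION_SCHEMA.CLUSTER_INFO table:

    -- Lists the cluster components with their versions and uptime;
    -- a PD instance that is down or unreachable stands out here.
    SELECT TYPE, INSTANCE, VERSION, START_TIME, UPTIME
    FROM INFORMATION_SCHEMA.CLUSTER_INFO
    WHERE TYPE = 'pd';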
Another example:
- “Error 1009: invalid time zone” error: This error usually indicates an incorrect time zone setting, which may be caused by issues with the operating system or TiDB configuration.
- Solutions include correcting the time zone settings and restarting the TiDB cluster (see the sketch below).
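A minimal sketch of the time zone fix on the SQL side (assuming the operating system's tzdata is installed correctly; 'Asia/Shanghai' is only an example zone name):

    -- Check the current settings first
    SELECT @@global.time_zone, @@session.time_zone, @@system_time_zone;
    -- Then set a valid named zone, or a numeric offset such as '+08:00'
    SET GLOBAL time_zone = 'Asia/Shanghai';
    SET SESSION time_zone = 'Asia/Shanghai';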
So, TiDBers, show off your experiences this session~ PS: When sharing, please follow the two-part format below~👇
Common Errors + Solutions
After the topic ends, everyone's shares will be compiled into a [Community Wisdom] collection of practical tips~
This Session’s Topic:
Come and share the common errors & solutions in the TiDB database~ (PS: Both parts are required)
Activity Rewards:
Ti Red Camping Gear Limited Prize
- The TiDBer who shares according to the format and receives the most likes💗 in this chat session will be rewarded with the Ti Red Camping Gear Five-Piece Set!
- The TiDBer who shares according to the format and contributes the most common TiDB database errors and corresponding solutions in this chat session will be rewarded with the Ti Red Camping Gear Five-Piece Set!
Participation Award:
TiDBers who share according to the [Common Errors + Solutions] format will receive 30 points reward~
Activity Time:
2023.3.31-2023.4.7
Ti Red Camping Gear Five-Piece Set Display
[Common Errors]
TiKV memory usage surges and does not recede
[Solutions]
- Check whether the TiKV binary is the officially compiled release. If not, check the Linux memory allocation scheme it was built with (an unsuitable allocator choice can prevent memory from being released promptly).
- Insufficient disk I/O and slow write speed: data enters memory faster than it can be flushed to disk. Enable flow control to keep memory usage and disk flushing stable…
- Slow queries pushed down to TiKV occupy resources for a long time and cannot release them promptly. Locate these slow queries, optimize them, and reduce the long-term resource occupation (see the sketch after this list).
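As a starting point for the last two items, the relevant settings and queries can be inspected directly from SQL. A minimal sketch, assuming TiDB v4.0+ for SHOW CONFIG and the slow query table, and TiKV v5.2+ for the storage.flow-control options (option names may differ by version):

    -- Check whether write flow control is enabled on each TiKV instance
    SHOW CONFIG WHERE type = 'tikv' AND name LIKE 'storage.flow-control%';
    -- Locate long-running queries that may be holding resources on TiKV
    SELECT query_time, query
    FROM INFORMATION_SCHEMA.SLOW_QUERY
    ORDER BY query_time DESC
    LIMIT 10;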
[Common Errors]
The query uses a hint but still does not use TiFlash, resulting in slow query performance.
However, after manually forcing the session to read from TiFlash, the query runs very fast.
[Solution]
Add the hint to the subquery as well.
The table's statistics might also be inaccurate. You can run ANALYZE TABLE manually to collect statistics, after which the query might use TiFlash by default even without the hint (both fixes are sketched below).
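A minimal sketch of both suggestions (the table and column names here are made up for illustration; the hint and variable names are the standard TiDB ones):

    -- Refresh statistics so the optimizer can cost the TiFlash plan correctly
    ANALYZE TABLE orders;
    -- Put the read_from_storage hint on the outer query AND on the subquery
    SELECT /*+ READ_FROM_STORAGE(TIFLASH[o]) */ o.customer_id, SUM(o.amount)
    FROM orders o
    WHERE o.customer_id IN (
        SELECT /*+ READ_FROM_STORAGE(TIFLASH[c]) */ c.id
        FROM customers c
        WHERE c.region = 'EU'
    )
    GROUP BY o.customer_id;
    -- Session-level fallback: prefer TiFlash for eligible reads in this session
    SET SESSION tidb_isolation_read_engines = 'tiflash,tidb';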
Error accessing PD: TiKV cluster is not bootstrapped
Most of PD’s APIs can only be used after the TiKV cluster has been initialized. If you only start PD when deploying a new cluster and have not yet started TiKV, you will encounter this error when accessing PD. To resolve this error, you should first start the TiKV that you want to deploy. TiKV will automatically complete the initialization process, after which you can access PD normally.
Is there a quick way to resolve CrashLoopBackOff?
Yes, during initialization of the whole cluster you need to start PD first, then TiKV, and finally TiDB. The relevant persistent data lives in TiKV, so once TiKV has started normally, the whole service can use PD normally.
Check the StorageClass (SC) bound to the Persistent Volume (PV). You can set the reclaim policy to "Delete" so that the data inside is removed automatically when the PV is released.
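A minimal sketch of such a StorageClass for a Kubernetes deployment (the name and provisioner below are placeholders; use whatever your cluster actually provides):

    apiVersion: storage.k8s.io/v1
    kind: StorageClass
    metadata:
      name: tidb-storage-delete               # hypothetical name
    provisioner: kubernetes.io/no-provisioner  # example only; use your real provisioner
    reclaimPolicy: Delete                      # PV data is removed when the claim is released
    volumeBindingMode: WaitForFirstConsumer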
[Common Error]
A common error during scale-in: in a 3-node TiKV cluster, if one node goes down and cannot be brought back up, the scale-in gets stuck in the Offline state and cannot proceed normally.
[Reason]
Three replicas must be maintained, but with only 2 surviving TiKV nodes the Region replicas cannot be rescheduled, so the scale-in cannot complete normally.
[Solution]
First scale out by adding a node so the cluster again has at least 3 TiKV nodes. Once Region replicas can be scheduled normally, proceed with scaling in the failed TiKV node (you can watch the progress with the query sketched below).
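To watch whether the stuck store starts draining after the extra node is added, you can monitor store status and Region counts from SQL. A minimal sketch (INFORMATION_SCHEMA.TIKV_STORE_STATUS is available in TiDB v4.0 and later):

    -- The offline store's REGION_COUNT should keep decreasing once
    -- replicas can be scheduled onto the newly added node.
    SELECT STORE_ID, ADDRESS, STORE_STATE_NAME, LEADER_COUNT, REGION_COUNT
    FROM INFORMATION_SCHEMA.TIKV_STORE_STATUS;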
[Common Errors]
Excessive TiKV metrics leading to huge Prometheus storage and multiple Prometheus restarts
[Cause]
Excessive TiKV metrics
[Solution]
Add the following lines to the Prometheus configuration file, under the tikv scrape job:
    metric_relabel_configs:
      - source_labels: [__name__]
        separator: ;
        regex: tikv_thread_nonvoluntary_context_switches|tikv_thread_voluntary_context_switches|tikv_threads_io_bytes_total
        action: drop
      - source_labels: [__name__, name]
        separator: ;
        regex: tikv_thread_cpu_seconds_total;(tokio|rocksdb).+
        action: drop
However, the reason why TiKV produces so many metrics still needs to be identified by the official team. See the earlier post: tikv状态接口输出metric过多,请问如何优化呢? ("The TiKV status interface outputs too many metrics; how can this be optimized?") - TiDB Q&A Community
Frequent crashes are often not a database issue at all; rather, the database anomalies are caused by misconfigurations on the operating system, network, or application side!
Drainer component replication delay: this issue is caused by a business load too heavy for the drainer process to keep up with.
Solution: split the drainer and scale it out, using the syncer.replicate-do-db parameter so that each drainer replicates a different set of databases.
Config:

    syncer.replicate-do-db:
      - database1
      - database2
      - database3
When executing SQL statements, an error occurs: 1105 - Out Of Memory Quota
Solution: You can set tidb_mem_quota_query to a sufficiently large value at the session level, for example:

    SET tidb_mem_quota_query = 8589934592;
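If needed, you can also verify the value currently in effect for your session:

    SELECT @@tidb_mem_quota_query;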