Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.
Original topic: tikv wait_time.snapshot耗时很高 (TiKV wait_time.snapshot latency is very high)
[TiDB Usage Environment] Production Environment
[TiDB Version] v4.0.14
[Encountered Problem: Phenomenon and Impact]
- A large number of slow SQL queries.
- The TiDB logs contain a large number of “invalidate current region, because others failed on same store” messages, all pointing to the same TiKV node.
- The logs on that TiKV node all point to the same region.
- TiDB Dashboard hotspot map
[Resource Configuration] TiKV 9c/90g * 7
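To confirm that the "invalidate current region" messages really concentrate on one store and one region, the TiDB log can be tallied with a small script. This is only a sketch: it assumes each log line carries `[region=...]` and `[store=...]` fields after the message, which may not match the exact layout of a v4.0.14 log, and `count_invalidations` is a hypothetical helper name.

```python
import re
from collections import Counter

# Message text taken from the TiDB log quoted in this thread.
MESSAGE = "invalidate current region, because others failed on same store"

# Assumed field layout: [region=<id>] and [store=<addr>] somewhere on the line.
FIELD = re.compile(r"\[(region|store)=([^\]]+)\]")


def count_invalidations(lines):
    """Count matching log lines per store and per region."""
    by_store = Counter()
    by_region = Counter()
    for line in lines:
        if MESSAGE not in line:
            continue
        fields = dict(FIELD.findall(line))
        if "store" in fields:
            by_store[fields["store"]] += 1
        if "region" in fields:
            by_region[fields["region"]] += 1
    return by_store, by_region
```

Passing an open log file to `count_invalidations` and printing `by_store.most_common(5)` would show whether one TiKV address dominates.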
Here are some troubleshooting practices; see whether they help.
In TiDB Dashboard, click the SQL statement to open its details page and see which phase of the statement is slow.
Prewrite and lock both seem to be affected.
The TiDB server logs point to a specific TiKV node in the backend.
Here are the logs for that TiKV node.
Issues found from the logs:
- Fetching timestamps is slow. A large number of transaction start and commit timestamps are being fetched from PD, which could be caused by a large number of update statements. Consider grouping a number of updates and committing them in a single transaction.
- Frequent leader elections. This could be because the system is too busy to send heartbeats in time. If network latency is high or the network is unstable, consider increasing the election timeout to reduce the impact of periods without a leader.
- Frequent applies to the RocksDB KV store. Commits are very frequent.
Summary: On the application side, investigate why there are so many updates, and let multiple update statements be committed together in one transaction.
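As a rough illustration of committing several updates per transaction, here is a minimal sketch. It uses Python's built-in `sqlite3` module as a stand-in so that it runs anywhere; against TiDB you would use a MySQL-compatible driver instead. The table `t`, its columns `id` and `v`, and the helper `apply_updates_in_batches` are all hypothetical.

```python
import sqlite3


def apply_updates_in_batches(conn, updates, batch_size=100):
    """Apply (new_value, row_id) updates, committing once per batch.

    Returns the number of commits, which is what the batching is meant
    to reduce compared with committing after every statement.
    """
    commits = 0
    for start in range(0, len(updates), batch_size):
        batch = updates[start:start + batch_size]
        # "with conn" wraps the batch in one transaction:
        # commit on success, rollback if any statement fails.
        with conn:
            conn.executemany("UPDATE t SET v = ? WHERE id = ?", batch)
        commits += 1
    return commits
```

With 250 updates and `batch_size=100`, this issues three commits instead of 250. Larger batches mean fewer timestamp requests and commits, but also larger transactions, so the batch size is a trade-off to tune rather than a fixed recommendation.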
Yes, but this is controlled by the application, and batching the updates is not easy. Moreover, judging from the database monitoring and the application at that time, there weren’t many updates.
In other words, there are no large-scale update operations at all? Are there any other DML operations?
You mentioned a large number of slow SQL statements. What is the reason these update statements are slow? Please share your analysis results so we can review them.
The slowness of the updates is all in the prewrite phase. We will upgrade first and see whether that helps.