Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.
Original topic: 慢查询中prewrite占用时间巨大 (Prewrite takes a huge amount of time in a slow query)
[TiDB Usage Environment] Production Environment
[TiDB Version] V5.2.1
[Reproduction Path] What operations were performed when the issue occurred
[Encountered Issue: Problem Phenomenon and Impact]
There is a scheduled task at 2 AM every day that deletes data from a table, but it has failed for the past two days. I checked the slow log: the delete statement took more than 4 hours, and most of that time was spent in prewrite. Since this table also receives an insert roughly every 10 seconds, I initially suspected the inserts were the cause, but the issue persisted even after stopping them. I'm confused and would like to ask what might be causing this problem?
[Resource Configuration]
[Attachments: Screenshots/Logs/Monitoring]
Slow log (to fit it into one screenshot, I removed about 40 lines of Backoff_types because they were too many and repetitive):
TiKV-related logs:
Slow log:
Don't delete data this way. When a single transaction is too large, prewrite is bound to be slow. See the suggestions in the official documentation:
Delete Data | PingCAP Documentation Center
Are you using the pessimistic or the optimistic transaction model? Also, I suggest deleting the data in a loop, 10,000 rows per batch.
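For reference, here is a minimal sketch of that batch-delete loop in Python, assuming a hypothetical table `my_table` with a `created_at` column and TiDB's MySQL protocol endpoint at 127.0.0.1:4000; adjust the filter, connection settings, and batch size to your case:

```python
# Minimal sketch: delete in small batches instead of one huge transaction,
# so each prewrite only covers ~10,000 rows.
# Table name, column, and connection details are placeholders.
import time
import pymysql

conn = pymysql.connect(host="127.0.0.1", port=4000, user="root",
                       password="", database="test", autocommit=True)

BATCH_SIZE = 10000
try:
    while True:
        with conn.cursor() as cur:
            affected = cur.execute(
                "DELETE FROM my_table "
                "WHERE created_at < DATE_SUB(NOW(), INTERVAL 30 DAY) "
                "LIMIT %s",
                (BATCH_SIZE,),
            )
        if affected == 0:   # nothing left to delete
            break
        time.sleep(0.5)     # brief pause so the cleanup does not saturate TiKV
finally:
    conn.close()
```

Keeping each batch small keeps every prewrite short and avoids one huge transaction competing with the inserts that arrive every 10 seconds.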
Is the amount of deleted data relatively large?
Take a look at the Overview dashboard for that time window, the TiKV leader monitoring, and the network latency from blackbox_exporter.
We use the default pessimistic transaction model. The delete is scheduled to run daily and has been running for more than a year without any issues until now.
Previously each run deleted about 30,000 rows of data; the recent failed runs probably involve even more than that.
The expression delta(tikv_raftstore_region_count{type="leader"}[30s]) < -10 returns no data.
I can't seem to find blackbox_exporter.
Take a look at the logs of the TiKV instance circled in red. Was it restarted around 2:00?
There has been no restart.
Following this guide, export the monitoring data for one hour before and after 2 o'clock, including the Overview, TiDB, and TiKV-Details dashboards. Make sure to expand all the panels and wait for the data to load completely before exporting; otherwise the export will contain no data.
https://metricstool.pingcap.net/#backup-with-dev-tools
Uh, this environment only has one Prometheus available, no Grafana, and it can’t be deployed…
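If only Prometheus is available, the relevant series can still be pulled directly from the Prometheus HTTP API instead of through Grafana. A minimal sketch in Python (the Prometheus address and the time range are assumptions; the expression is the leader-count delta mentioned above):

```python
# Minimal sketch: query Prometheus directly over its HTTP API
# (/api/v1/query_range) when no Grafana is available.
import requests

PROM = "http://127.0.0.1:9090"          # placeholder Prometheus address
EXPR = 'delta(tikv_raftstore_region_count{type="leader"}[30s])'

resp = requests.get(
    f"{PROM}/api/v1/query_range",
    params={
        "query": EXPR,
        "start": "2022-11-01T01:00:00Z",  # example window: one hour before the 2 AM job
        "end": "2022-11-01T03:00:00Z",
        "step": "30s",
    },
    timeout=30,
)
resp.raise_for_status()
for series in resp.json()["data"]["result"]:
    print(series["metric"].get("instance"), series["values"][:5])
```

The same pattern works for any other expression you would normally view in Grafana, as long as Prometheus itself has the data.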
Restored to normal after rebuilding the table.
This topic was automatically closed 60 days after the last reply. New replies are no longer allowed.