TiDB Sudden Jitter: SQL Performance Degradation

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: tidb 突然抖动 sql变慢了

| username: TI表弟

[TiDB Usage Environment] prod
[TiDB Version] v7.5.0
[Reproduction Path] Operations performed that led to the issue
[Encountered Issue: Symptoms and Impact] The cluster suddenly became very slow and laggy. SQL execution time increased significantly, yet monitoring shows CPU, memory, and IO are all normal.
[Resource Configuration]

[Attachments: Screenshots/Logs/Monitoring]



| username: tidb狂热爱好者 | Original post link

It looks like your disk is underperforming; it gave out at just 65MB.

| username: tidb狂热爱好者 | Original post link

TiKV is using 4 cores, TiDB is also using 4 cores, and the CPU is maxed out. This is your staging environment, right?

| username: 像风一样的男子 | Original post link

Why are the server resource configurations so inconsistent? It’s really bothering my OCD.

| username: lemonade010 | Original post link

The resources are fully utilized, right? The TSO wait time is already 1.02 seconds.

| username: TI表弟 | Original post link

Resources are not fully utilized.

| username: TI表弟 | Original post link

Only one TiKV is slightly lower.

| username: TI表弟 | Original post link

Bro, it’s not fully utilized.

| username: WalterWj | Original post link

Take a look at Top SQL around 1.50 and see what shows up.
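
As a rough illustration of what a "top SQL" view over a time window amounts to, here is a minimal sketch. It assumes you have already exported slow-query records as `(timestamp, digest, query_time_seconds)` tuples; that record shape and the function name `top_sql` are hypothetical and are not a TiDB API.

```python
from collections import defaultdict
from datetime import datetime


def top_sql(records, start, end, limit=5):
    """Rank statement digests by total execution time inside [start, end].

    records: iterable of (timestamp: datetime, digest: str, query_time_s: float).
    This tuple layout is an assumption made for illustration only.
    Returns a list of (digest, execution_count, total_time_s), slowest first.
    """
    totals = defaultdict(lambda: [0, 0.0])  # digest -> [count, total seconds]
    for ts, digest, query_time_s in records:
        if start <= ts <= end:
            totals[digest][0] += 1
            totals[digest][1] += query_time_s
    # Sort by accumulated time so the statements that dominate the window come first.
    ranked = sorted(totals.items(), key=lambda item: item[1][1], reverse=True)
    return [(digest, count, total) for digest, (count, total) in ranked[:limit]]
```

Passing the window in which the jitter occurred as `start` and `end` shows whether a handful of statements account for most of the added latency, or whether everything slowed down uniformly.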

| username: TI表弟 | Original post link

It feels like the IO suddenly dropped here, strange.

| username: TI表弟 | Original post link

The monitoring of TiDB is somewhat inconsistent.

| username: 有猫万事足 | Original post link

This isn’t even the most outrageous part; tso_wait is already at 1.02s, which is unbearably high. Normally it should be under 100ms. When TSO wait reaches the level of seconds, in my experience even a ping to the PD leader may show a delay of over 100ms. Did the PD leader switch unexpectedly to a node in a different subnet?
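
One way to check the network path to the PD leader from a TiDB host is to time a few TCP connections and compare them with the 100ms figure mentioned above. The following is a minimal sketch, not an official tool: the function names are made up for illustration, and TCP connect time only approximates one network round trip, not the TSO wait itself.

```python
import socket
import statistics
import time


def tcp_connect_latency_ms(host, port, timeout=2.0):
    """Time one TCP connection to host:port and return it in milliseconds."""
    start = time.perf_counter()
    with socket.create_connection((host, port), timeout=timeout):
        pass  # connection established; close it immediately
    return (time.perf_counter() - start) * 1000.0


def summarize_latency(samples_ms, threshold_ms=100.0):
    """Summarize latency samples and count how many exceed threshold_ms."""
    if not samples_ms:
        raise ValueError("need at least one latency sample")
    return {
        "min_ms": min(samples_ms),
        "median_ms": statistics.median(samples_ms),
        "max_ms": max(samples_ms),
        "over_threshold": sum(1 for s in samples_ms if s > threshold_ms),
    }
```

For example, `summarize_latency([tcp_connect_latency_ms("pd-leader.example", 2379) for _ in range(10)])`, where `pd-leader.example` is a placeholder for your PD leader's address and 2379 is PD's default client port. Samples well above a few milliseconds inside one data center would support the cross-subnet suspicion.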

| username: lemonade010 | Original post link

I quite agree with the previous point. The TSO wait is already at 1.02 seconds, so there must be an issue with PD. Also, check whether your connection count suddenly increases when the load is high, which would mean that concurrency is high and the number of query statements has gone up.
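
If you can export the connection count as a series of samples from monitoring, a sudden jump can be flagged mechanically. This is a minimal sketch under that assumption; the function name, the window size, and the 2x factor are arbitrary illustrative choices, not TiDB defaults.

```python
import statistics


def find_connection_spikes(samples, window=3, factor=2.0):
    """Return the indices where a sample exceeds factor * the median of the
    preceding `window` samples.

    samples: connection counts in time order (assumed to come from monitoring).
    """
    spikes = []
    for i in range(window, len(samples)):
        baseline = statistics.median(samples[i - window:i])
        if baseline > 0 and samples[i] > factor * baseline:
            spikes.append(i)
    return spikes
```

For instance, `find_connection_spikes([100, 102, 98, 300, 310])` flags the fourth and fifth samples. If the flagged timestamps line up with the slow period, the jitter is more likely load-driven than a PD or network problem.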

| username: TIDB-Learner | Original post link

It looks like your resource usage, especially memory, is high.

| username: 迪迦奥特曼 | Original post link

Check the Performance Overview dashboard and the read/write-related monitoring panels.

| username: TiDBer_H5NdJb5Q | Original post link

Check the system monitoring to see if there are other services competing for resources.