Both TiDB and the business system report PD server timeout errors

translator_bot · June 22, 2024, 4:23pm

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: tidb和业务系统都报PD server timeout错误

| username: TiDBer_vC2lhR9G

[TiDB Usage Environment] Production Environment / Testing / Poc
[TiDB Version]
[Encountered Issues: Problem Phenomenon and Impact]

https://asktug.com/t/topic/1002300/4?_gl=11hwm3b1_gaMzc2NjY1NTg3LjE2NTI1MjM2OTU._ga_5FQSB5GH7F*MTY3NzQwMzUwMC4yOS4wLjE2Nzc0MDM1MDAuMC4wLjA.

The earlier PD failure appears to be fixed; I recovered the cluster with pd-recover. However, after the business workload started again, the TiDB log reported many "PD server timeout" errors, and the business system reported them as well. Forum posts suggest this can be caused by high PD load or disk I/O problems, but the monitoring I checked does not show high load. Could someone help me work out what is going on? Thank you.

[Resource Configuration]
[Attachments: Screenshots / Logs / Monitoring]

translator_bot · June 22, 2024, 4:23pm

| username: tidb菜鸟一只

Check the PD logs for any errors.
To troubleshoot PD disk I/O, look at Grafana → Disk Performance → Latency and Load.
Also check the network: use Grafana → blackbox_exporter → Ping Latency to see whether the network between TiDB and the PD leader is healthy.
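
For the log check, it can help to see when the timeouts cluster, so the spikes can be lined up against the Grafana disk and ping panels mentioned above. The following is only a minimal sketch: it assumes the log lines begin with a `[YYYY/MM/DD HH:MM:SS.mmm +ZZ:ZZ]` timestamp, and the function name `count_by_minute` and the sample lines are made up for illustration.

```python
import re
from collections import Counter

# Assumed line prefix: "[2024/06/22 16:23:01.123 +08:00] ..."
# The capture group keeps the timestamp truncated to the minute.
TS_RE = re.compile(
    r"^\[(\d{4}/\d{2}/\d{2} \d{2}:\d{2}):\d{2}\.\d+ [+-]\d{2}:\d{2}\]"
)


def count_by_minute(lines, needle="PD server timeout"):
    """Count lines containing `needle`, grouped by minute of their timestamp."""
    counts = Counter()
    for line in lines:
        if needle not in line:
            continue
        match = TS_RE.match(line)
        if match:  # lines without a recognizable timestamp are skipped
            counts[match.group(1)] += 1
    return counts


# Hypothetical sample lines, not taken from a real cluster.
sample = [
    '[2024/06/22 16:23:01.123 +08:00] [ERROR] [example.go:1] ["PD server timeout"]',
    '[2024/06/22 16:23:40.456 +08:00] [ERROR] [example.go:1] ["PD server timeout"]',
    '[2024/06/22 16:24:02.789 +08:00] [INFO] [example.go:2] ["unrelated message"]',
    '[2024/06/22 16:24:05.000 +08:00] [ERROR] [example.go:1] ["PD server timeout"]',
]

for minute, n in sorted(count_by_minute(sample).items()):
    print(minute, n)
```

To run it against a real file, pass an open file object instead of `sample`, for example `count_by_minute(open("tidb.log", encoding="utf-8", errors="replace"))`, where `tidb.log` is a placeholder path. The same function can be pointed at the PD log with a different `needle` to look for errors in the same minutes.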

translator_bot · June 22, 2024, 4:23pm

| username: xfworld

Check the PD TSO-related metrics to find out. If the machine is below the recommended system requirements, consider upgrading its configuration.

Check if this parameter is enabled:

translator_bot · June 22, 2024, 4:23pm

| username: system

This topic was automatically closed 60 days after the last reply. New replies are no longer allowed.