Consultation on the correlation between slow KV writes, KV memory growth, and PD elections over a period of time

This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: 请教一段时间内kv写入慢,kv内存增长,pd发生选举之间的关联

| username: Qiuchi

[TiDB Usage Environment] Production Environment
[TiDB Version] 6.5.0
[Resource Configuration] PD and TiDB mixed-deployed ×3, TiKV ×6
[Reproduction Path]
Continuously perform update operations on a large table for a period of time

[Encountered Problem: Phenomenon and Impact]

How can I analyze the correlation of the following phenomena?

The first symptom noticed was that update latency increased, and the TiKV logs contained entries reporting long prewrite/commit times.

Next, the Grafana panels showed that TiKV node memory grew during the same period.

Checking the TiKV logs further, it turned out that a PD election had occurred during the same period. TiKV logs:

PD logs:

| username: WalterWj | Original post link

Mixed deployment? My understanding is that the components are contending with each other for resources. How about trying cgroups or numactl, or configuring CPU and memory limits for each component?
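If you keep the mixed deployment, the TiUP cluster topology file lets you bind components to NUMA nodes and cap their CPU/memory through systemd. A minimal sketch with hypothetical hosts and limits (tune the values to the actual hardware; `numa_node` requires numactl on the host):

```yaml
# Hypothetical hosts and limits -- adjust to the actual machines.
pd_servers:
  - host: 10.0.1.1
    numa_node: "0"          # bind PD to NUMA node 0
    resource_control:
      memory_limit: "8G"    # systemd MemoryLimit
      cpu_quota: "400%"     # systemd CPUQuota (4 cores)
tidb_servers:
  - host: 10.0.1.1
    numa_node: "1"          # keep TiDB off PD's NUMA node
    resource_control:
      memory_limit: "32G"
      cpu_quota: "800%"
```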

| username: Qiuchi | Original post link

On the mixed-deployment nodes, numactl is installed but not configured. During that period, the CPU usage of the node where PD is located was around 500%.

The strange thing, though, is that the PD election happened at the very start of this period, around 2 o’clock in the graph. After the update finished, TiKV memory dropped and no further elections occurred.

Also, for an update operation, if PD TSO allocation is slow, would that cause TiKV node memory to grow and produce logs like these?

| username: WalterWj | Original post link

PD leader elections are usually caused by: 1. slow PD disk writes; 2. high network latency. Elections generally should not occur; when an unexpected one happens, it is most likely one of these two. Essentially, it means the other two PD nodes believe the current PD leader has a problem.
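To sanity-check the “slow PD disk write” hypothesis directly, one crude probe is to time synchronous 8 KB writes in the PD data directory, which roughly approximates etcd’s WAL fsync pattern (PD embeds etcd, whose guidance is that WAL fsync p99 should stay below roughly 10 ms). A sketch, not a proper benchmark:

```shell
# Crude fsync-latency probe: 100 synchronous 8 KB writes in the
# current directory (run it inside the PD data directory).
# Average latency ~= total time / 100; compare with etcd's ~10 ms guidance.
dd if=/dev/zero of=./fsync_probe bs=8k count=100 oflag=dsync
rm -f ./fsync_probe
```

High network latency, the other usual cause, can be checked with plain `ping` between the PD hosts.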

| username: WalterWj | Original post link

Yes, it is recommended to use better disks, such as NVMe, for the data path.

| username: Qiuchi | Original post link

From the PD logs, the reason for the re-election seems to be “fail to keepalive lease,” caused by “etcdserver: request timed out, waiting for the applied index took too long.” Do you have any insights on this?

I checked the IO utilization of this node’s data disks via node_exporter. PD’s data directory is on the system disk (green), while the TiDB cache path is on a separate disk (blue). During the problematic interval, the system disk’s utilization only reached about 20%, whereas the cache disk reached 80%. I’m not sure whether this is related (both the system disk and the data disk are NVMe SSDs, but their performance is indeed not great).
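As a cross-check on the node_exporter graph, disk utilization can be sampled straight from /proc/diskstats: field 13 is the milliseconds spent doing I/O, which is what the util% panel is derived from. A sketch with a hypothetical device name:

```shell
# Sample io_ticks (ms spent doing I/O) twice, one second apart,
# and derive utilization % the same way node_exporter/iostat do.
DEV=nvme0n1   # hypothetical device; pick yours from /proc/diskstats
t1=$(awk -v d="$DEV" '$3==d {print $13}' /proc/diskstats)
sleep 1
t2=$(awk -v d="$DEV" '$3==d {print $13}' /proc/diskstats)
echo "util: $(( (t2 - t1) / 10 ))%"   # delta_ms / 1000 ms * 100
```

Note that util% on NVMe can understate saturation, since these drives serve many requests in parallel; 20% util does not rule out slow individual fsyncs.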

| username: WalterWj | Original post link

Your drive is probably not NVMe.

| username: Qiuchi | Original post link

I asked the operations team earlier and they said it is attached… If it isn’t NVMe, would the system disk utilization be lower? Or, at the same 20% utilization, could PD’s performance already have been affected in my case?

| username: WalterWj | Original post link

Run an fio test on the follower node: TiDB 安装部署常见问题 (TiDB Installation & Deployment FAQ) | PingCAP 文档中心
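The disk check in that FAQ is roughly of this shape; the sketch below paraphrases it from memory, so treat the exact flags, file name, and size as placeholders and use the doc’s own command and pass thresholds:

```shell
# Mixed random-read / sequential-write test, approximating the
# TiDB deployment FAQ's disk check (flags paraphrased, not authoritative).
fio -ioengine=psync -bs=32k -fdatasync=1 -thread \
    -rw=randrw -percentage_random=100,0 \
    -size=10G -filename=./fio_test.dat \
    -name='randread + seq write test' \
    -iodepth=4 -runtime=60 -numjobs=4 -group_reporting
```

Run it on the PD data disk while the cluster is quiet, and delete the test file afterwards.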

| username: Qiuchi | Original post link

I’ll try it later, thanks.

| username: system | Original post link

This topic was automatically closed 60 days after the last reply. New replies are no longer allowed.