Memory Leak of PD Leader Reaches 28G in TiDB Version 6.5.0

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: TIDB6.5.0版本出现pd leader内存泄露达到28G

| username: TiDBer_G64jJ9u8

[TiDB Usage Environment] Testing/PoC, deployed in EasyStack cloud environment
[TiDB Version] 6.5.0
[Reproduction Path] Normal K8S deployment of this version
[Encountered Issue: Problem Phenomenon and Impact]
After the system was deployed, the memory of the pd-server leader node inexplicably keeps rising. When it reaches 28G, pd fails and is automatically killed, a new leader is elected, and the memory of the new leader node then continues to rise in the same way. The current test workload on the system is relatively small.

Checking the logs of the failing pd turns up few suspicious anomalies. The only suspicious point in the entire system is high IO, with the system iowait reaching 5. However, this did not happen during previous stress tests, even when system IO was similarly high.

[Resource Configuration] 64G memory, 32-thread CPU
[Attachments: Screenshots/Logs/Monitoring]
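
Regarding "pd fails and is automatically killed": it may be worth confirming whether the container memory limit or the kernel OOM killer is doing the killing. A minimal check, assuming the pod is named basic-pd-0 in the mycluster namespace (inferred from the addresses quoted later in the thread):

# Did Kubernetes restart the container with reason OOMKilled?
kubectl -n mycluster describe pod basic-pd-0 | grep -A3 -i 'last state'
# Did the kernel OOM killer fire on the node hosting the PD leader?
dmesg -T | grep -iE 'out of memory|killed process'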

| username: xfworld | Original post link

You can check the IO monitoring metrics through Grafana, especially the PD-related metrics.

If the IO is insufficient, running TiDB will be quite challenging.
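
Besides Grafana, a quick sanity check directly on the node hosting the PD leader could look like this (assuming the sysstat package is installed):

# Extended per-device stats (utilization, await) and CPU iowait, sampled every second for 10 samples
iostat -x 1 10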

| username: Kongdom | Original post link

Have you tested the disk with fio?
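
If not, a rough random-write test along these lines could tell whether the disk itself is the bottleneck (parameters are illustrative only; the scratch file path on the PD data disk is an assumption, and it is best run when the cluster is otherwise idle):

# Random 4k write test against a scratch file on the PD data disk
fio --name=randwrite --filename=/var/lib/pd/fio_test --rw=randwrite --bs=4k --size=2G --ioengine=libaio --direct=1 --iodepth=32 --numjobs=1 --runtime=60 --time_based --group_reporting
# Clean up the scratch file afterwards
rm -f /var/lib/pd/fio_test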

| username: TiDBer_G64jJ9u8 | Original post link

Large-scale PD writes are definitely abnormal, and this is not necessarily related to high I/O. What exactly is PD writing, and under what circumstances would these writes be triggered?
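
One way to see which pd-server threads are actually doing the writes is per-thread IO accounting, roughly as below (assuming the sysstat package is installed; <pd-pid> is the pd-server process ID):

# -d: disk stats, -t: per-thread breakdown; shows kB written per second for each thread
pidstat -dt -p <pd-pid> 1 10
# Alternatively, /proc exposes cumulative write_bytes per thread
grep '^write_bytes' /proc/<pd-pid>/task/*/io | sort -t: -k3 -n | tail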

| username: 秋枫之舞 | Original post link

You can collect a PD heap profile to see what is consuming the memory.

curl http://xxx.xxx.xxx.xxx:2379/debug/pprof/heap?seconds=60 >pd_heap
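
If a Go toolchain is available, the collected profile can then be inspected with pprof, for example:

# Top memory consumers by in-use space
go tool pprof -top pd_heap
# Or render a call graph (requires graphviz)
go tool pprof -svg pd_heap > pd_heap.svg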

| username: Billmay表妹 | Original post link

Are you running all the components on a single machine?

| username: h5n1 | Original post link

I see two fixes:

  • Fixed the issue where calling ReportMinResolvedTS too frequently caused PD OOM #5965
  • Fixed the issue where using Prepare or Execute to query certain virtual tables could not push down the table ID, leading to PD OOM in the case of a large number of Regions #39605
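
Whether the second fix is relevant depends partly on how many Regions the cluster holds. If pd-ctl is available (the basic-pd:2379 address is taken from the config dump later in the thread), a quick check might be:

# region_count and leader_count per TiKV store
pd-ctl -u http://basic-pd:2379 store
# PD members, to confirm which node is the current leader
pd-ctl -u http://basic-pd:2379 member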

| username: TiDBer_G64jJ9u8 | Original post link

Does "virtual table" refer to a view or a partitioned table?

| username: tidb菜鸟一只 | Original post link

Is the TiDB cluster deployed on k8s using local disks? I see the PD data directory points to /var/lib/pd. How large is your cluster? Normally, the IO pressure on PD shouldn't be that high.

| username: TiDBer_G64jJ9u8 | Original post link

I have deployed three nodes, with components mixed (co-located) on each node. All three nodes are in a hyper-converged cloud environment. It's a test environment with low write pressure; previously, during testing, the write pressure was high and PD showed no such issue. The PD data directory points to /var/lib/pd because the cluster is deployed in k8s containers.

| username: Miracle | Original post link

Why are there so many PD-servers? Have several clusters been deployed?

| username: andone | Original post link

Is there any mixed deployment with TiKV or TiDB?

| username: Jellybean | Original post link

Why are there so many PD nodes? Are they all part of the same cluster?

| username: TiDBer_G64jJ9u8 | Original post link

These are the various threads under the PD process; did no one notice? Multiple PD threads are writing to disk extensively. Take a look at the stack traces of these threads:

Thread 8 (LWP 14156):
#0 runtime.futex () at /usr/local/go/src/runtime/sys_linux_amd64.s:560
#1 0x0000000000decb96 in runtime.futexsleep (addr=0xfffffffffffffe00, val=0, ns=14845859) at /usr/local/go/src/runtime/os_linux.go:69
#2 0x0000000000dc2de7 in runtime.notesleep (n=0xc000316148) at /usr/local/go/src/runtime/lock_futex.go:160
#3 0x0000000000df78ac in runtime.mPark () at /usr/local/go/src/runtime/proc.go:2247
#4 runtime.stopm () at /usr/local/go/src/runtime/proc.go:2247
#5 0x0000000000df8f48 in runtime.findRunnable (gp=, inheritTime=, tryWakeP=) at /usr/local/go/src/runtime/proc.go:2874
#6 0x0000000000df9d7e in runtime.schedule () at /usr/local/go/src/runtime/proc.go:3214
#7 0x0000000000dfa2ad in runtime.park_m (gp=0xc00096da00) at /usr/local/go/src/runtime/proc.go:3363
#8 0x0000000000e24663 in runtime.mcall () at /usr/local/go/src/runtime/asm_amd64.s:448
#9 0x0000000000000000 in ?? ()
Thread 7 (LWP 14155):
#0 runtime.futex () at /usr/local/go/src/runtime/sys_linux_amd64.s:560
#1 0x0000000000decb96 in runtime.futexsleep (addr=0xfffffffffffffe00, val=0, ns=14845859) at /usr/local/go/src/runtime/os_linux.go:69
#2 0x0000000000dc2de7 in runtime.notesleep (n=0xc000100548) at /usr/local/go/src/runtime/lock_futex.go:160
#3 0x0000000000df78ac in runtime.mPark () at /usr/local/go/src/runtime/proc.go:2247
#4 runtime.stopm () at /usr/local/go/src/runtime/proc.go:2247
#5 0x0000000000df8f48 in runtime.findRunnable (gp=, inheritTime=, tryWakeP=) at /usr/local/go/src/runtime/proc.go:2874
#6 0x0000000000df9d7e in runtime.schedule () at /usr/local/go/src/runtime/proc.go:3214
#7 0x0000000000dfa2ad in runtime.park_m (gp=0xc00096da00) at /usr/local/go/src/runtime/proc.go:3363
#8 0x0000000000e24663 in runtime.mcall () at /usr/local/go/src/runtime/asm_amd64.s:448
#9 0x0000000000000000 in ?? ()
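
For reference, per-thread backtraces like these can be captured non-interactively with gdb, roughly as follows (<pd-pid> is the pd-server process ID):

# Dump backtraces of all threads of the running pd-server in one shot
gdb -p <pd-pid> -batch -ex 'set pagination off' -ex 'thread apply all bt' > pd_threads.txt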

| username: zxgaa | Original post link

Check what tasks the TiDB cluster is currently executing.

| username: Jellybean | Original post link

Have you collected the heap from PD as mentioned by the expert above?
Also, check the cluster monitoring and the logs of the other tidb-server/tikv-server nodes for any abnormal messages.
Check whether any new or unusual business traffic is accessing the cluster.

The information provided so far is not sufficient to pinpoint the issue; please continue troubleshooting.
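
A quick pass over the component logs on K8S might look like this (the mycluster namespace and basic-* pod names are assumptions based on the addresses in the dump below; adjust to your environment):

# Scan recent PD / TiKV / TiDB logs for warnings and errors
for pod in $(kubectl -n mycluster get pods -o name | grep -E 'basic-(pd|tikv|tidb)'); do
  echo "== $pod =="
  kubectl -n mycluster logs "$pod" --tail=2000 | grep -iE '\[(error|warn)\]' | tail -20
done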

| username: TiDBer_G64jJ9u8 | Original post link

I haven't looked at that yet; I'm currently going through the data dumped with gdb, and it's huge! Here are some excerpts:
peer.mycluster
pu.svc.clu
vel":“info”,“log-file”:“”,“log-format”:“text”,“log-rotation-timespan”:“0s”,“log-rotation-size”:“300MiB”,“slow-log-file”:“”,“slow-log-threshold”:“1s”,“abort-on-panic”:false,“memory-usage-limit”:“5000MiB”,“memory-usage-high-water”:0.9,“log”:{“level”:“info”,“format”:“text”,“enable-timestamp”:true,“file”:{“filename”:“”,“max-size”:300,“max-days”:0,“max-backups”:0}},“quota”:{“foreground-cpu-time”:0,“foreground-write-bandwidth”:“0KiB”,“foreground-read-bandwidth”:“0KiB”,“max-delay-duration”:“500ms”,“background-cpu-time”:0,“background-write-bandwidth”:“0KiB”,“background-read-bandwidth”:“0KiB”,“enable-auto-tune”:false},“readpool”:{“unified”:{“min-thread-count”:1,“max-thread-count”:10,“stack-size”:“10MiB”,“max-tasks-per-worker”:2000,“auto-adjust-pool-size”:false},“storage”:{“use-unified-pool”:true,“high-concurrency”:8,“normal-concurrency”:8,“low-concurrency”:8,“max-tasks-per-worker-high”:2000,“max-tasks-per-worker-normal”:2000,“max-tasks-per-worker-low”:2000,“stack-size”:“10MiB”},“coprocessor”:{“use-unified-pool”:true,“high-concurrency”:12,“normal-concurrency”:12,“low-concurrency”:12,“max-tasks-per-worker-high”:2000,“max-tasks-per-worker-normal”:2000,“max-tasks-per-worker-low”:2000,“stack-size”:“10MiB”}},“server”:{“addr”:“0.0.0.0:20160”,“advertise-addr”:“basic-tikv-2.basic-tikv-peer.mycluster.svc:20160”,“status-addr”:“0.0.0.0:20180”,“advertise-status-addr”:“”,“status-thread-pool-size”:1,“max-grpc-send-msg-len”:10485760,“raft-client-grpc-send-msg-buffer”:524288,“raft-client-queue-size”:8192,“raft-msg-max-batch-size”:128,“grpc-compression-type”:“none”,“grpc-gzip-compression-level”:2,“grpc-min-message-size-to-compress”:4096,“grpc-concurrency”:5,“grpc-concurrent-stream”:1024,“grpc-raft-conn-num”:1,“grpc-memory-pool-quota”:“9223372036854775807B”,“grpc-stream-initial-window-size”:“2MiB”,“grpc-keepalive-time”:“10s”,“grpc-keepalive-timeout”:“3s”,“concurrent-send-snap-limit”:32,“concurrent-recv-snap-limit”:32,“end-point-recursion-limit”:1000,“end-point-stream-channel-size”:8,“end-point-batch-row-limit”:64,“end-point-stream-batch-row-limit”:128,“end-point-enable-batch-if-possible”:true,“end-point-request-max-handle-duration”:“1m”,“end-point-max-concurrency”:16,“end-point-perf-level”:0,“snap-max-write-bytes-per-sec”:“100MiB”,“snap-max-total-size”:“0KiB”,“stats-concurrency”:1,“heavy-load-threshold”:75,“heavy-load-wait-duration”:null,“enable-request-batch”:true,“background-thread-count”:2,“end-point-slow-log-threshold”:“1s”,“forward-max-connections-per-address”:4,“reject-messages-on-memory-ratio”:0.2,“simplify-metrics”:false,“labels”:{}},“storage”:{“data-dir”:“/var/lib/tikv”,“gc-ratio-threshold”:1.1,“max-key-size”:8192,“scheduler-concurrency”:524288,“scheduler-worker-pool-size”:8,“scheduler-pending-write-threshold”:“100MiB”,“reserve-space”:“0KiB”,“reserve-raft-space”:“1GiB”,“enable-async-apply-prewrite”:false,“api-version”:1,“enable-ttl”:false,“background-error-recovery-window”:“1h”,“ttl-check-poll-interval”:“12h”,“flow-control”:{“enable”:true,“soft-pending-compaction-bytes-limit”:“192GiB”,“hard-pending-compaction-bytes-limit”:“1TiB”,“memtables-threshold”:5,“l0-files-threshold”:20},“block-cache”:{“shared”:true,“capacity”:“3000MiB”,“num-shard-bits”:6,“strict-capacity-limit”:true,“high-pri-pool-ratio”:0.8,“memory-allocator”:“nodump”},“io-rate-limit”:{“max-bytes-per-sec”:“0KiB”,“mode”:“write-only”,“strict”:false,“foreground-read-priority”:“high”,“foreground-write-priority”:“high”,“flush-priority”:“high”,“level-zero-compaction-priority”:“medium”,“compaction-priority”:“low”,“replication-priority”:“high”,“load-balance-p
riority”:“high”,“gc-priority”:“high”,“import-priority”:“medium”,“export-priority”:“medium”,“other-priority”:“high”}},“pd”:{“endpoints”:[“http://basic-pd:2379”],“retry-interval”:“3
very”:10,"update

pdmonitor-0
stats-monitor
deadlock-0
refreash-config
purge-worker-0
backup-stream-0
sst-importer6
advance-ts
grpc-server-1
check_leader-0
grpc-server-2
sst-importer7
deadlock-detect
grpc-server-0
raft-stream-0
snap-handler-0
sst-importer
default-executo
raftstore-5-0
sst-importer4
sst-importer2
raftlog-fetch-w
rocksdb:low
region-collecto
inc-scanslogger
cleanup-worker-
backup-stream
apply-low-0
log-backup-scan
background-1
resource-meteri
gc-manager
snap-sender
grpc_global_tim
sst-importer5
rocksdb:high
re-metricstso
timerpd-worker-0sst-importer0
tikv-servercdc-0time updater
flow-checker

3/pd/7311282686139874110/raft/s/00000000000000000004
2basic-tikv-0.basic-tikv-peer.mycluster.svc:20160*
6.5.022basic-tikv-0.basic-tikv-peer.mycluster.svc:20160:
0.0.0.0:20180B(47b81680f75adc4b7200480cea5dbe46ae07c4b5H
3/pd/7311282686139874110/raft/s/00000000000000000001
2basic-tikv-1.basic-tikv-peer.mycluster.svc:20160*
6.5.022basic-tikv-1.basic-tikv-peer.mycluster.svc:20160:
0.0.0.0:20180B(47b81680f75adc4b7200480cea5dbe46ae07c4b5H
3/pd/7311282686139874110/raft/s/00000000000000000001
2basic-tikv-1.basic-tikv-peer.mycluster.svc:20160*
6.5.022basic-tikv-1.basic-tikv-peer.mycluster.svc:20160:
0.0.0.0:20180B(47b81680f75adc4b7200480cea5dbe46ae07c4b5H
3/pd/7311282686139874110/raft/s/00000000000000000001
2basic-tikv-1.basic-tikv-peer.mycluster.svc:20160*
6.5.022basic-tikv-1.basic-tikv-peer.mycluster.svc:20160:
0.0.0.0:20180B(47b81680f75adc4b7200480cea5dbe46ae07c4b5H
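
Assuming the dump above was taken with something like gcore, one way to quantify which strings repeat in the PD process memory (for example the store/config blobs shown here) is:

# Capture a core of the running pd-server, then count the most frequently repeated long strings
gcore -o pd.core <pd-pid>
strings -n 32 pd.core.<pd-pid> | sort | uniq -c | sort -rn | head -30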

| username: dba远航 | Original post link

Investigate from the angle of PD's functions (TSO, global ID allocation, Region information, etc.).
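
As a rough first pass on those angles, PD exposes both Go runtime metrics and a status API over HTTP (the basic-pd address is an assumption taken from the config dump above):

# Heap in use and goroutine count as reported by pd-server itself
curl -s http://basic-pd:2379/metrics | grep -E '^(go_goroutines|go_memstats_heap_inuse_bytes)'
# PD members and TiKV store summaries (includes region counts)
curl -s http://basic-pd:2379/pd/api/v1/members
curl -s http://basic-pd:2379/pd/api/v1/stores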

| username: Jellybean | Original post link

How is the cluster issue going?
Since you are deploying on K8S, it is recommended to have the relevant colleagues check whether the K8S cluster itself has any issues. In similar situations encountered before, the underlying K8S turned out to be the cause, so please check and confirm that as well.
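
A minimal K8S-side sanity check could look like this (the mycluster namespace is an assumption taken from the dump above):

# Node health and recent cluster events (OOMKilled, evictions, disk pressure, etc.)
kubectl get nodes -o wide
kubectl -n mycluster get events --sort-by=.lastTimestamp | tail -30
kubectl -n mycluster get pods -o wide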

| username: 江湖故人 | Original post link

Check whether there are any unusually large directories or files on the PD node.
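
For example, inside the PD pod (the /var/lib/pd path comes from earlier in the thread):

# Largest items under the PD data directory
du -sh /var/lib/pd/* 2>/dev/null | sort -rh | head
# Overall filesystem usage of the data volume
df -h /var/lib/pd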