High Load on TiFlash Node

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: TiFlash节点负载高

| username: japson

【TiDB Environment】Production
【TiDB Version】5.4.3
The TiFlash node is under heavy load caused by high read IO. To avoid impacting the business, the TiFlash replica count of every table was set to 0. The node load is still high, however, and restarting neither the service nor the server resolves it.

Relevant logs are as follows:
root@ /data/tidb-deploy/tiflash-9000/log#tail tiflash_cluster_manager.log

[2024/06/18 15:41:00.355 +08:00] [INFO] [etcd.client] [Try to init master success, ttl: 60, create new key: /tiflash/cluster/leader]

[2024/06/18 15:41:00.355 +08:00] [INFO] [TiFlashManager] [After init, become master]

[2024/06/18 15:41:00.404 +08:00] [INFO] [TiFlashManager] [all replicas are available at global schema version 20469]

[2024/06/18 15:51:13.531 +08:00] [INFO] [etcd.client] [Try to init master success, ttl: 60, create new key: /tiflash/cluster/leader]

[2024/06/18 15:51:13.531 +08:00] [INFO] [TiFlashManager] [After init, become master]

[2024/06/18 15:51:13.580 +08:00] [INFO] [TiFlashManager] [all replicas are available at global schema version 20469]

[2024/06/18 15:54:19.227 +08:00] [INFO] [etcd.client] [Try to init master success, ttl: 60, create new key: /tiflash/cluster/leader]

[2024/06/18 15:54:19.228 +08:00] [INFO] [TiFlashManager] [After init, become master]

[2024/06/18 15:56:19.876 +08:00] [INFO] [TiFlashManager] [all replicas are available at global schema version 20469]

root@ /data/tidb-deploy/tiflash-9000/log#tail tiflash_error.log

[2024/06/18 15:55:57.859 +08:00] [ERROR] [] ["pingcap.pd:Send TsoRequest failed"] [thread_id=37]

[2024/06/18 15:55:57.869 +08:00] [ERROR] [] ["pingcap.pd:get safe point failed: 4: Deadline Exceeded"] [thread_id=38]

[2024/06/18 15:55:57.894 +08:00] [WARN] [] ["pd/oracle:update ts error: Exception: Send TsoRequest failed"] [thread_id=37]

[2024/06/18 15:55:57.896 +08:00] [WARN] [TCPHandler.cpp:69] ["TCPHandler:Client has not sent any data."] [thread_id=39]

[2024/06/18 15:56:11.870 +08:00] [ERROR] [] ["pingcap.pd:get safe point failed: 4: Deadline Exceeded"] [thread_id=38]

[2024/06/18 15:56:11.876 +08:00] [ERROR] [] ["pingcap.pd:Send TsoRequest failed"] [thread_id=37]

[2024/06/18 15:56:14.889 +08:00] [WARN] [] ["pd/oracle:update ts error: Exception: Send TsoRequest failed"] [thread_id=37]

[2024/06/18 15:56:36.096 +08:00] [WARN] [StorageConfigParser.cpp:215] ["Application:The configuration "path" is deprecated. Check [storage] section for new style."] [thread_id=1]

[2024/06/18 15:56:51.573 +08:00] [WARN] [TCPHandler.cpp:69] ["TCPHandler:Client has not sent any data."] [thread_id=26]

[2024/06/18 15:57:26.855 +08:00] [ERROR] [] ["pingcap.pd:Receive TsoResponse failed"] [thread_id=27]

root@ /data/tidb-deploy/tiflash-9000/log#tail tiflash.log

[2024/06/18 15:57:28.874 +08:00] [DEBUG] [] ["grpc:/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tics/contrib/grpc/src/core/lib/iomgr/tcp_posix.cc, line number: 1261, log msg : cannot set inq fd=1311 errno=92"] [thread_id=19]

[2024/06/18 15:57:28.874 +08:00] [DEBUG] [] ["grpc:/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tics/contrib/grpc/src/core/lib/iomgr/tcp_posix.cc, line number: 1261, log msg : cannot set inq fd=1312 errno=92"] [thread_id=19]

[2024/06/18 15:57:28.874 +08:00] [DEBUG] [] ["grpc:/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tics/contrib/grpc/src/core/lib/iomgr/tcp_posix.cc, line number: 1261, log msg : cannot set inq fd=1313 errno=92"] [thread_id=19]

[2024/06/18 15:57:28.874 +08:00] [DEBUG] [] ["grpc:/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tics/contrib/grpc/src/core/lib/iomgr/tcp_posix.cc, line number: 1261, log msg : cannot set inq fd=1314 errno=92"] [thread_id=19]

[2024/06/18 15:57:28.874 +08:00] [DEBUG] [] ["grpc:/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tics/contrib/grpc/src/core/lib/iomgr/tcp_posix.cc, line number: 1261, log msg : cannot set inq fd=1315 errno=92"] [thread_id=19]

[2024/06/18 15:57:28.874 +08:00] [DEBUG] [] ["grpc:/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tics/contrib/grpc/src/core/lib/iomgr/tcp_posix.cc, line number: 1261, log msg : cannot set inq fd=1316 errno=92"] [thread_id=19]

[2024/06/18 15:57:28.874 +08:00] [DEBUG] [] ["grpc:/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tics/contrib/grpc/src/core/lib/iomgr/tcp_posix.cc, line number: 1261, log msg : cannot set inq fd=1317 errno=92"] [thread_id=19]

[2024/06/18 15:57:34.884 +08:00] [INFO] [RateLimiter.cpp:690] ["IOLimitTuner:limiter 0 write 0 read 0 NOT need to tune."] [thread_id=17]

[2024/06/18 15:57:38.849 +08:00] [INFO] [] ["WaitCheckRegionReady:28714 regions need to fetch latest commit-index in next round, sleep for 5s"] [thread_id=1]

[2024/06/18 15:57:44.872 +08:00] [ERROR] [] ["pingcap.tikv:Get Failed4: Deadline Exceeded"] [thread_id=28]

root@ /data/tidb-deploy/tiflash-9000/log#tail tiflash_stderr.log

Logging debug to /data/tidb-deploy/tiflash-9000/log/tiflash.log

Logging errors to /data/tidb-deploy/tiflash-9000/log/tiflash_error.log

Logging debug to /data/tidb-deploy/tiflash-9000/log/tiflash.log

Logging errors to /data/tidb-deploy/tiflash-9000/log/tiflash_error.log

Logging debug to /data/tidb-deploy/tiflash-9000/log/tiflash.log

Logging errors to /data/tidb-deploy/tiflash-9000/log/tiflash_error.log

Logging debug to /data/tidb-deploy/tiflash-9000/log/tiflash.log

Logging errors to /data/tidb-deploy/tiflash-9000/log/tiflash_error.log

Logging debug to /data/tidb-deploy/tiflash-9000/log/tiflash.log

Logging errors to /data/tidb-deploy/tiflash-9000/log/tiflash_error.log

root@ /data/tidb-deploy/tiflash-9000/log#tail tiflash_tikv.log

[2024/06/18 15:57:28.871 +08:00] [WARN] [future.rs:24] ["paired_future_callback: Failed to send result to the future rx, discarded."]

[2024/06/18 15:57:28.871 +08:00] [WARN] [future.rs:24] ["paired_future_callback: Failed to send result to the future rx, discarded."]

[2024/06/18 15:57:28.871 +08:00] [WARN] [future.rs:24] ["paired_future_callback: Failed to send result to the future rx, discarded."]

[2024/06/18 15:57:28.871 +08:00] [WARN] [future.rs:24] ["paired_future_callback: Failed to send result to the future rx, discarded."]

[2024/06/18 15:57:28.871 +08:00] [WARN] [future.rs:24] ["paired_future_callback: Failed to send result to the future rx, discarded."]

[2024/06/18 15:57:28.871 +08:00] [WARN] [future.rs:24] ["paired_future_callback: Failed to send result to the future rx, discarded."]

[2024/06/18 15:57:28.871 +08:00] [WARN] [future.rs:24] ["paired_future_callback: Failed to send result to the future rx, discarded."]

[2024/06/18 15:57:28.871 +08:00] [WARN] [future.rs:24] ["paired_future_callback: Failed to send result to the future rx, discarded."]

[2024/06/18 15:57:28.871 +08:00] [WARN] [future.rs:24] ["paired_future_callback: Failed to send result to the future rx, discarded."]

[2024/06/18 15:57:42.899 +08:00] [WARN] [store.rs:859] ["[store 128] handle 1 pending peers include 1 ready, 0 entries, 0 messages and 0 snapshots"] [takes=32976]

| username: tidb菜鸟一只

Is your PD node healthy? The logs show a lot of PD-related errors.
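To back up the impression that most errors point at PD, the error log can be tallied by level and component. The following is only a rough sketch: it assumes the bracketed line layout seen in the `tiflash_error.log` excerpt above, treats the text before the first colon in the message as the component, and uses a few lines copied from that excerpt as sample input. The names `LINE_RE`, `tally_components`, and `SAMPLE` are made up for illustration.

```python
import re
from collections import Counter

# Assumed layout, taken from the excerpt in this thread:
# [timestamp] [LEVEL] [source] ["component:message"] [thread_id=N]
# The opening quote may be straight or curly, depending on how the log was copied.
LINE_RE = re.compile(
    r'^\[(?P<ts>[^\]]+)\] \[(?P<level>[A-Z]+)\] \[[^\]]*\] '
    r'\[["“](?P<component>[^:]+):(?P<msg>.*)'
)


def tally_components(lines):
    """Count (level, component) pairs for every line that matches LINE_RE."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match:
            counts[(match.group("level"), match.group("component"))] += 1
    return counts


# A few lines copied from the tiflash_error.log excerpt in the original post.
SAMPLE = [
    '[2024/06/18 15:55:57.859 +08:00] [ERROR] [] ["pingcap.pd:Send TsoRequest failed"] [thread_id=37]',
    '[2024/06/18 15:55:57.869 +08:00] [ERROR] [] ["pingcap.pd:get safe point failed: 4: Deadline Exceeded"] [thread_id=38]',
    '[2024/06/18 15:55:57.894 +08:00] [WARN] [] ["pd/oracle:update ts error: Exception: Send TsoRequest failed"] [thread_id=37]',
    '[2024/06/18 15:55:57.896 +08:00] [WARN] [TCPHandler.cpp:69] ["TCPHandler:Client has not sent any data."] [thread_id=39]',
]

counts = tally_components(SAMPLE)
for (level, component), n in counts.most_common():
    print(level, component, n)
```

On a real node, the same function could be fed the lines of `/data/tidb-deploy/tiflash-9000/log/tiflash_error.log` instead of `SAMPLE`; lines that do not fit the assumed layout are simply skipped.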

| username: TIDB-Learner

Could the server's firewall be enabled without you noticing?

| username: zhaokede

Is data still being synchronized?
With IO this high, is the storage on HDD?

| username: xfworld

Are the TiKV nodes and PD nodes in a normal state?

| username: 有猫万事足

The errors suggest that PD is unreachable.
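One quick way to test this from the TiFlash host is a plain TCP connection attempt to each PD endpoint. The sketch below is only an illustration: `can_connect` and `check_endpoints` are hypothetical helper names, and the endpoint list has to be filled in with the cluster's real PD addresses (2379 is PD's default client port, but the deployment may use another). A successful connect only shows that the port is reachable through any firewall; it does not prove that PD is healthy.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def check_endpoints(endpoints, timeout=3.0):
    """Map each (host, port) pair to its reachability result."""
    return {(host, port): can_connect(host, port, timeout) for host, port in endpoints}
```

Calling `check_endpoints([("<pd-host>", 2379)])` with the real PD host filled in returns a dictionary of results per endpoint; an endpoint that comes back `False` would fit the `Deadline Exceeded` errors in the logs.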

| username: 小于同学

Is it HDD storage?
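On Linux, the kernel exposes a per-device flag at `/sys/block/<device>/queue/rotational` (`1` for rotational disks, `0` otherwise), which gives a quick answer. A minimal sketch, where `rotational_devices` is a made-up helper name and the result should be read with care, since virtual or RAID-backed disks may report a value that does not match the physical media:

```python
from pathlib import Path


def rotational_devices(sys_block="/sys/block"):
    """Map each block device name to True (rotational, i.e. HDD) or False (SSD/NVMe)."""
    base = Path(sys_block)
    result = {}
    if not base.is_dir():
        # Not a Linux host, or sysfs is not mounted here.
        return result
    for dev in sorted(base.iterdir()):
        flag = dev / "queue" / "rotational"
        if flag.is_file():
            result[dev.name] = flag.read_text().strip() == "1"
    return result
```

Running `print(rotational_devices())` on the TiFlash host lists every block device; the device that backs `/data` is the one that matters here.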

| username: TiDBer_QKDdYGfz

There are a lot of errors; communication with PD may be failing.