A TiKV Node is Slacking Off (Hardly Processing Requests)

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: 某个tikv节点全程划水(几乎不处理请求)

| username: panqiao

【TiDB Usage Environment】Production Environment / Testing / PoC
【TiDB Version】6.5.5
【Reproduction Path】What operations were performed to cause the issue
【Encountered Issue: Issue Phenomenon and Impact】
【Resource Configuration】Enter TiDB Dashboard - Cluster Info - Hosts and take a screenshot of this page
【Attachments: Screenshots/Logs/Monitoring】
The system is very slow, and there are many slow queries in the database. The CPU usage of the TiKV nodes exceeds 80%, but there is one TiKV node whose CPU is almost unused throughout, indicating a serious load imbalance. My TiKV nodes have 4 cores and are deployed on k8s.

Looking at tikv-2 alone, it is almost idle throughout

| username: panqiao | Original post link

I checked the corresponding TiKV logs, and there are no error logs, only info logs.

| username: zhaokede | Original post link

The load is severely unbalanced. Is there not a single leader above?

| username: TiDBer_jYQINSnf | Original post link

Check the number of regions and leaders.

| username: TiDBer_QKDdYGfz | Original post link

The distribution of leaders is uneven, they are all followers.

| username: zhaokede | Original post link

Use TiDB’s monitoring tools (such as Grafana) and diagnostic logs to check the load status, data distribution, and GC status of TiKV nodes.

| username: 舞动梦灵 | Original post link

Open the TiDB Dashboard cluster information and check the host. Look at this graph and observe the average CPU usage and memory usage. If one is particularly high and the other is particularly low, it is basically the leader and follower. If the 999 latency is very low, you don’t need to handle it.

| username: Inkjade | Original post link

Check which node your hotspot table is distributed on. Analyze the SQL through the TiDB dashboard.

| username: panqiao | Original post link

The system has stabilized now, and the differences are not significant.

| username: panqiao | Original post link

| username: panqiao | Original post link

How to check?

| username: panqiao | Original post link


| username: panqiao | Original post link

Related monitoring

| username: panqiao | Original post link

Where is the heatmap monitoring? There are too many monitoring options, I really couldn’t find it.

| username: lemonade010 | Original post link

The default value of tidb_dml_batch_size is 20000.