The cluster's health check looks bad. Can anyone help figure out what needs to be done?

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: 集群的健康状况不行,大家帮忙看看要做些啥

| username: 点点-求助来了

[TiDB Usage Environment] Production Environment
[TiDB Version] 5.0.1
[Reproduction Path] Data in the cluster shows up with a delay, and the health check reports many abnormal items
[Encountered Problem: Problem Phenomenon and Impact] Data appears with a delay: select count shows the row count increasing, but the new rows cannot be queried and only become visible after a long time
[Resource Configuration] 40 cores 128 memory * 5
[Attachments: Screenshots/Logs/Monitoring]




| username: liuis | Original post link

Check the Dashboard, and also check the cluster logs.

| username: tidb菜鸟一只 | Original post link

Did you not run the environment check before installation? It looks like many of the system settings were never configured, such as setting the CPU mode, disabling swap, changing the disk mount options, and setting the ulimit parameters. You can adjust these settings according to the prompts.
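To see roughly what such a check looks at, here is a minimal Python sketch for a Linux host. It is not what tiup itself runs. The function names are made up for illustration, and it assumes the standard Linux locations `/proc/swaps` and `/sys/devices/system/cpu/*/cpufreq/scaling_governor`. It only reports values and does not change anything.

```python
import resource  # Unix-only standard library module
from pathlib import Path


def active_swap_devices(proc_swaps_text):
    """Return the swap devices listed in /proc/swaps content (header line skipped)."""
    lines = proc_swaps_text.strip().splitlines()
    return [line.split()[0] for line in lines[1:] if line.strip()]


def non_performance_governors(governors):
    """Return the CPUs whose frequency governor is not 'performance'."""
    return {cpu: gov for cpu, gov in governors.items() if gov != "performance"}


def read_governors(root="/sys/devices/system/cpu"):
    """Read the current governor of every CPU that exposes cpufreq (may be empty)."""
    result = {}
    for path in sorted(Path(root).glob("cpu[0-9]*/cpufreq/scaling_governor")):
        # path looks like <root>/cpu0/cpufreq/scaling_governor
        result[path.parts[-3]] = path.read_text().strip()
    return result


if __name__ == "__main__":
    swaps = Path("/proc/swaps")
    if swaps.exists():
        print("active swap:", active_swap_devices(swaps.read_text()) or "none")
    print("non-performance governors:",
          non_performance_governors(read_governors()) or "none")
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    print("open-file limit (soft, hard):", soft, hard)
```

Running this on each node gives a quick view of which items from the health check are still open on that host.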

| username: 孤君888 | Original post link

Check the logs of tidb/tikv/pd to see what errors are reported.

| username: h5n1 | Original post link

What does this mean? Where does the cluster data delay come from? The problem scenario needs to be described in more detail. Based on your description, it looks like a performance issue. Also, the CPU power-saving mode can affect system performance, but it is not necessarily the cause of the current issue.

| username: 点点-求助来了 | Original post link

When I execute “select count(1) from table_name”, I can see the row count increase. However, when I execute “select * from table_name”, I do not see the latest data. There is a delay, sometimes as long as 1 to 2 hours.

| username: dba-kit | Original post link

Uh, setting the query issue aside for now: if swap has not been turned off, performance jitter is likely.
Actually, with TiDB these system parameter issues are easy to fix. Just add --apply to the tiup cluster check command. For example:

tiup cluster check --cluster dw-tidb6 -u tidb --apply

| username: dba-kit | Original post link

It seems to be related to transactions, but it is unlikely that count returns the new rows while select * cannot find the latest data. Could you please share the SQL query? You can mask any sensitive information.

| username: 点点-求助来了 | Original post link

A very simple SQL without any conditions, just a “SELECT * FROM table_name” with a WHERE clause. I checked the table locks and there were no lock-related records.

| username: 点点-求助来了 | Original post link


| username: 点点-求助来了 | Original post link

The cluster was installed by other colleagues, who have all since left. Actually, what puzzles me the most is the port conflict.

| username: xingzhenxiang | Original post link

Upgrade tiup and the tiup cluster component, then run the check again. The port conflict will be resolved.

| username: 点点-求助来了 | Original post link

What should the CPU mode be changed to, and how do I do it?

| username: tidb菜鸟一只 | Original post link

Refer to the parameters in this document and adjust accordingly. For the CPU, switch to performance mode.

| username: dba-kit | Original post link

Oh dear, you are checking the row count through information_schema.tables but looking at the data with select *. Try using select count(1) instead. I suspect your data is being inserted in one large transaction, so the inserted rows cannot be seen until the transaction is committed.
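The large-transaction effect can be illustrated with a small, self-contained Python sketch. Note the assumptions: it uses SQLite from the Python standard library as a stand-in, not TiDB, and the table `t` and the function names are hypothetical. It only shows the general principle that rows written inside an open transaction are invisible to other sessions until the commit. It says nothing about how information_schema.tables arrives at its row count, which in TiDB is an estimate rather than an exact count.

```python
import os
import sqlite3
import tempfile


def visible_rows(conn):
    """Count the rows this connection can currently see in table t."""
    cur = conn.execute("SELECT COUNT(1) FROM t")
    rows = cur.fetchall()  # exhaust the cursor so no read lock lingers
    cur.close()
    return rows[0][0]


def demo(batch_size=100):
    """Return (rows seen before the writer commits, rows seen after)."""
    with tempfile.TemporaryDirectory() as tmp:
        db_path = os.path.join(tmp, "demo.db")

        writer = sqlite3.connect(db_path)
        writer.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, payload TEXT)")
        writer.commit()

        reader = sqlite3.connect(db_path)  # a second, independent session

        # The INSERT opens a transaction implicitly; nothing is committed yet.
        writer.executemany(
            "INSERT INTO t (payload) VALUES (?)", [("row",)] * batch_size
        )
        before_commit = visible_rows(reader)

        writer.commit()
        after_commit = visible_rows(reader)

        writer.close()
        reader.close()
        return before_commit, after_commit


if __name__ == "__main__":
    before, after = demo()
    print("visible before commit:", before)
    print("visible after commit:", after)
```

If the loader in your environment writes everything in one long-running transaction, other sessions would see the same pattern: nothing new for a long time, then all rows at once.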

| username: 孤君888 | Original post link

Port conflict?