Abnormal Cluster Traffic

translator_bot · June 23, 2024, 2:35am

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: 集群流量异常

| username: dbaspace

Cluster traffic is abnormal. According to TIDB_HOT_REGIONS, most of it comes from the table stats_histograms.

The analyze parameters are set as shown in the following image:

How can this large traffic be controlled effectively?
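For readers who want to reproduce the check, here is a minimal sketch of ranking hot regions by table. It assumes the documented columns of INFORMATION_SCHEMA.TIDB_HOT_REGIONS (DB_NAME, TABLE_NAME, TYPE, FLOW_BYTES); the helper name `flow_by_table` and the sample rows are made up for illustration and are not data from this cluster.

```python
from collections import defaultdict

# Query to run from any MySQL client connected to TiDB.
HOT_REGIONS_SQL = """
SELECT DB_NAME, TABLE_NAME, TYPE, FLOW_BYTES
FROM INFORMATION_SCHEMA.TIDB_HOT_REGIONS
ORDER BY FLOW_BYTES DESC
"""


def flow_by_table(rows):
    """Sum FLOW_BYTES per (db, table, type) and sort from highest to lowest."""
    totals = defaultdict(int)
    for db_name, table_name, region_type, flow_bytes in rows:
        totals[(db_name, table_name, region_type)] += flow_bytes or 0
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Hypothetical rows shaped like the query result above.
sample_rows = [
    ("mysql", "stats_histograms", "read", 900),
    ("mysql", "stats_histograms", "read", 600),
    ("app", "orders", "write", 200),
]

for key, total in flow_by_table(sample_rows):
    print(key, total)
```

If one table dominates the top of this list, as stats_histograms does here, that is the first place to look.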

| username: songxuecheng

What version of TiDB?

| username: Kongdom

Does the analyze time window take the time zone into account? It needs to be changed to +0800 to match our local time.
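To illustrate the time zone point, here is a small sketch that converts a window boundary written as `HH:MM +ZZZZ` (the format used by `tidb_auto_analyze_start_time` and `tidb_auto_analyze_end_time`) into local time at UTC+8. The helper name `window_in_offset` is hypothetical, and the example values are the documented defaults, not necessarily the values in the screenshot.

```python
from datetime import datetime, timedelta, timezone


def window_in_offset(value, hours):
    """Express a 'HH:MM +ZZZZ' boundary as the same instant at UTC+hours."""
    parsed = datetime.strptime(value, "%H:%M %z")
    local = parsed.astimezone(timezone(timedelta(hours=hours)))
    return local.strftime("%H:%M %z")


# A boundary written with +0000 lands 8 hours later on a +0800 wall clock.
print(window_in_offset("00:00 +0000", 8))  # 08:00 +0800
print(window_in_offset("23:59 +0000", 8))  # 07:59 +0800
```

A window that looks like "night" in +0000 can therefore cover daytime business hours in +0800.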

| username: dbaspace

v4.0.9. This started in the last 2 days.

| username: dbaspace

Well, this cluster has been running for more than a year. The TiDB server uses a gigabit network card. In recent days, the card has been completely saturated at certain times. According to nethogs, the traffic received by tidb-server reaches 100 MB/s.
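As a quick sanity check on those numbers, the arithmetic below compares 100 MB/s with the theoretical ceiling of a gigabit link. It assumes decimal megabytes and ignores protocol overhead, so the practical ceiling is somewhat lower than the figure computed here.

```python
GIGABIT_BITS_PER_SECOND = 1_000_000_000


def link_capacity_mb_per_s(bits_per_second):
    """Theoretical link capacity in decimal megabytes per second."""
    return bits_per_second / 8 / 1_000_000


def utilisation(observed_mb_per_s, bits_per_second):
    """Fraction of the theoretical capacity that the observed rate uses."""
    return observed_mb_per_s / link_capacity_mb_per_s(bits_per_second)


capacity = link_capacity_mb_per_s(GIGABIT_BITS_PER_SECOND)
print(f"capacity: {capacity:.0f} MB/s")
print(f"utilisation at 100 MB/s: {utilisation(100, GIGABIT_BITS_PER_SECOND):.0%}")
```

At roughly 80% of the theoretical maximum before overhead, a sustained 100 MB/s is consistent with the card being effectively full.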

| username: Kongdom

Are there slow queries in the slow log?

| username: dbaspace

The network of the tidb-server machine is fully utilized:

This triggers high latency in TiKV responses:

The request latency on the tidb-server node is also quite high, which makes the whole cluster respond slowly.

| username: dbaspace

No problem has been found with the statements on the mysql tables. Under normal circumstances, SQL requests against the business tables in the cluster are very fast.

| username: Kongdom

I suggest adjusting the time range for analyze and observing again. With the current settings, the current time should fall inside the automatic analyze window.
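A minimal sketch of the "is it inside the window right now" check, assuming all three times are already expressed in the same time zone offset. The function name `in_window` is hypothetical; it also handles windows that wrap past midnight, such as 22:00 to 06:00.

```python
from datetime import time


def in_window(now, start, end):
    """Return True if `now` falls inside [start, end], allowing midnight wrap."""
    if start <= end:
        return start <= now <= end
    # The window crosses midnight, for example 22:00 to 06:00.
    return now >= start or now <= end


print(in_window(time(2, 0), time(1, 0), time(5, 0)))     # True
print(in_window(time(12, 0), time(22, 0), time(6, 0)))   # False
print(in_window(time(23, 30), time(22, 0), time(6, 0)))  # True
```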

| username: songxuecheng

Take a look at this: "statistics: skip reading mysql.stats_histograms if cached stats is up-to-date (#24175)" by ti-srebot, Pull Request #24352, pingcap/tidb on GitHub.

| username: h5n1

The high traffic received by the TiDB server could also be due to issues with the SQL execution plan. Check if there are any anomalies or changes in the slow SQL.
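One way to look for such statements is to query INFORMATION_SCHEMA.SLOW_QUERY for the window in which the traffic spiked. The sketch below assumes a DB-API cursor from a MySQL driver that uses `%s` placeholders (for example PyMySQL); the function name `top_slow_queries` and the limit of 20 are arbitrary choices for illustration.

```python
# Columns are taken from the documented SLOW_QUERY table.
SLOW_QUERY_SQL = """
SELECT Time, Query_time, Total_keys, Digest, Query
FROM INFORMATION_SCHEMA.SLOW_QUERY
WHERE Time BETWEEN %s AND %s
ORDER BY Query_time DESC
LIMIT 20
"""


def top_slow_queries(cursor, start, end):
    """Return the slowest statements logged between `start` and `end`."""
    cursor.execute(SLOW_QUERY_SQL, (start, end))
    return cursor.fetchall()
```

Comparing the Digest values from a quiet period with those from a spike shows whether a statement is new or has merely become slower, which is one hint that its plan has changed.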

| username: xfworld

It is best to observe through Prometheus first, see which nodes have abnormal network traffic, and then conduct troubleshooting.
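As a rough sketch of that first step, the snippet below builds a Prometheus HTTP API instant query for per-node inbound traffic and ranks the result. It assumes node_exporter is deployed and exposes `node_network_receive_bytes_total` (older exporter versions use a different metric name); the base URL, the helper names, and the sample response are hypothetical.

```python
from urllib.parse import urlencode

# Per-instance receive rate in bytes per second, excluding the loopback device.
PROMQL = 'sum by (instance) (rate(node_network_receive_bytes_total{device!="lo"}[1m]))'


def query_url(base_url, promql):
    """Build an instant-query URL for the Prometheus HTTP API."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


def rank_instances(response):
    """Return (instance, bytes_per_second) pairs, highest first."""
    results = response["data"]["result"]
    pairs = [(r["metric"].get("instance", "?"), float(r["value"][1])) for r in results]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


# Hypothetical response in the shape returned for an instant vector.
sample_response = {
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {"metric": {"instance": "tikv-1:9100"}, "value": [1719100000, "2.0e7"]},
            {"metric": {"instance": "tidb-1:9100"}, "value": [1719100000, "1.0e8"]},
        ],
    },
}

print(query_url("http://prometheus.example:9090", PROMQL))
print(rank_instances(sample_response))
```

Swapping in `node_network_transmit_bytes_total` gives the outbound side, which helps tell which nodes are sending the data.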

| username: dbaspace

Several TiKV nodes are sending data to the TiDB server, and the total traffic is overwhelming the TiDB server nodes.

| username: dbaspace

The business SQL has basically not changed, and the newly added SQL queries are all very fast.

| username: dbaspace

Okay, I’ll observe it tonight.

| username: wuxiangdong

Statistics are being collected too frequently, which has caused a hotspot. Try changing 0.5 to 0.6 and adjust from there.
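Assuming the 0.5 here is `tidb_auto_analyze_ratio` (the thread does not name the variable, but 0.5 is its documented default), the effect of raising it can be sketched as follows. This is a simplified model of the documented trigger condition, modified rows divided by total rows exceeding the ratio; the real scheduler applies further conditions, and the function name and numbers are hypothetical.

```python
def should_auto_analyze(modify_count, row_count, ratio):
    """Simplified trigger: modified rows / total rows exceeds the ratio."""
    if row_count <= 0:
        return False
    return modify_count / row_count > ratio


# A table with 55% of its rows modified since the last analyze.
print(should_auto_analyze(550_000, 1_000_000, 0.5))  # True
print(should_auto_analyze(550_000, 1_000_000, 0.6))  # False
```

A higher ratio means each table has to change more before it is analyzed again, so tables under heavy write load are analyzed less often.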

| username: xiaohetao

When was the last analyze on this table, and what is the health status of the table?
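Both questions can be answered with `SHOW STATS_META` (which includes Update_time, Modify_count, and Row_count) and `SHOW STATS_HEALTHY`. The sketch below lists the statements and a simplified model of how the health score relates to the two counters, as far as I understand the calculation; the database and table names are placeholders, and `stats_health` is a hypothetical helper, not a TiDB function.

```python
# Placeholder names; replace 'app' and 'orders' with the real schema and table.
STATS_META_SQL = "SHOW STATS_META WHERE db_name = 'app' AND table_name = 'orders'"
STATS_HEALTHY_SQL = "SHOW STATS_HEALTHY WHERE db_name = 'app' AND table_name = 'orders'"


def stats_health(modify_count, row_count):
    """Approximate health score (0 to 100) from modify and row counts."""
    if modify_count < row_count:
        return int((1 - modify_count / row_count) * 100)
    if modify_count == 0:
        # Nothing recorded and nothing modified counts as healthy.
        return 100
    return 0


print(stats_health(250, 1000))   # 75
print(stats_health(1500, 1000))  # 0
```

Note that Update_time reflects the last update of the statistics metadata, which is only an approximation of when the table was last analyzed.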

| username: dbaspace

Hmm, another wave just came.

| username: dbaspace

Many of the business tables have low health. TiDB data is written through DM, and frequent changes to the business tables can easily lower their health scores.

| username: xiaohetao

The time window for collecting statistics can be shortened and limited to off-peak business hours or the night.
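A minimal sketch of what that could look like, using the `tidb_auto_analyze_start_time` and `tidb_auto_analyze_end_time` system variables. The 01:00 to 05:00 window and the helper name are assumptions for illustration; pick the window from the cluster's own off-peak hours.

```python
import re

_HHMM = re.compile(r"([01]\d|2[0-3]):[0-5]\d")
_OFFSET = re.compile(r"[+-]\d{4}")


def analyze_window_statements(start, end, offset="+0800"):
    """Build SET GLOBAL statements that restrict the auto analyze window."""
    for value in (start, end):
        if not _HHMM.fullmatch(value):
            raise ValueError(f"expected HH:MM, got {value!r}")
    if not _OFFSET.fullmatch(offset):
        raise ValueError(f"expected an offset like +0800, got {offset!r}")
    return [
        f"SET GLOBAL tidb_auto_analyze_start_time = '{start} {offset}';",
        f"SET GLOBAL tidb_auto_analyze_end_time = '{end} {offset}';",
    ]


# Hypothetical night-time window in local (+0800) time.
for statement in analyze_window_statements("01:00", "05:00"):
    print(statement)
```

Writing the offset explicitly avoids the time zone mismatch raised earlier in the thread.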