Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.
Original topic: 扩容TiKV节点后,region分布不均,store打分差异很大 (After scaling out TiKV nodes, regions are unevenly distributed and store scores differ greatly)
Version: v5.0.3
Question:
After scaling out the TiDB cluster with one additional TiKV node, regions were rebalanced only between one of the original nodes and the new node; the region counts on the other nodes did not change. After balancing, the store scores of the six TiKV nodes still differ significantly, although the leader balance is normal.
How should this situation be handled?
Are the data disks the same size?
Take a look at the output of: select store_id, address, leader_weight, region_weight from information_schema.tikv_store_status;
select store_id, address, leader_weight, region_weight from information_schema.tikv_store_status;
The settings for leader_weight and region_weight are the same.
The size of the data disks is the same.
What about the remaining space?
The remaining space is different.
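The weights and the space figures can be compared side by side in one query. This is a minimal sketch, assuming the `capacity`, `available`, `leader_count`, `leader_score`, `region_count`, `region_size`, and `region_score` columns of `information_schema.tikv_store_status` exist in this version; only `store_id`, `address`, `leader_weight`, and `region_weight` are actually used in this thread.

```sql
-- Compare per-store space and PD scores in one result set.
-- Columns other than store_id, address and the two weights are
-- assumptions here; check them against your TiDB version.
SELECT store_id,
       address,
       capacity,
       available,
       leader_weight,
       region_weight,
       leader_count,
       leader_score,
       region_count,
       region_size,
       region_score
FROM information_schema.tikv_store_status
ORDER BY region_score DESC;
```

As far as I know, PD's region score takes remaining space into account as well as region size, so stores with equal weights and equal disk sizes may still report different scores when `available` differs.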
In the region graph, the region counts of the bottom two TiKV stores are slowly increasing, while those of the top three are decreasing.
Scheduling now appears to be normal, but at around 12:37 PM the cluster became unavailable for about 5 minutes. On the business side, connections failed (the error was “mysql gone away”), QPS dropped significantly, and cluster latency reached 5 minutes.
TiDB node logs:
[2022/08/19 12:37:56.735 +08:00] [WARN] [expensivequery.go:178] [expensive_query] [cost_time=60.033386751s] [conn_id=168261983] [user=srv_t1111l] [txn_start_ts=0] [mem_max=“0 Bytes (0 Bytes)”] [sql=“use testdb
”]
[2022/08/19 12:37:56.735 +08:00] [WARN] [expensivequery.go:178] [expensive_query] [cost_time=60.02839721s] [conn_id=168261987] [user=srv_t1111l] [txn_start_ts=0] [mem_max=“0 Bytes (0 Bytes)”] [sql=“use testdb
”]
[2022/08/19 12:37:56.736 +08:00] [WARN] [expensivequery.go:178] [expensive_query] [cost_time=60.092538948s] [conn_id=168261935] [user=srv_t1111l] [txn_start_ts=0] [mem_max=“0 Bytes (0 Bytes)”] [sql=“use testdb
”]
[2022/08/19 12:37:56.736 +08:00] [WARN] [expensivequery.go:178] [expensive_query] [cost_time=60.051988913s] [conn_id=168261963] [user=srv_t1111l] [txn_start_ts=0] [mem_max=“0 Bytes (0 Bytes)”] [sql=“use testdb
”]
[2022/08/19 12:37:56.832 +08:00] [WARN] [expensivequery.go:178] [expensive_query] [cost_time=60.054041672s] [conn_id=168262029] [user=srv_t1111l] [txn_start_ts=0] [mem_max=“0 Bytes (0 Bytes)”] [sql=“use testdb
”]
[2022/08/19 12:37:56.832 +08:00] [WARN] [expensivequery.go:178] [expensive_query] [cost_time=60.003240048s] [conn_id=168262071] [user=srv_t1111l] [txn_start_ts=0] [mem_max=“0 Bytes (0 Bytes)”] [sql=“use testdb
”]
[2022/08/19 12:37:56.832 +08:00] [WARN] [expensivequery.go:178] [expensive_query] [cost_time=60.074404667s] [conn_id=168262019] [user=srv_t1111l] [txn_start_ts=0] [mem_max=“0 Bytes (0 Bytes)”] [sql=“use testdb
”]
[2022/08/19 12:37:56.832 +08:00] [WARN] [expensivequery.go:178] [expensive_query] [cost_time=60.013422469s] [conn_id=168262065] [user=srv_t1111l] [txn_start_ts=0] [mem_max=“0 Bytes (0 Bytes)”] [sql=“use testdb
”]
[2022/08/19 12:37:56.833 +08:00] [WARN] [expensivequery.go:178] [expensive_query] [cost_time=60.035368856s] [conn_id=168262041] [user=srv_t1111l] [txn_start_ts=0] [mem_max=“0 Bytes (0 Bytes)”] [sql=“use testdb
”]
[2022/08/19 12:37:56.833 +08:00] [WARN] [expensivequery.go:178] [expensive_query] [cost_time=60.036402935s] [conn_id=168262039] [user=srv_t1111l] [txn_start_ts=0] [mem_max=“0 Bytes (0 Bytes)”] [sql=“use testdb
”]
[2022/08/19 12:37:56.833 +08:00] [WARN] [expensivequery.go:178] [expensive_query] [cost_time=60.08437904s] [conn_id=168262011] [user=srv_t1111l] [txn_start_ts=0] [mem_max=“0 Bytes (0 Bytes)”] [sql=“use testdb
”]
[2022/08/19 12:37:56.833 +08:00] [WARN] [expensivequery.go:178] [expensive_query] [cost_time=60.030256464s] [conn_id=168262047] [user=srv_t1111l] [txn_start_ts=0] [mem_max=“0 Bytes (0 Bytes)”] [sql=“use testdb
”]
[2022/08/19 12:37:56.833 +08:00] [WARN] [expensivequery.go:178] [expensive_query] [cost_time=60.084899383s] [conn_id=168262013] [user=srv_t1111l] [txn_start_ts=0] [mem_max=“0 Bytes (0 Bytes)”] [sql=“use testdb
”]
[2022/08/19 12:37:56.834 +08:00] [WARN] [expensivequery.go:178] [expensive_query] [cost_time=60.06933023s] [conn_id=168262023] [user=srv_t1111l] [txn_start_ts=0] [mem_max=“0 Bytes (0 Bytes)”] [sql=“use testdb
”]
[2022/08/19 12:37:56.834 +08:00] [WARN] [expensivequery.go:178] [expensive_query] [cost_time=60.029930219s] [conn_id=168262051] [user=srv_t1111l] [txn_start_ts=0] [mem_max=“0 Bytes (0 Bytes)”] [sql=“use testdb
”]
[2022/08/19 12:37:56.834 +08:00] [WARN] [expensivequery.go:178] [expensive_query] [cost_time=60.081400257s] [conn_id=168262015] [user=srv_t1111l] [txn_start_ts=0] [mem_max=“0 Bytes (0 Bytes)”] [sql=“use testdb
”]
[2022/08/19 12:37:56.834 +08:00] [WARN] [expensivequery.go:178] [expensive_query] [cost_time=60.101386805s] [conn_id=168262005] [user=srv_t1111l] [txn_start_ts=0] [mem_max=“0 Bytes (0 Bytes)”] [sql=“use testdb
”]
[2022/08/19 12:37:56.835 +08:00] [WARN] [expensivequery.go:178] [expensive_query] [cost_time=60.048546961s] [conn_id=168262033] [user=srv_t1111l] [txn_start_ts=0] [mem_max=“0 Bytes (0 Bytes)”] [sql=“use testdb
”]
[2022/08/19 12:37:56.835 +08:00] [WARN] [expensivequery.go:178] [expensive_query] [cost_time=60.047254189s] [conn_id=168262037] [user=srv_t1111l] [txn_start_ts=0] [mem_max=“0 Bytes (0 Bytes)”] [sql=“use testdb
”]
[2022/08/19 12:37:56.835 +08:00] [WARN] [expensivequery.go:178] [expensive_query] [cost_time=60.06448464s] [conn_id=168262027] [user=srv_t1111l] [txn_start_ts=0] [mem_max=“0 Bytes (0 Bytes)”] [sql=“use testdb
”]
[2022/08/19 12:37:56.835 +08:00] [WARN] [expensivequery.go:178] [expensive_query] [cost_time=60.048679756s] [conn_id=168262035] [user=srv_t1111l] [txn_start_ts=0] [mem_max=“0 Bytes (0 Bytes)”] [sql=“use testdb
”]
[2022/08/19 12:37:56.835 +08:00] [WARN] [expensivequery.go:178] [expensive_query] [cost_time=60.07375419s] [conn_id=168262021] [user=srv_t1111l] [txn_start_ts=0] [mem_max=“0 Bytes (0 Bytes)”] [sql=“use testdb
”]
[2022/08/19 12:37:56.836 +08:00] [WARN] [expensivequery.go:178] [expensive_query] [cost_time=60.06537717s] [conn_id=168262025] [user=srv_t1111l] [txn_start_ts=0] [mem_max=“0 Bytes (0 Bytes)”] [sql=“use testdb
”]
[2022/08/19 12:37:56.836 +08:00] [WARN] [expensivequery.go:178] [expensive_query] [cost_time=60.088627056s] [conn_id=168262009] [user=srv_t1111l] [txn_start_ts=0] [mem_max=“0 Bytes (0 Bytes)”] [sql=“use testdb
”]
[2022/08/19 12:37:56.836 +08:00] [WARN] [expensivequery.go:178] [expensive_query] [cost_time=60.056906712s] [conn_id=168262031] [user=srv_t1111l] [txn_start_ts=0] [mem_max=“0 Bytes (0 Bytes)”] [sql=“use testdb
”]
[2022/08/19 12:37:56.836 +08:00] [WARN] [expensivequery.go:178] [expensive_query] [cost_time=60.028678417s] [conn_id=168262053] [user=srv_t1111l] [txn_start_ts=0] [mem_max=“0 Bytes (0 Bytes)”] [sql=“use testdb
”]
[2022/08/19 12:37:56.836 +08:00] [WARN] [expensivequery.go:178] [expensive_query] [cost_time=60.021358943s] [conn_id=168262059] [user=srv_t1111l] [txn_start_ts=0] [mem_max=“0 Bytes (0 Bytes)”] [sql=“use testdb
”]
[2022/08/19 12:37:56.836 +08:00] [WARN] [expensivequery.go:178] [expensive_query] [cost_time=60.095096163s] [conn_id=168262007] [user=srv_t1111l] [txn_start_ts=0] [mem_max=“0 Bytes (0 Bytes)”] [sql=“use testdb
”]
[2022/08/19 12:37:56.836 +08:00] [WARN] [expensivequery.go:178] [expensive_query] [cost_time=60.032947036s] [conn_id=168262049] [user=srv_t1111l] [txn_start_ts=0] [mem_max=“0 Bytes (0 Bytes)”] [sql=“use testdb
”]
[2022/08/19 12:37:56.837 +08:00] [WARN] [expensivequery.go:178] [expensive_query] [cost_time=60.025771198s] [conn_id=168262055] [user=srv_t1111l] [txn_start_ts=0] [mem_max=“0 Bytes (0 Bytes)”] [sql=“use testdb
”]
[2022/08/19 12:37:56.837 +08:00] [WARN] [expensivequery.go:178] [expensive_query] [cost_time=60.036994875s] [conn_id=168262043] [user=srv_t1111l] [txn_start_ts=0] [mem_max=“0 Bytes (0 Bytes)”] [sql=“use testdb
”]
[2022/08/19 12:37:56.837 +08:00] [WARN] [expensivequery.go:178] [expensive_query] [cost_time=60.024306845s] [conn_id=168262057] [user=srv_t1111l] [txn_start_ts=0] [mem_max=“0 Bytes (0 Bytes)”] [sql=“use testdb
”]
[2022/08/19 12:37:56.838 +08:00] [WARN] [expensivequery.go:178] [expensive_query] [cost_time=60.037391433s] [conn_id=168262045] [user=srv_t1111l] [txn_start_ts=0] [mem_max=“0 Bytes (0 Bytes)”] [sql=“use testdb
”]
[2022/08/19 12:37:56.838 +08:00] [WARN] [expensivequery.go:178] [expensive_query] [cost_time=60.084064876s] [conn_id=168262017] [user=srv_t1111l] [txn_start_ts=0] [mem_max=“0 Bytes (0 Bytes)”] [sql=“use testdb
”]
[2022/08/19 12:37:56.838 +08:00] [WARN] [expensivequery.go:178] [expensive_query] [cost_time=60.018369699s] [conn_id=168262067] [user=srv_t1111l] [txn_start_ts=0] [mem_max=“0 Bytes (0 Bytes)”] [sql=“use testdb
”]
[2022/08/19 12:38:35.754 +08:00] [INFO] [conn.go:812] [“command dispatched failed”] [conn=168230619] [connInfo=“id:168230619, addr:10.30.219.227:34772 status:10, collation:latin1_swedish_ci, user:srv_t1111l”] [command=Query] [status=“inTxn:0, autocommit:1”] [sql=“insert into bill_xxxxxx (user_id, hfsm_id, fsm_id, state_num, detail, c_date) values (941526237,10068300,1,3,‘ ’,from_unixtime(1660883758));”] [txn_mode=PESSIMISTIC] [err=“write tcp 10.30.128.28:4000->10.30.219.227:34772: write: broken pipe
github.com/pingcap/errors.AddStack
\t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/pkg/mod/github.com/pingcap/errors@v0.11.5-0.20201126102027-b0a155152ca3/errors.go:174
github.com/pingcap/errors.Trace
\t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/pkg/mod/github.com/pingcap/errors@v0.11.5-0.20201126102027-b0a155152ca3/juju_adaptor.go:15
github.com/pingcap/tidb/server.(*packetIO).flush
\t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb/server/packetio.go:181
github.com/pingcap/tidb/server.(*clientConn).flush
\t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb/server/conn.go:1126
github.com/pingcap/tidb/server.(*clientConn).writeOkWith
\t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb/server/conn.go:1159
github.com/pingcap/tidb/server.(*clientConn).handleQuerySpecial
\t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb/server/conn.go:1703
github.com/pingcap/tidb/server.(*clientConn).handleStmt
\t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb/server/conn.go:1661
github.com/pingcap/tidb/server.(*clientConn).handleQuery
\t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb/server/conn.go:1503
github.com/pingcap/tidb/server.(*clientConn).dispatch
\t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb/server/conn.go:1037
github.com/pingcap/tidb/server.(*clientConn).Run
\t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb/server/conn.go:795
github.com/pingcap/tidb/server.(*Server).onConn
\t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb/server/server.go:477
runtime.goexit
\t/usr/local/go/src/runtime/asm_amd64.s:1357”]
[2022/08/19 12:38:35.754 +08:00] [ERROR] [terror.go:291] ["enc
Is there a problem with the network?
If the servers differ in configuration or network environment, it is normal for their scores to differ.
The server configuration and network environment are the same.
It’s not a network issue.
Can you still check the servers' resource usage for the period when the latency was high?
The logs suggest it might be a network issue. I recommend checking for network jitter and similar problems.
It’s fine as long as the leaders are evenly distributed. Could some empty regions be skewing the statistics?
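One way to check the empty-region theory is to count small regions per store. This is only a sketch: it assumes that `information_schema.tikv_region_status` exposes `approximate_size` in MiB, that `information_schema.tikv_region_peers` maps regions to stores, and that "empty" means an approximate size of at most 1 MiB. That threshold is my assumption, not something stated in this thread.

```sql
-- Count regions that look empty (approximate size <= 1 MiB) per store.
-- DISTINCT is used because a region may appear in more than one row
-- of tikv_region_status.
SELECT p.store_id,
       COUNT(DISTINCT s.region_id) AS empty_regions
FROM information_schema.tikv_region_status AS s
JOIN information_schema.tikv_region_peers AS p
  ON p.region_id = s.region_id
WHERE s.approximate_size <= 1
GROUP BY p.store_id
ORDER BY empty_regions DESC;
```

If one or two stores hold most of the empty regions, their region counts can look high even though the data size on them is comparable to the others.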