After Expanding TiKV Nodes, Region Distribution is Uneven and Store Scores Vary Greatly

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: 扩容TiKV节点后,region分布不均,store打分差异很大

| username: seiang

Version: v5.0.3

Question:
After scaling out the TiDB cluster with one additional TiKV node, Regions were rebalanced only between one of the original nodes and the newly added node, while the Region counts on the other nodes did not change. After balancing, the store scores of the six TiKV nodes differ significantly.

How should this situation be handled?


However, the leader balance is normal.

| username: 我是咖啡哥

Is the data disk size the same?

| username: h5n1

Run the following query and take a look at the output:

select store_id, address, leader_weight, region_weight from information_schema.tikv_store_status;

| username: seiang

The settings for leader_weight and region_weight are the same.

| username: seiang

The size of the data disks is the same.

| username: h5n1

What about the remaining space?
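
One way to compare remaining space with the scores is to query the same table for its capacity and score columns. This is only a sketch; it assumes the CAPACITY, AVAILABLE, REGION_COUNT, REGION_SIZE, and REGION_SCORE columns of information_schema.tikv_store_status are present in this version:

```sql
-- Compare disk headroom with Region count, size, and score for each store.
SELECT store_id,
       address,
       capacity,
       available,
       region_count,
       region_size,
       region_score
FROM information_schema.tikv_store_status
ORDER BY region_score DESC;
```

PD takes available space into account when scoring stores for Region balance, so stores with identical weights may still end up with different scores if their free space differs.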

| username: seiang

The remaining space is different.

| username: h5n1

In the Region graph, the Region counts of the bottom two TiKV nodes are slowly increasing, while those of the top three are decreasing.

| username: seiang

Scheduling now appears to be normal. However, at around 12:37 PM the cluster became unavailable for about 5 minutes. The application reported connection errors (“mysql gone away”), QPS dropped significantly, and cluster latency reached 5 minutes.

TiDB node logs:

[2022/08/19 12:37:56.735 +08:00] [WARN] [expensivequery.go:178] [expensive_query] [cost_time=60.033386751s] [conn_id=168261983] [user=srv_t1111l] [txn_start_ts=0] [mem_max="0 Bytes (0 Bytes)"] [sql="use testdb"]
[2022/08/19 12:37:56.735 +08:00] [WARN] [expensivequery.go:178] [expensive_query] [cost_time=60.02839721s] [conn_id=168261987] [user=srv_t1111l] [txn_start_ts=0] [mem_max="0 Bytes (0 Bytes)"] [sql="use testdb"]
[2022/08/19 12:37:56.736 +08:00] [WARN] [expensivequery.go:178] [expensive_query] [cost_time=60.092538948s] [conn_id=168261935] [user=srv_t1111l] [txn_start_ts=0] [mem_max="0 Bytes (0 Bytes)"] [sql="use testdb"]
[2022/08/19 12:37:56.736 +08:00] [WARN] [expensivequery.go:178] [expensive_query] [cost_time=60.051988913s] [conn_id=168261963] [user=srv_t1111l] [txn_start_ts=0] [mem_max="0 Bytes (0 Bytes)"] [sql="use testdb"]
[2022/08/19 12:37:56.832 +08:00] [WARN] [expensivequery.go:178] [expensive_query] [cost_time=60.054041672s] [conn_id=168262029] [user=srv_t1111l] [txn_start_ts=0] [mem_max="0 Bytes (0 Bytes)"] [sql="use testdb"]
[2022/08/19 12:37:56.832 +08:00] [WARN] [expensivequery.go:178] [expensive_query] [cost_time=60.003240048s] [conn_id=168262071] [user=srv_t1111l] [txn_start_ts=0] [mem_max="0 Bytes (0 Bytes)"] [sql="use testdb"]
[2022/08/19 12:37:56.832 +08:00] [WARN] [expensivequery.go:178] [expensive_query] [cost_time=60.074404667s] [conn_id=168262019] [user=srv_t1111l] [txn_start_ts=0] [mem_max="0 Bytes (0 Bytes)"] [sql="use testdb"]
[2022/08/19 12:37:56.832 +08:00] [WARN] [expensivequery.go:178] [expensive_query] [cost_time=60.013422469s] [conn_id=168262065] [user=srv_t1111l] [txn_start_ts=0] [mem_max="0 Bytes (0 Bytes)"] [sql="use testdb"]
[2022/08/19 12:37:56.833 +08:00] [WARN] [expensivequery.go:178] [expensive_query] [cost_time=60.035368856s] [conn_id=168262041] [user=srv_t1111l] [txn_start_ts=0] [mem_max="0 Bytes (0 Bytes)"] [sql="use testdb"]
[2022/08/19 12:37:56.833 +08:00] [WARN] [expensivequery.go:178] [expensive_query] [cost_time=60.036402935s] [conn_id=168262039] [user=srv_t1111l] [txn_start_ts=0] [mem_max="0 Bytes (0 Bytes)"] [sql="use testdb"]
[2022/08/19 12:37:56.833 +08:00] [WARN] [expensivequery.go:178] [expensive_query] [cost_time=60.08437904s] [conn_id=168262011] [user=srv_t1111l] [txn_start_ts=0] [mem_max="0 Bytes (0 Bytes)"] [sql="use testdb"]
[2022/08/19 12:37:56.833 +08:00] [WARN] [expensivequery.go:178] [expensive_query] [cost_time=60.030256464s] [conn_id=168262047] [user=srv_t1111l] [txn_start_ts=0] [mem_max="0 Bytes (0 Bytes)"] [sql="use testdb"]
[2022/08/19 12:37:56.833 +08:00] [WARN] [expensivequery.go:178] [expensive_query] [cost_time=60.084899383s] [conn_id=168262013] [user=srv_t1111l] [txn_start_ts=0] [mem_max="0 Bytes (0 Bytes)"] [sql="use testdb"]
[2022/08/19 12:37:56.834 +08:00] [WARN] [expensivequery.go:178] [expensive_query] [cost_time=60.06933023s] [conn_id=168262023] [user=srv_t1111l] [txn_start_ts=0] [mem_max="0 Bytes (0 Bytes)"] [sql="use testdb"]
[2022/08/19 12:37:56.834 +08:00] [WARN] [expensivequery.go:178] [expensive_query] [cost_time=60.029930219s] [conn_id=168262051] [user=srv_t1111l] [txn_start_ts=0] [mem_max="0 Bytes (0 Bytes)"] [sql="use testdb"]
[2022/08/19 12:37:56.834 +08:00] [WARN] [expensivequery.go:178] [expensive_query] [cost_time=60.081400257s] [conn_id=168262015] [user=srv_t1111l] [txn_start_ts=0] [mem_max="0 Bytes (0 Bytes)"] [sql="use testdb"]
[2022/08/19 12:37:56.834 +08:00] [WARN] [expensivequery.go:178] [expensive_query] [cost_time=60.101386805s] [conn_id=168262005] [user=srv_t1111l] [txn_start_ts=0] [mem_max="0 Bytes (0 Bytes)"] [sql="use testdb"]
[2022/08/19 12:37:56.835 +08:00] [WARN] [expensivequery.go:178] [expensive_query] [cost_time=60.048546961s] [conn_id=168262033] [user=srv_t1111l] [txn_start_ts=0] [mem_max="0 Bytes (0 Bytes)"] [sql="use testdb"]
[2022/08/19 12:37:56.835 +08:00] [WARN] [expensivequery.go:178] [expensive_query] [cost_time=60.047254189s] [conn_id=168262037] [user=srv_t1111l] [txn_start_ts=0] [mem_max="0 Bytes (0 Bytes)"] [sql="use testdb"]
[2022/08/19 12:37:56.835 +08:00] [WARN] [expensivequery.go:178] [expensive_query] [cost_time=60.06448464s] [conn_id=168262027] [user=srv_t1111l] [txn_start_ts=0] [mem_max="0 Bytes (0 Bytes)"] [sql="use testdb"]
[2022/08/19 12:37:56.835 +08:00] [WARN] [expensivequery.go:178] [expensive_query] [cost_time=60.048679756s] [conn_id=168262035] [user=srv_t1111l] [txn_start_ts=0] [mem_max="0 Bytes (0 Bytes)"] [sql="use testdb"]
[2022/08/19 12:37:56.835 +08:00] [WARN] [expensivequery.go:178] [expensive_query] [cost_time=60.07375419s] [conn_id=168262021] [user=srv_t1111l] [txn_start_ts=0] [mem_max="0 Bytes (0 Bytes)"] [sql="use testdb"]
[2022/08/19 12:37:56.836 +08:00] [WARN] [expensivequery.go:178] [expensive_query] [cost_time=60.06537717s] [conn_id=168262025] [user=srv_t1111l] [txn_start_ts=0] [mem_max="0 Bytes (0 Bytes)"] [sql="use testdb"]
[2022/08/19 12:37:56.836 +08:00] [WARN] [expensivequery.go:178] [expensive_query] [cost_time=60.088627056s] [conn_id=168262009] [user=srv_t1111l] [txn_start_ts=0] [mem_max="0 Bytes (0 Bytes)"] [sql="use testdb"]
[2022/08/19 12:37:56.836 +08:00] [WARN] [expensivequery.go:178] [expensive_query] [cost_time=60.056906712s] [conn_id=168262031] [user=srv_t1111l] [txn_start_ts=0] [mem_max="0 Bytes (0 Bytes)"] [sql="use testdb"]
[2022/08/19 12:37:56.836 +08:00] [WARN] [expensivequery.go:178] [expensive_query] [cost_time=60.028678417s] [conn_id=168262053] [user=srv_t1111l] [txn_start_ts=0] [mem_max="0 Bytes (0 Bytes)"] [sql="use testdb"]
[2022/08/19 12:37:56.836 +08:00] [WARN] [expensivequery.go:178] [expensive_query] [cost_time=60.021358943s] [conn_id=168262059] [user=srv_t1111l] [txn_start_ts=0] [mem_max="0 Bytes (0 Bytes)"] [sql="use testdb"]
[2022/08/19 12:37:56.836 +08:00] [WARN] [expensivequery.go:178] [expensive_query] [cost_time=60.095096163s] [conn_id=168262007] [user=srv_t1111l] [txn_start_ts=0] [mem_max="0 Bytes (0 Bytes)"] [sql="use testdb"]
[2022/08/19 12:37:56.836 +08:00] [WARN] [expensivequery.go:178] [expensive_query] [cost_time=60.032947036s] [conn_id=168262049] [user=srv_t1111l] [txn_start_ts=0] [mem_max="0 Bytes (0 Bytes)"] [sql="use testdb"]
[2022/08/19 12:37:56.837 +08:00] [WARN] [expensivequery.go:178] [expensive_query] [cost_time=60.025771198s] [conn_id=168262055] [user=srv_t1111l] [txn_start_ts=0] [mem_max="0 Bytes (0 Bytes)"] [sql="use testdb"]
[2022/08/19 12:37:56.837 +08:00] [WARN] [expensivequery.go:178] [expensive_query] [cost_time=60.036994875s] [conn_id=168262043] [user=srv_t1111l] [txn_start_ts=0] [mem_max="0 Bytes (0 Bytes)"] [sql="use testdb"]
[2022/08/19 12:37:56.837 +08:00] [WARN] [expensivequery.go:178] [expensive_query] [cost_time=60.024306845s] [conn_id=168262057] [user=srv_t1111l] [txn_start_ts=0] [mem_max="0 Bytes (0 Bytes)"] [sql="use testdb"]
[2022/08/19 12:37:56.838 +08:00] [WARN] [expensivequery.go:178] [expensive_query] [cost_time=60.037391433s] [conn_id=168262045] [user=srv_t1111l] [txn_start_ts=0] [mem_max="0 Bytes (0 Bytes)"] [sql="use testdb"]
[2022/08/19 12:37:56.838 +08:00] [WARN] [expensivequery.go:178] [expensive_query] [cost_time=60.084064876s] [conn_id=168262017] [user=srv_t1111l] [txn_start_ts=0] [mem_max="0 Bytes (0 Bytes)"] [sql="use testdb"]
[2022/08/19 12:37:56.838 +08:00] [WARN] [expensivequery.go:178] [expensive_query] [cost_time=60.018369699s] [conn_id=168262067] [user=srv_t1111l] [txn_start_ts=0] [mem_max="0 Bytes (0 Bytes)"] [sql="use testdb"]

[2022/08/19 12:38:35.754 +08:00] [INFO] [conn.go:812] ["command dispatched failed"] [conn=168230619] [connInfo="id:168230619, addr:10.30.219.227:34772 status:10, collation:latin1_swedish_ci, user:srv_t1111l"] [command=Query] [status="inTxn:0, autocommit:1"] [sql="insert into bill_xxxxxx (user_id, hfsm_id, fsm_id, state_num, detail, c_date) values (941526237,10068300,1,3,'',from_unixtime(1660883758));"] [txn_mode=PESSIMISTIC] [err="write tcp 10.30.128.28:4000->10.30.219.227:34772: write: broken pipe
github.com/pingcap/errors.AddStack
\t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/pkg/mod/github.com/pingcap/errors@v0.11.5-0.20201126102027-b0a155152ca3/errors.go:174
github.com/pingcap/errors.Trace
\t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/pkg/mod/github.com/pingcap/errors@v0.11.5-0.20201126102027-b0a155152ca3/juju_adaptor.go:15
github.com/pingcap/tidb/server.(*packetIO).flush
\t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb/server/packetio.go:181
github.com/pingcap/tidb/server.(*clientConn).flush
\t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb/server/conn.go:1126
github.com/pingcap/tidb/server.(*clientConn).writeOkWith
\t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb/server/conn.go:1159
github.com/pingcap/tidb/server.(*clientConn).handleQuerySpecial
\t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb/server/conn.go:1703
github.com/pingcap/tidb/server.(*clientConn).handleStmt
\t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb/server/conn.go:1661
github.com/pingcap/tidb/server.(*clientConn).handleQuery
\t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb/server/conn.go:1503
github.com/pingcap/tidb/server.(*clientConn).dispatch
\t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb/server/conn.go:1037
github.com/pingcap/tidb/server.(*clientConn).Run
\t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb/server/conn.go:795
github.com/pingcap/tidb/server.(*Server).onConn
\t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb/server/server.go:477
runtime.goexit
\t/usr/local/go/src/runtime/asm_amd64.s:1357"]
[2022/08/19 12:38:35.754 +08:00] [ERROR] [terror.go:291] ["enc

| username: h5n1

Is there a problem with the network?

| username: Kongdom

If the servers differ in configuration or network environment, it is normal for the store scores to differ as well.

| username: seiang

The server configuration and network environment are the same.

| username: seiang

It’s not a network issue.

| username: xiaohetao

Can you still check the servers' resource usage for the period when the latency was high?

| username: alfred

The logs suggest it might be a network issue. I recommend investigating further, for example by checking for network jitter.

| username: wuxiangdong

It’s fine as long as the leaders are evenly distributed. Could some empty Regions be skewing the statistics?
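
A rough way to check for empty Regions per store is to join the Region peer and Region status tables. This is only a sketch: it assumes the information_schema.tikv_region_peers and information_schema.tikv_region_status tables with their REGION_ID, STORE_ID, and APPROXIMATE_SIZE columns, and the cutoff of 1 (MiB) for treating a Region as empty is an assumption you may want to adjust:

```sql
-- Count, per store, the Regions whose approximate size is at most 1 MiB.
-- The 1 MiB cutoff for "empty" is an assumption, not a value from this thread.
SELECT p.store_id,
       COUNT(DISTINCT p.region_id) AS empty_regions
FROM information_schema.tikv_region_peers AS p
JOIN information_schema.tikv_region_status AS s
  ON s.region_id = p.region_id
WHERE s.approximate_size <= 1
GROUP BY p.store_id
ORDER BY empty_regions DESC;
```

On a cluster with many Regions this query can be expensive, so it may be better to run it during a quiet period.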