Note:
This topic has been translated from a Chinese forum by GPT and might contain errors. Original topic: TiKV scale-in interrupts TiDB connections (tikv缩容,影响tidb连接中断)
[TiDB Usage Environment] Production Environment
[TiDB Version] v5.4.0
[Reproduction Path] Shrinking the TiKV data disk
[Encountered Problem: Phenomenon and Impact]
[Attachment: Screenshot/Log/Monitoring]
Steps to shrink the TiKV data disk:
- Stop TiKV first
- Copy data
- Restart TiKV

Detailed operations:
- Stop TiKV first
tiup cluster stop tidb-risk -N 10.0.0.17:20160
Check the cluster status; the stopped TiKV node shows Disconnected:
ID Role Host Ports OS/Arch Status Data Dir Deploy Dir
-- ---- ---- ----- ------- ------ -------- ----------
10.0.0.10:9093 alertmanager 10.0.0.10 9093/9094 linux/x86_64 Up /data/tidb-data/alertmanager-9093 /data/tidb-deploy/alertmanager-9093
10.0.0.10:3000 grafana 10.0.0.10 3000 linux/x86_64 Up - /data/tidb-deploy/grafana-3000
10.0.0.11:2379 pd 10.0.0.11 2379/2380 linux/x86_64 Up|L /data/tidb-data/pd-2379 /data/tidb-deploy/pd-2379
10.0.0.12:2379 pd 10.0.0.12 2379/2380 linux/x86_64 Up /data/tidb-data/pd-2379 /data/tidb-deploy/pd-2379
10.0.0.13:2379 pd 10.0.0.13 2379/2380 linux/x86_64 Up|UI /data/tidb-data/pd-2379 /data/tidb-deploy/pd-2379
10.0.0.10:9090 prometheus 10.0.0.10 9090/12020 linux/x86_64 Up /data/tidb-data/prometheus-9090 /data/tidb-deploy/prometheus-9090
10.0.0.14:4000 tidb 10.0.0.14 4000/10080 linux/x86_64 Up - /data/tidb-deploy/tidb-4000
10.0.0.15:4000 tidb 10.0.0.15 4000/10080 linux/x86_64 Up - /data/tidb-deploy/tidb-4000
10.0.0.16:4000 tidb 10.0.0.16 4000/10080 linux/x86_64 Up - /data/tidb-deploy/tidb-4000
10.0.0.17:20160 tikv 10.0.0.17 20160/20180 linux/x86_64 Disconnected /data/tidb-data/tikv-20160 /data/tidb-deploy/tikv-20160
10.0.0.18:20160 tikv 10.0.0.18 20160/20180 linux/x86_64 Up /data/tidb-data/tikv-20160 /data/tidb-deploy/tikv-20160
10.0.0.19:20160 tikv 10.0.0.19 20160/20180 linux/x86_64 Up /data/tidb-data/tikv-20160 /data/tidb-deploy/tikv-20160
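For reference, a status table like this is printed by the display command; with the cluster name from the stop command above, that would be:
tiup cluster display tidb-risk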
- Copy data
cp -a /data /data1
- Start TiKV
tiup cluster start tidb-risk -N 10.0.0.17:20160
Check the TiKV details.
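As a side note, one way to watch how the leaders move during this kind of operation is to ask PD for the per-store leader counts. A minimal sketch, assuming pd-ctl is invoked through tiup and using the PD endpoint from the topology above (the jq filter is only illustrative):
# List every store together with its leader/Region counts
tiup ctl:v5.4.0 pd -u http://10.0.0.11:2379 store
# Extract just the leader counts (assumes jq is available)
tiup ctl:v5.4.0 pd -u http://10.0.0.11:2379 store | jq '.stores[] | {addr: .store.address, leaders: .status.leader_count}'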
When one TiKV node is shut down, leader elections occur for the Regions on it, and after the node recovers, the leaders are rebalanced across the nodes again.
At this point, the business side received the following error:
com.mysql.cj.jdbc.exceptions.CommunicationsException: Communications link failure
The last packet successfully received from the server was 10,674 milliseconds ago. The last packet sent successfully to the server was 10,674 milliseconds ago.
at com.mysql.cj.jdbc.exceptions.SQLError.createCommunicationsException(SQLError.java:174)
at com.mysql.cj.jdbc.exceptions.SQLExceptionsMapping.translateException(SQLExceptionsMapping.java:64)
at com.mysql.cj.jdbc.ClientPreparedStatement.executeInternal(ClientPreparedStatement.java:953)
at com.mysql.cj.jdbc.ClientPreparedStatement.execute(ClientPreparedStatement.java:370)
at com.opay.realtime.etl.util.JdbcUtil.riskSink(JdbcUtil.scala:142)
at com.opay.risk.features.sink.TableSinkMappingBroadcastProcessExt$TableSinkMappingProcess.doProcess(TableSinkMappingBroadcastProcessExt.scala:34)
at com.opay.risk.features.sink.TableSinkMappingBroadcastProcessFunction$$anonfun$processElement$1.apply(TableSinkMappingBroadcastProcessFunction.scala:64)
at com.opay.risk.features.sink.TableSinkMappingBroadcastProcessFunction$$anonfun$processElement$1.apply(TableSinkMappingBroadcastProcessFunction.scala:63)
at scala.collection.immutable.Map$Map1.foreach(Map.scala:116)
at com.opay.risk.features.sink.TableSinkMappingBroadcastProcessFunction.processElement(TableSinkMappingBroadcastProcessFunction.scala:63)
at com.opay.risk.features.sink.TableSinkMappingBroadcastProcessFunction.processElement(TableSinkMappingBroadcastProcessFunction.scala:24)
at org.apache.flink.streaming.api.operators.co.CoBroadcastWithNonKeyedOperator.processElement1(CoBroadcastWithNonKeyedOperator.java:110)
at org.apache.flink.streaming.runtime.io.StreamTwoInputProcessorFactory.processRecord1(StreamTwoInputProcessorFactory.java:213)
at org.apache.flink.streaming.runtime.io.StreamTwoInputProcessorFactory.lambda$create$0(StreamTwoInputProcessorFactory.java:178)
at org.apache.flink.streaming.runtime.io.StreamTwoInputProcessorFactory$StreamTaskNetworkOutput.emitRecord(StreamTwoInputProcessorFactory.java:291)
at org.apache.flink.streaming.runtime.io.AbstractStreamTaskNetworkInput.processElement(AbstractStreamTaskNetworkInput.java:134)
at org.apache.flink.streaming.runtime.io.AbstractStreamTaskNetworkInput.emitNext(AbstractStreamTaskNetworkInput.java:105)
at org.apache.flink.streaming.runtime.io.StreamOneInputProcessor.processInput(StreamOneInputProcessor.java:66)
at org.apache.flink.streaming.runtime.io.StreamTwoInputProcessor.processInput(StreamTwoInputProcessor.java:96)
at org.apache.flink.streaming.runtime.tasks.StreamTask.processInput(StreamTask.java:423)
at org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxLoop(MailboxProcessor.java:204)
at org.apache.flink.streaming.runtime.tasks.StreamTask.runMailboxLoop(StreamTask.java:684)
at org.apache.flink.streaming.runtime.tasks.StreamTask.executeInvoke(StreamTask.java:639)
at org.apache.flink.streaming.runtime.tasks.StreamTask.runWithCleanUpOnFail(StreamTask.java:650)
at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:623)
at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:779)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:566)
at java.lang.Thread.run(Thread.java:750)
Caused by: com.mysql.cj.exceptions.CJCommunicationsException: Communications link failure
The last packet successfully received from the server was 10,674 milliseconds ago. The last packet sent successfully to the server was 10,674 milliseconds ago.
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at com.mysql.cj.exceptions.ExceptionFactory.createException(ExceptionFactory.java:61)
at com.mysql.cj.exceptions.ExceptionFactory.createException(ExceptionFactory.java:105)
at com.mysql.cj.exceptions.ExceptionFactory.createException(ExceptionFactory.java:151)
at com.mysql.cj.exceptions.ExceptionFactory.createCommunicationsException(ExceptionFactory.java:167)
at com.mysql.cj.protocol.a.NativeProtocol.readMessage(NativeProtocol.java:546)
at com.mysql.cj.protocol.a.NativeProtocol.checkErrorMessage(NativeProtocol.java:710)
at com.mysql.cj.protocol.a.NativeProtocol.sendCommand(NativeProtocol.java:649)
at com.mysql.cj.protocol.a.NativeProtocol.sendQueryPacket(NativeProtocol.java:948)
at com.mysql.cj.NativeSession.execSQL(NativeSession.java:1075)
at com.mysql.cj.jdbc.ClientPreparedStatement.executeInternal(ClientPreparedStatement.java:930)
… 25 more
Caused by: java.io.EOFException: Can not read response from server. Expected to read 4 bytes, read 0 bytes before connection was unexpectedly lost.
at com.mysql.cj.protocol.FullReadInputStream.readFully(FullReadInputStream.java:67)
at com.mysql.cj.protocol.a.SimplePacketReader.readHeader(SimplePacketReader.java:63)
at com.mysql.cj.protocol.a.SimplePacketReader.readHeader(SimplePacketReader.java:45)
at com.mysql.cj.protocol.a.TimeTrackingPacketReader.readHeader(TimeTrackingPacketReader.java:52)
at com.mysql.cj.protocol.a.TimeTrackingPacketReader.readHeader(TimeTrackingPacketReader.java:41)
at com.mysql.cj.protocol.a.MultiPacketReader.readHeader(MultiPacketReader.java:54)
at com.mysql.cj.protocol.a.MultiPacketReader.readHeader(MultiPacketReader.java:44)
at com.mysql.cj.protocol.a.NativeProtocol.readMessage(NativeProtocol.java:540)
… 30 more
Connection parameters: jdbc:mysql://10.0.0.11:3306/orders?useSSL=false&rewriteBatchedStatements=true&autoReconnect=true
Port 3306 here is an HAProxy instance running on the PD node, which proxies the three tidb-4000 nodes behind it.
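For context, a minimal sketch of what such an HAProxy listener usually looks like (the actual config on the PD node was not posted; the backend addresses come from the topology above, while the mode/balance/health-check settings are assumptions):
# Hypothetical HAProxy section proxying 3306 to the three TiDB servers
listen tidb-cluster
    bind *:3306
    mode tcp
    balance leastconn
    server tidb-1 10.0.0.14:4000 check inter 2000 rise 2 fall 3
    server tidb-2 10.0.0.15:4000 check inter 2000 rise 2 fall 3
    server tidb-3 10.0.0.16:4000 check inter 2000 rise 2 fall 3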
My understanding is that operating on TiKV only triggers leader elections, and requests are then routed to the new leader stores; that should not produce connection errors on TiDB. Why does it cause the TiDB connections to be interrupted?