Manual Configuration of TLS Between Components Failed

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: 手动配置组件间TLS失败

| username: EricSong

【TiDB Environment】Production
【TiDB Version】6.3.0
【Problem Encountered】After manually configuring the TLS certificates and reloading, TiKV and TiDB failed to start.
【Reproduction Path】Manually created certificates, then ran tiup cluster edit-config
【Problem Phenomenon and Impact】tiup cluster reload failed; some TiKV and TiDB instances changed to Disconnected status

【Attachments】:

  • Relevant logs
    Log from TiKV

[2022/11/02 01:41:38.893 +00:00] [INFO] [util.rs:551] ["PD failed to respond"] [err="Grpc(RpcFailure(RpcStatus { code: 14-UNAVAILABLE, message: \"failed to connect to all addresses\", details: [] }))"] [endpoints=10.250.87.38:2378]
[2022/11/02 01:41:38.893 +00:00] [INFO] [util.rs:589] ["connecting to PD endpoint"] [endpoints=10.250.87.122:2378]
[2022/11/02 01:41:38.895 +00:00] [INFO] [] ["subchannel 0x7faa87d35000 {address=ipv4:10.250.87.122:2378, args=grpc.client_channel_factory=0x7faa87c9c1f8, grpc.default_authority=10.250.87.122:2378, grpc.http2_scheme=https, grpc.internal.channel_credentials=0x7faa875c7c60, grpc.internal.security_connector=0x7faa7fe719b0…

Log from PD

[2022/11/03 02:21:45.797 +00:00] [INFO] [server.go:1406] ["start to watch pd leader"] [pd-leader="name:\"pd-10.250.87.38-2378\" member_id:16326710579290257846 peer_urls:\"http://10.250.87.38:2380\" client_urls:\"http://10.250.87.38:2378\" "]
[2022/11/03 02:21:45.799 +00:00] [WARN] [grpclog.go:60] ["grpc: addrConn.createTransport failed to connect to {10.250.87.38:2378 0 <nil>}. Err :connection error: desc = \"transport: authentication handshake failed: EOF\". Reconnecting…"]
[2022/11/03 02:21:45.833 +00:00] [WARN] [leadership.go:194] ["required revision has been compacted, use the compact revision"] [required-revision=305224] [compact-revision=713535]
[2022/11/03 02:21:46.800 +00:00] [WARN] [grpclog.go:60] ["grpc: addrConn.createTransport failed to connect to {10.250.87.38:2378 0 <nil>}. Err :connection error: desc = \"transport: authentication handshake failed: EOF\". Reconnecting…"]
[2022/11/03 02:21:48.461 +00:00] [WARN] [stream.go:436] ["lost TCP streaming connection with remote peer"] [stream-reader-type="stream MsgApp v2"] [local-member-id=f5476fe9c527f5b9] [remote-peer-id=3310c14e027b4f49] [error=EOF]
[2022/11/03 02:21:48.463 +00:00] [WARN] [stream.go:436] ["lost TCP streaming connection with remote peer"] [stream-reader-type="stream Message"] [local-member-id=f5476fe9c527f5b9] [remote-peer-id=3310c14e027b4f49] [error=EOF]
[2022/11/03 02:21:48.464 +00:00] [WARN] [peer_status.go:68] ["peer became inactive (message send to peer failed)"] [peer-id=3310c14e027b4f49] [error="failed to dial 3310c14e027b4f49 on stream Message (peer 3310c14e027b4f49 failed to find local node f5476fe9c527f5b9)"]
[2022/11/03 02:21:48.566 +00:00] [WARN] [grpclog.go:60] ["grpc: addrConn.createTransport failed to connect to {10.250.87.38:2378 0 <nil>}. Err :connection error: desc = \"transport: authentication handshake failed: EOF\". Reconnecting…"]

  • Configuration file

server_configs:
  tidb:
    binlog.enable: false
    binlog.ignore-error: false
    log.file.max-days: 7
    log.slow-threshold: 300
    mem-quota-query: 524288000
    oom-action: cancel
    performance.txn-total-size-limit: 10730418240
    security.cluster-ssl-ca: /vdb/tidb-certs/idtrca.cer
    security.cluster-ssl-cert: /vdb/tidb-certs/server-cert.pem
    security.cluster-ssl-key: /vdb/tidb-certs/server-key.pem
    security.ssl-ca: /vdb/tidb-certs/idtrca.cer
    security.ssl-cert: /vdb/tidb-certs/server-cert.pem
    security.ssl-key: /vdb/tidb-certs/server-key.pem
    tikv-client.copr-cache.enable: false
  tikv:
    readpool.coprocessor.use-unified-pool: true
    readpool.storage.use-unified-pool: false
    security.ca-path: /vdb/tidb-certs/idtrca.cer
    security.cert-path: /vdb/tidb-certs/server-cert.pem
    security.key-path: /vdb/tidb-certs/server-key.pem
  pd:
    auto-compaction-retention: 5m
    log.file.max-days: 7
    quota-backend-bytes: 17179869184
    schedule.leader-schedule-limit: 4
    schedule.region-schedule-limit: 2048
    schedule.replica-schedule-limit: 64
    security.cacert-path: /vdb/tidb-certs/idtrca.cer
    security.cert-path: /vdb/tidb-certs/server-cert.pem
    security.key-path: /vdb/tidb-certs/server-key.pem
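
For context, a configuration change like the one above is applied with the usual edit/reload cycle; the cluster name below is a placeholder:

# Open the topology, edit server_configs, save, then push the change out.
tiup cluster edit-config <cluster-name>
tiup cluster reload <cluster-name>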

| username: Billmay表妹 | Original post link

See the pre-deployment checks: manually configuring SSH mutual trust and passwordless sudo.

Take a look at those two sections~

| username: undefined | Original post link

Doesn’t this already state the reason…?

| username: undefined | Original post link

Please send your PD configuration file.

| username: tidb狂热爱好者 | Original post link

This log shows an authentication failure.

| username: EricSong | Original post link

I used tiup cluster edit-config; this is the PD configuration:

pd:
  auto-compaction-retention: 5m
  log.file.max-days: 7
  quota-backend-bytes: 17179869184
  schedule.leader-schedule-limit: 4
  schedule.region-schedule-limit: 2048
  schedule.replica-schedule-limit: 64
  security.cacert-path: /vdb/tidb-certs/idtrca.cer
  security.cert-path: /vdb/tidb-certs/server-cert.pem
  security.key-path: /vdb/tidb-certs/server-key.pem

| username: EricSong | Original post link

Is there a way to check the specific cause of the authentication failure?
I see that the URLs PD uses are still http://, which makes it look as if TLS is not enabled, yet the failure message is a TLS handshake error. How should I troubleshoot the specific cause of the authentication failure in this case?
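
For reference, one way to inspect the handshake directly is with openssl; the endpoint and certificate paths below are taken from the logs and configuration above:

# If TLS is active on the PD client port, this prints the served certificate
# chain; if the port still speaks plain HTTP, the handshake fails right away.
openssl s_client -connect 10.250.87.38:2378 -CAfile /vdb/tidb-certs/idtrca.cer </dev/null

# Separately check that the server certificate chains to the configured CA.
openssl verify -CAfile /vdb/tidb-certs/idtrca.cer /vdb/tidb-certs/server-cert.pem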

| username: neilshen | Original post link

How about changing http:// to https:// in the configuration of components like PD, TiKV, and TiDB?

| username: srstack | Original post link

Did you modify the TLS configuration for PD? This is a limitation of PD/etcd: before TLS is set up, the peer URLs in PD/etcd use the http scheme. Once TLS is enabled, those URLs have already been persisted in etcd and cannot be modified, so the etcd cluster fails to form. You should see PD crash-looping.
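
A quick way to confirm this, assuming pd-ctl is run through tiup, is to list the members and check the scheme of the persisted URLs (the version tag and endpoint below are taken from this thread):

# If peer_urls/client_urls still show http:// after TLS was enabled, that is
# the persisted-URL mismatch described above.
tiup ctl:v6.3.0 pd -u http://10.250.87.38:2378 member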

Solution:

  1. Remove the TLS configuration and use tiup to forcibly scale the PD nodes down to 1.
  2. Modify the PD startup parameters to add --force-new-cluster in scripts/run.sh.
  3. Remove --force-new-cluster and restart PD.
  4. The cluster should be back to normal at this point. (A rough shell sketch of these steps follows below.)
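
A rough sketch of those steps, assuming a cluster named <cluster-name> and the PD addresses seen earlier in the thread:

# 1. Forcibly scale PD down to a single node (the address here is an example).
tiup cluster scale-in <cluster-name> --node 10.250.87.122:2378 --force

# 2. On the surviving PD node, add --force-new-cluster to the start command in
#    scripts/run.sh under the PD deploy directory, then restart that instance.

# 3. Once PD is healthy again, remove --force-new-cluster and restart PD.
tiup cluster restart <cluster-name> -R pd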

TLS configuration solution:

  1. Use the tiup cluster tls command to configure it.
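
For reference, a minimal invocation of that command; the cluster name is a placeholder and behavior details may vary by tiup version:

# Enable TLS between components: by default tiup generates and distributes
# the certificates, updates the topology, and restarts the cluster.
tiup cluster tls <cluster-name> enable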

| username: 大发发发发发 | Original post link

It’s about the SSH passwordless configuration issue.

| username: system | Original post link

This topic was automatically closed 60 days after the last reply. New replies are no longer allowed.