Binlog synchronization failed, drainer log error

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: Binlog同步失败,drainer日志报错

| username: 普罗米修斯

【TiDB Usage Environment】Test Environment
【TiDB Version】Upstream Cluster v3.0, Downstream Cluster v5.2
【Encountered Problem】
After performing an insert on the upstream cluster, nothing is replicated to the downstream cluster. drainer.log shows the error: [ERROR] [pump.go:147] [“pump receive binlog failed”] [id=DC04:8250] [error=“rpc error: code = Unknown desc = cluster ID are mismatch, 7099656665064250735 vs 7104150018129249860”].

【Attachments】

  1. Both drainer and pump are online
  2. Binlog is enabled and running normally (a quick status check is sketched after this list)
  3. Pump log
  4. Drainer error log
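
A quick way to confirm the status in items 1–2, assuming the upstream TiDB (v3.0 and later support these statements) is reachable over the MySQL protocol; host and credentials are placeholders:

```shell
# Check pump/drainer registration and state from the upstream TiDB:
mysql -h <upstream-tidb-host> -P 4000 -u root -p -e "SHOW PUMP STATUS;"
mysql -h <upstream-tidb-host> -P 4000 -u root -p -e "SHOW DRAINER STATUS;"
```
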
| username: 普罗米修斯 | Original post link

Has this post been buried?

| username: yilong | Original post link

  1. Check the current cluster ID (see the sketch below): PD Recover User Guide | PingCAP Docs
  2. Was pd-recover run previously, or are there multiple clusters with misconfigured binlog? Are two clusters sharing one set of binlog components?
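
A minimal sketch of checking the cluster ID each PD reports, assuming default ports; the host is a placeholder, and older pd-ctl releases (e.g. v3.0) use the standalone binary with -d for single-command mode:

```shell
# Print the cluster info (including the "id" field) as seen by a given PD endpoint:
tiup ctl:v5.2.4 pd -u http://<pd-host>:2379 cluster
# For the v3.0 cluster, the standalone pd-ctl binary is used instead, e.g.:
# ./pd-ctl -u http://<pd-host>:2379 -d cluster
# The "id" in the output is the cluster ID that pump/drainer were started against.
```
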
| username: 普罗米修斯 | Original post link

The binlog configuration is normal. Binlog worked for two days, then stopped replicating incremental data. I removed the pump, drainer, and binlog configuration, restarted the cluster, and reconfigured pump, drainer, and the binlog settings. After restarting the TiDB cluster, incremental replication was normal again, but after another two days it failed once more, and the drainer log showed the error above.


It looks like there is a cluster ID conflict. Could you please advise how to specify a new cluster ID so that drainer replication works correctly?

| username: Min_Chen | Original post link

I checked: the cluster ID is obtained from PD. It is recommended to record the checkpoint and then rebuild the drainer. Fixing it in place is not easy and carries significant risk.
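
A rough sketch of recording the checkpoint before rebuilding drainer, assuming the downstream is TiDB/MySQL so the checkpoint lives in the downstream tidb_binlog.checkpoint table; hosts and credentials are placeholders:

```shell
# 1. Record the last replicated commit TS from the downstream checkpoint table:
mysql -h <downstream-tidb-host> -P 4000 -u root -p \
  -e "SELECT clusterID, checkPoint FROM tidb_binlog.checkpoint;"
# The checkPoint JSON contains a "commitTS" field.

# 2. After redeploying drainer, resume from that position, e.g. via the
#    -initial-commit-ts startup option (or the corresponding field in your
#    deployment's drainer configuration), set to the recorded commitTS.
```
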

| username: 普罗米修斯 | Original post link

Previously, I moved a PD node (192.168.40.13) over from another TiDB cluster. When I point pd-ctl at this IP, I can see only this one PD member.


When I point pd-ctl at the other IPs, I can see the other two PD members, but not the .13 PD node.

Moreover, their cluster IDs are different. How should I adjust this?
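
For reference, a sketch of the kind of pd-ctl checks described above (endpoints taken from the thread; the other PD IP is a placeholder, and older pd-ctl versions need -d for single-command mode):

```shell
pd-ctl -u http://192.168.40.13:2379 member     # shows only the .13 member
pd-ctl -u http://<other-pd-ip>:2379 member     # shows the other two members, without .13
pd-ctl -u http://192.168.40.13:2379 cluster    # prints the cluster ID this PD belongs to
```
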

| username: Min_Chen | Original post link

PD nodes from different clusters cannot be mixed in the same cluster. You can remove this PD now and then scale a new one back in.
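
A rough sketch of the remove-and-re-add flow, assuming the cluster is managed by tiup (for a tidb-ansible v3.0 deployment the equivalent ansible scale-in/scale-out procedure applies); cluster name, topology file name, and ports are placeholders:

```shell
tiup cluster scale-in <cluster-name> --node 192.168.40.13:2379   # remove the problem PD
tiup cluster display <cluster-name>                              # confirm it is gone
# Scale a PD back in with a topology file that points at a clean data directory:
tiup cluster scale-out <cluster-name> scale-out-pd.yaml
```
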

| username: 普罗米修斯 | Original post link

I am testing scaling nodes (TiKV, PD, TiDB) in and out between two clusters (TiDB 3.0.3 and TiDB 5.2.4). Several issues have come up in the process.

  1. I scaled in the PD (192.168.40.14) of the v3.0.3 cluster and then scaled it out into the v5.2.4 cluster, and that test went fine. However, after I scaled in the TiKV (192.168.40.13) of v5.2.4, it has been stuck in the Pending Offline state for a week; the test cluster does hold some data.


    I have already moved the leaders to other nodes with pd-ctl, but there are still region peers on it. Could you please advise on how to transfer them? (A diagnostic sketch follows this list.)

  2. When I scaled in the PD (192.168.40.13) of v5.2.4 and then scaled it out into the v3.0.3 cluster, the issue shown in the screenshot above occurred: I could see it only by pointing pd-ctl at 192.168.40.13, and its cluster_id was different.
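
A sketch for checking why an offline TiKV store still holds region peers, assuming default ports; the PD host, store ID, and limit value are placeholders, and older pd-ctl versions need -d for single-command mode:

```shell
pd-ctl -u http://<pd-host>:2379 store                      # find the store ID of 192.168.40.13:20160
pd-ctl -u http://<pd-host>:2379 store <store-id>           # remaining region_count / leader_count
pd-ctl -u http://<pd-host>:2379 region store <store-id>    # regions that still have a peer on this store
# If peers are not being scheduled away, the scheduling limit can be raised, e.g.:
pd-ctl -u http://<pd-host>:2379 config set replica-schedule-limit 64
# Note: a store often stays Pending Offline when the remaining stores cannot
# accept its replicas (for example, 3 replicas but only 3 TiKV stores left).
```
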

| username: Min_Chen | Original post link

Once there is no leader left on it, the service can be stopped.

For servers coming from another cluster, please make sure the data directory is cleared before scaling out. If clearing the data is inconvenient, use a different directory.
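
A sketch of scaling out onto a host that was previously part of another cluster, pointing the new instance at a fresh data directory; the topology file name and path are placeholders, and the yaml shown in comments follows the tiup topology format:

```shell
cat scale-out-pd.yaml
# pd_servers:
#   - host: 192.168.40.13
#     data_dir: /data1/pd-new    # a clean directory, not the old cluster's data dir
tiup cluster scale-out <cluster-name> scale-out-pd.yaml
```
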

| username: 普罗米修斯 | Original post link

Hello, I ran `tiup cluster scale-in <cluster-name> --node IP:20160` on the v5.2.4 cluster and it reported that the scale-in succeeded, but the store has stayed in the Offline state and never changed to Tombstone. This is because there are still peers on the store; it is supposed to transfer its leaders and regions automatically and then become Tombstone, but after the scale-in command the data has not been moved. How can I change the state?
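
A diagnostic sketch for this situation, assuming a tiup-managed cluster and default ports; the PD host and store ID are placeholders:

```shell
pd-ctl -u http://<pd-host>:2379 store <store-id>    # region_count must reach 0 before the store becomes Tombstone
pd-ctl -u http://<pd-host>:2379 operator show       # check whether remove-peer/transfer operators are being generated
pd-ctl -u http://<pd-host>:2379 scheduler show      # confirm balance-region-scheduler has not been paused or removed
# Once the store turns Tombstone, clean it out of the topology:
tiup cluster prune <cluster-name>
```
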

| username: system | Original post link

This topic will be automatically closed 60 days after the last reply. No new replies are allowed.