After PD migration, the pump's UpdateTime does not update

translator_bot · June 22, 2024, 4:24am

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: pd迁移后，pump的UpdateTime不更新

| username: TiDBer_yUoxD0vR

[TiDB Usage Environment] Production Environment / Testing / PoC
[TiDB Version]
[Reproduction Path] What operations were performed when the issue occurred
There are 3 PD nodes: pd1, pd2, and pd3. After migrating them to pd4, pd5, and pd6 (and taking pd1, pd2, and pd3 offline), I checked the pump status with the command:
binlogctl -pd-urls=http://192.168.133.xxx:2379 -cmd pumps
The UpdateTime no longer updates. However, data still reaches the downstream and the business is not affected. The UpdateTime only starts updating again after the pump is restarted. Does the pump not automatically detect the PD node migration?
[Encountered Issue: Issue Phenomenon and Impact]
[Resource Configuration]
[Attachments: Screenshots / Logs / Monitoring]
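
A stale UpdateTime like the one described above can be detected mechanically. The following is a minimal Python sketch, not part of TiDB tooling. It assumes each pump appears in the binlogctl output as an entry containing `NodeID: ...`, `State: ...`, and `UpdateTime: YYYY-MM-DD HH:MM:SS +0000 UTC`. That layout, the function name `stale_pumps`, and the 5-minute threshold are assumptions for illustration, so adjust the pattern to what your binlogctl version actually prints.

```python
import re
from datetime import datetime, timedelta, timezone

# Assumed shape of one node entry in the `binlogctl -cmd pumps` output.
# Adjust the pattern if your binlogctl version prints the fields differently.
NODE_RE = re.compile(
    r"NodeID: (?P<node_id>[^,]+),.*?State: (?P<state>\w+),.*?"
    r"UpdateTime: (?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2} [+-]\d{4})"
)


def stale_pumps(output, now, max_age=timedelta(minutes=5)):
    """Return (node_id, state, update_time) for pumps whose UpdateTime is older than max_age."""
    stale = []
    for match in NODE_RE.finditer(output):
        updated = datetime.strptime(match.group("ts"), "%Y-%m-%d %H:%M:%S %z")
        if now - updated > max_age:
            stale.append((match.group("node_id"), match.group("state"), updated))
    return stale


if __name__ == "__main__":
    import sys

    # Usage: binlogctl -pd-urls=... -cmd pumps 2>&1 | python check_pumps.py
    for node_id, state, updated in stale_pumps(sys.stdin.read(), datetime.now(timezone.utc)):
        print(f"{node_id}: state={state}, last UpdateTime={updated.isoformat()}")
```

A pump that shows up here with state online matches the symptom in this thread: the registry entry is no longer refreshed even though the process is still serving writes.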

| username: Billmay表妹

As I understand it, you migrated the original PD nodes to new PD nodes, and when you check the pump status with the command binlogctl -pd-urls=http://192.168.133.xxx:2379 -cmd pumps, UpdateTime is not updating, although downstream data is present and the business is not affected. You want to know why the pump did not automatically detect the PD node change.

According to the official TiDB documentation [1], after you change the PD nodes you need to restart the pump so that it picks up the change. The pump caches the PD node information and does not refresh it automatically, so a restart is needed to reload it.

If you want to avoid restarting the pump, you can try using the command binlogctl -pd-urls=http://192.168.133.xxx:2379 -cmd refresh-workers to refresh the pump’s worker list. This will force the pump to reload the PD node information and update its worker list. However, this method may cause a performance drop in the pump because it needs to reload all the worker information.

In summary, after changing the PD nodes, it is recommended to restart the pump so that it correctly detects the change.

| username: redgame

When migrating PD nodes, ensure that the PD addresses in the pump configuration file have been correctly updated to the new pd4, pd5, and pd6 nodes.

| username: TiDBer_yUoxD0vR

This cluster’s PD was migrated more than a year ago, and the pump has not been restarted since. The logs have consistently reported “send heartbeat failed,” but data synchronization has not been affected. If the failed heartbeats have no visible impact, what are the potential consequences?

I also tested migrating a pump whose UpdateTime is not updating because it was never restarted: the migration fails, and the logs report that the drainer response cannot be found. Are there any other impacts?

| username: Raymond

It may be related to this bug:

github.com/pingcap/tidb-binlog

The etcd client to the registry of Pump/Drainer info does not use auto-sync and fails when the PD cluster address changed

Opened 02:42AM - 31 May 23 UTC, closed 09:15AM - 05 Jun 23 UTC, by lance6716

Labels: affects-5.0 affects-4.0 affects-5.1 affects-5.2 affects-5.3 affects-5.4 affects-6.0 affects-6.1 affects-6.2 affects-6.3 affects-6.4 affects-6.5 affects-7.1

Copied from https://github.com/pingcap/tidb/issues/42643

Bug Report

1. Minimal reproduce step (Required)
   1. Start a TiDB cluster with 3 PDs ① ② ③ and a Pump connected
   2. Scale-out 3 more PDs ④ ⑤ ⑥
   3. Wait 31 seconds
   4. Scale-in the original PDs ① ② ③
   5. Wait 31 seconds
   6. Run `SHOW PUMP STATUS;`
2. What did you expect to see? (Required)
   We see the status of the pump at step 6
3. What did you see instead (Required)
   Context deadline exceeded in `etcd.(*Client).List`
4. What is your TiDB version? (Required)
   v4.0.14

Your pump hasn’t been restarted, and the logs keep reporting “send heartbeat failed.”
Check the pump status with SHOW PUMP STATUS to see if there’s any issue.
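
To illustrate why a client without endpoint auto-sync breaks in the scenario from the linked issue, here is a toy Python model. It is not tidb-binlog or etcd code: the `Cluster` and `Client` classes and the `auto_sync` flag are hypothetical names that only mimic the idea of a client either keeping its startup endpoint list forever or refreshing it from the live member list.

```python
class Cluster:
    """Toy stand-in for a PD/etcd cluster: just a mutable set of member names."""

    def __init__(self, members):
        self.members = set(members)


class Client:
    """Toy client that remembers a list of endpoints (hypothetical, for illustration)."""

    def __init__(self, cluster, endpoints, auto_sync):
        self.cluster = cluster
        self.endpoints = list(endpoints)
        self.auto_sync = auto_sync

    def _reachable(self):
        # An endpoint is reachable only while it is still a cluster member.
        return [e for e in self.endpoints if e in self.cluster.members]

    def sync(self):
        """Refresh the endpoint list from the live member list, if enabled and possible."""
        if self.auto_sync and self._reachable():
            self.endpoints = sorted(self.cluster.members)

    def heartbeat(self):
        """Return True if at least one known endpoint is still a cluster member."""
        return bool(self._reachable())


cluster = Cluster(["pd1", "pd2", "pd3"])
static = Client(cluster, cluster.members, auto_sync=False)
syncing = Client(cluster, cluster.members, auto_sync=True)

cluster.members |= {"pd4", "pd5", "pd6"}   # scale out the new PDs
static.sync()
syncing.sync()                              # one sync interval passes
cluster.members -= {"pd1", "pd2", "pd3"}   # scale in the original PDs

print(static.heartbeat(), syncing.heartbeat())  # False True
```

In this model the static client only knows pd1, pd2, and pd3, so every heartbeat fails once they are gone, which is consistent with the “send heartbeat failed” logs and the frozen UpdateTime. A restart rebuilds the endpoint list from the current configuration.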

| username: TiDBer_yUoxD0vR

SHOW PUMP STATUS reports the state as online, and UpdateTime is not updating, but data synchronization is not actually affected. Data can still be synchronized to the downstream through the drainer. So, what is the purpose of this heartbeat? Restarting the pump now won’t cause any other issues, right?

| username: Raymond

There generally shouldn’t be any issues if you restart the pumps one at a time, as TiDB writes to the pumps in a round-robin manner.
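
Restarting one pump at a time can be sketched as a loop that waits for each pump to look healthy before touching the next one. This is a minimal Python sketch under stated assumptions: `restart` and `is_healthy` are hypothetical callables you would supply yourself (for example, wrapping your deployment tool and a check that the pump is online with a recent UpdateTime), and the timeout values are arbitrary.

```python
import time


def rolling_restart(nodes, restart, is_healthy, timeout=60.0, poll=1.0,
                    sleep=time.sleep, clock=time.monotonic):
    """Restart nodes one by one, waiting for each to become healthy before continuing.

    `restart(node)` and `is_healthy(node)` are caller-supplied (hypothetical) hooks.
    Raises TimeoutError and stops the rollout if a node does not recover in time.
    """
    restarted = []
    for node in nodes:
        restart(node)
        deadline = clock() + timeout
        while not is_healthy(node):
            if clock() >= deadline:
                raise TimeoutError(f"{node} did not become healthy; stopping the rollout")
            sleep(poll)
        restarted.append(node)
    return restarted
```

Stopping at the first pump that fails to recover keeps the remaining pumps untouched, so writes still have somewhere to go while you investigate.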
