Is there a way for the client-go client to support dynamically switching the TiKV cluster address?

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: client-go 客户端是否有办法支持动态切换连接的tikv集群地址

| username: 夜-NULL

Is there a way for the client-go client to support dynamically switching the TiKV cluster address?

The goal is for a client created with txnkv.NewClient(pds) to connect to a new TiKV cluster after the pds configuration is modified.
The change could be driven by watching a remote configuration: when a configuration change is detected, it triggers the client to connect to the new PD (and therefore the new TiKV cluster).

This would allow rapid recovery from TiKV failures. Some loss during the switch is tolerable, such as request timeouts or errors lasting a few seconds.
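
To make the idea concrete, here is a minimal sketch in Go of a holder that swaps the active client when new PD addresses arrive. `Switcher`, `Switch`, `Watch`, `fakeClient`, and `newFake` are hypothetical names used for illustration and are not part of client-go. The `build` function is an assumption: in real use it would wrap `txnkv.NewClient(pds)`, and it is abstracted here so the sketch runs without a cluster. The sketch assumes the client type has a `Close() error` method.

```go
package main

import (
	"errors"
	"fmt"
	"io"
	"sync"
)

// Switcher is a hypothetical helper (not part of client-go) that holds the
// currently active client and allows it to be replaced at runtime.
type Switcher[C io.Closer] struct {
	mu      sync.RWMutex
	current C
	// build creates a client for a set of PD addresses. With client-go it
	// would wrap txnkv.NewClient(pds); that wiring is an assumption here.
	build func(pds []string) (C, error)
}

// NewSwitcher connects to the initial cluster.
func NewSwitcher[C io.Closer](build func(pds []string) (C, error), pds []string) (*Switcher[C], error) {
	c, err := build(pds)
	if err != nil {
		return nil, err
	}
	return &Switcher[C]{current: c, build: build}, nil
}

// Get returns the client that new requests should use.
func (s *Switcher[C]) Get() C {
	s.mu.RLock()
	defer s.mu.RUnlock()
	return s.current
}

// Switch builds a client for the new cluster first, swaps it in, and only
// then closes the old one. If the new cluster cannot be reached, the current
// client is kept. Requests still running on the old client may fail.
func (s *Switcher[C]) Switch(pds []string) error {
	next, err := s.build(pds)
	if err != nil {
		return fmt.Errorf("connect to new cluster: %w", err)
	}
	s.mu.Lock()
	old := s.current
	s.current = next
	s.mu.Unlock()
	return old.Close()
}

// Watch applies every address list received on updates until the channel is
// closed. Failed switches are reported to onErr.
func (s *Switcher[C]) Watch(updates <-chan []string, onErr func(error)) {
	for pds := range updates {
		if err := s.Switch(pds); err != nil && onErr != nil {
			onErr(err)
		}
	}
}

// fakeClient stands in for a real client in this demo.
type fakeClient struct {
	pds    []string
	closed bool
}

func (f *fakeClient) Close() error { f.closed = true; return nil }

func newFake(pds []string) (*fakeClient, error) {
	if len(pds) == 0 {
		return nil, errors.New("no PD addresses")
	}
	return &fakeClient{pds: pds}, nil
}

func main() {
	s, err := NewSwitcher(newFake, []string{"pd-a:2379"})
	if err != nil {
		panic(err)
	}
	// In practice the channel would be fed by a remote-config watcher.
	updates := make(chan []string, 1)
	updates <- []string{"pd-b:2379"}
	close(updates)
	s.Watch(updates, func(err error) { fmt.Println("switch failed:", err) })
	fmt.Println("active PD endpoints:", s.Get().pds)
}
```

Callers would fetch the client through `Get()` for each request or transaction rather than caching it, so that a switch takes effect on the next call. Closing the old client right after the swap fits the stated tolerance for a few seconds of errors; a gentler variant would delay the close until in-flight work has drained.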

| username: xfworld

It is definitely possible, but you have to implement it yourself. Create a branch and give it a try :+1:
Alternatively, you can use a proxy to achieve this, with the proxy handling the liveness check…

TiKV itself is highly available and multi-replica. If resources are allocated properly, the recovery speed will be relatively fast.

| username: 夜-NULL

Sure, thanks~

My main concern with switching clusters is the large number of connections each client holds to PD and TiKV. If all of them are dropped and re-established against a different PD and TiKV, the impact is hard to predict; for example, stale caches could cause inconsistencies that affect the upper layers. I wanted to see if anyone has done something similar~ :joy:

TiKV itself is highly available, and we have deployed it across three data centers. However, we have still encountered failures, such as a surge in requests maxing out TiKV’s CPU. Even adding two or three high-performance machines didn’t help, and we had to identify and block abnormal IPs one by one to isolate the issue, which took a long time. Having a backup cluster would be much better, so we still hope to take a step further.

| username: xfworld

I understand the scenario you’re describing. Generally, when resources go offline or become unavailable, other services still have cached versions of these resources. As a result, requests continue to be directed to these unavailable resources, leading to request backlogs and failed responses (which severely impacts user experience).

I expect TiDB could update some APIs and document how to use them, for example by supporting manual cache updates, which would address these issues.

In fact, you could start a new thread detailing your requirements and describing the product features you would like to see updated.

| username: 夜-NULL

Okay, thx~~
