Memory Leak in tikv-go Client

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: tikv-go客户端内存泄露

| username: chenbin200818

[Test Environment for TiDB]
[client-go Version] v2.0.7
[Reproduction Path]

  • Connect to the TiKV service in the client process, then bring down the network interface card (NIC).

[Encountered Issue: Symptoms and Impact]

  • The memory of the client process keeps increasing, rising by 800MB in a day.
  • After the NIC is back online, the memory is not released.

[Other Analysis]

  • Logs show that the tikv-go client keeps reconnecting to PD.

[2023/12/01 14:45:10.483 +08:00] [WARN] [pd_service_discovery.go:400] [“[pd] failed to get cluster info for the leader”] [leader-addr=http://10.10.10.31:2379] [error=“[PD:client:ErrClientGetClusterInfo]error:rpc error: code = DeadlineExceeded desc = context deadline exceeded target:10.10.10.31:2379 status:CONNECTING: error:rpc error: code = DeadlineExceeded desc = context deadline exceeded target:10.10.10.31:2379 status:CONNECTING”]
[2023/12/01 14:45:10.523 +08:00] [WARN] [pd_service_discovery.go:400] [“[pd] failed to get cluster info for the leader”] [leader-addr=http://10.10.10.31:2379] [error=“[PD:client:ErrClientGetClusterInfo]error:rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 10.10.10.31:2379: i/o timeout" target:10.10.10.31:2379 status:TRANSIENT_FAILURE: error:rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 10.10.10.31:2379: i/o timeout" target:10.10.10.31:2379 status:TRANSIENT_FAILURE”]
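
One thing worth ruling out when a client retries in a loop like this is goroutine growth. Nothing in this thread confirms that is the cause here. A minimal sketch using only the Go standard library is shown below; goroutineDump is a hypothetical helper name, and the blocked goroutines in main only simulate a leak so the example runs without a cluster.

```go
package main

import (
	"bytes"
	"fmt"
	"runtime"
	"runtime/pprof"
)

// goroutineDump returns the current goroutine count together with a text
// dump in which goroutines that share the same stack are grouped.
func goroutineDump() (int, string, error) {
	var buf bytes.Buffer
	// debug=1 selects the human-readable text format.
	if err := pprof.Lookup("goroutine").WriteTo(&buf, 1); err != nil {
		return 0, "", err
	}
	return runtime.NumGoroutine(), buf.String(), nil
}

func main() {
	before, _, err := goroutineDump()
	if err != nil {
		fmt.Println("dump failed:", err)
		return
	}

	// Simulated leak: goroutines that block until the channel is closed.
	block := make(chan struct{})
	for i := 0; i < 5; i++ {
		go func() { <-block }()
	}

	after, dump, err := goroutineDump()
	if err != nil {
		fmt.Println("dump failed:", err)
		return
	}
	fmt.Printf("goroutines before=%d after=%d\n", before, after)
	fmt.Printf("dump size: %d bytes\n", len(dump))
	close(block)
}
```

In the real client process, calling a helper like this before the NIC goes down and again a few hours later would show whether the count climbs, and the dump would show which stacks are accumulating.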

| username: Billmay表妹 | Original post link

Troubleshooting steps:

  1. Check client code: Inspect the client code, especially the parts related to connecting to TiKV and PD. Ensure there are no memory leaks in the code, such as unclosed connections or unreleased resources. Use a memory analysis tool such as pprof to check memory usage and locate leaks.
  2. Check TiKV Go client configuration: Review the configuration parameters of the TiKV Go client to ensure they are set correctly. Pay special attention to parameters related to connecting to PD, such as pd.endpoints, pd.request-timeout, etc.
  3. Check PD status: Use TiUP or pd-ctl to check the status of the PD cluster, ensuring that all PD nodes are running normally and that the leader distribution is not abnormal. For example, tiup ctl:v5.1.1 pd -u <pd_address> store shows store status, and tiup ctl:v5.1.1 pd -u <pd_address> member leader show shows the current PD leader. Use the ctl version that matches your cluster.
  4. Check network connection: Verify that the network connection between the client process and the TiKV service is normal. Ensure the network connection is stable, with no packet loss or excessive latency. Use the ping command or other network diagnostic tools to test connectivity and latency.
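
For step 1, a heap profile is the most direct evidence of where the memory is held. A minimal sketch, assuming the client process is a Go program you can modify; heapInUse and writeHeapProfile are hypothetical helper names, and only the standard library is used.

```go
package main

import (
	"bytes"
	"fmt"
	"runtime"
	"runtime/pprof"
)

// heapInUse forces a garbage collection and returns the bytes still held in
// in-use heap spans, so that repeated calls can be compared over time.
func heapInUse() uint64 {
	runtime.GC()
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	return m.HeapInuse
}

// writeHeapProfile returns the current heap profile in the binary pprof
// format that "go tool pprof" reads.
func writeHeapProfile() ([]byte, error) {
	var buf bytes.Buffer
	if err := pprof.WriteHeapProfile(&buf); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

func main() {
	fmt.Printf("heap in use after GC: %d bytes\n", heapInUse())

	prof, err := writeHeapProfile()
	if err != nil {
		fmt.Println("heap profile failed:", err)
		return
	}
	// In a real process, save prof to a file and open it with go tool pprof.
	fmt.Printf("heap profile size: %d bytes\n", len(prof))
}
```

Saving one profile before the NIC is brought down and another after the memory has grown, then comparing them with go tool pprof -base <old> <new>, should point at the allocation sites responsible for the growth.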

| username: chenbin200818 | Original post link

Just one client instance, one connection, nothing else.

| username: chenbin200818 | Original post link

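// The connection code is as follows. Is there anything wrong with it?

import (
    "context"
    "fmt"
    "time"

    "github.com/tikv/client-go/v2/config"
    "github.com/tikv/client-go/v2/rawkv"
    pd "github.com/tikv/pd/client"
)

// tikvClient wraps the single rawkv client used by the process.
type tikvClient struct {
    cli *rawkv.Client
}

// newClient connects to the given PD endpoints and returns a wrapped rawkv client.
// rawkv.NewClientWithOpts takes the PD addresses as a slice, so pdAddrs is []string.
func newClient(pdAddrs []string) (*tikvClient, error) {
    // Limit the PD client to 2 retries on error.
    pdOpt := rawkv.WithPDOptions(pd.WithMaxErrorRetry(2))
    secureOpt := rawkv.WithSecurity(config.DefaultConfig().Security)
    cli, err := rawkv.NewClientWithOpts(context.TODO(), pdAddrs, pdOpt, secureOpt)
    if err != nil {
        fmt.Println("Create tikv client failure, ", time.Now(), err)
        return nil, err
    }
    return &tikvClient{cli: cli}, nil
}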

| username: 芮芮是产品 | Original post link

As long as the network card doesn’t disconnect, it’s fine.

| username: 芮芮是产品 | Original post link

Isn’t this your own code?

| username: Billmay表妹 | Original post link

It is recommended not to use TiKV alone because there is basically no one in the community who can help you~
If you can use TiDB, then use TiDB~

| username: chenbin200818 | Original post link

My code only calls the tikv-go library.

| username: andone | Original post link

I don’t know the answer to this question, but I’ll bump it up for you.

| username: dba远航 | Original post link

Network communication is experiencing an anomaly.

| username: wangkk2024 | Original post link

Will a restart fix this?

| username: stephanie | Original post link

Does your client code close the connection promptly once it is finished with it?
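
On the question of closing: the rawkv client has a Close method, and a common pattern is to guard it so that it runs exactly once, however many code paths call it. The sketch below is an assumption-laden illustration, not the poster's code: managedClient and fakeClient are hypothetical names, and fakeClient stands in for the real client so the example runs without a cluster. Closing would not by itself explain growth while a single long-lived client keeps retrying, as described in this thread.

```go
package main

import (
	"fmt"
	"io"
	"sync"
)

// managedClient wraps anything with a Close() error method and makes sure
// the underlying Close is invoked at most once.
type managedClient struct {
	once sync.Once
	c    io.Closer
	err  error
}

// Close closes the wrapped client on the first call and returns the same
// result on every later call.
func (m *managedClient) Close() error {
	m.once.Do(func() { m.err = m.c.Close() })
	return m.err
}

// fakeClient is a stand-in for a real client; it only counts Close calls.
type fakeClient struct{ closed int }

func (f *fakeClient) Close() error {
	f.closed++
	return nil
}

func main() {
	f := &fakeClient{}
	m := &managedClient{c: f}

	// Two calls to the wrapper reach the underlying client only once.
	_ = m.Close()
	_ = m.Close()
	fmt.Println("underlying Close calls:", f.closed)
}
```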