Using tikv go-client at 1-second intervals will double the transaction request latency

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: tikv go-client间隔1秒使用,事务请求的时延会翻倍

| username: jwongz

[TiKV Usage Environment] Test
[TiKV Version]
tikv-server: v6.5.8
go-client: v2.0.7

[Encountered Problem: Phenomenon and Impact]

  1. Optimistic transactions repeatedly put the same key; the value is 64 KB and changes with each put.
  2. Looping the put 10,000 times gives an average latency of 2.3 milliseconds.
  3. Looping the put 10,000 times with a 1-second sleep between iterations gives an average latency of 4.5 milliseconds.

The business requests are not frequent, but each request is very latency-sensitive, so this causes performance jitter for the business.

Source code; removing the LockKeys call saves about 500 microseconds:

func lockPuts(keys [][]byte, values [][]byte) error {
        tx, err := client.Begin()
        if err != nil {
                return err
        }
        tx.SetPessimistic(true)
        if err := tx.LockKeysWithWaitTime(context.Background(), kv.LockNoWait, keys...); err != nil {
                tx.Rollback()
                return err
        }

        for i := range keys {
                if err := tx.Set(keys[i], values[i]); err != nil {
                        tx.Rollback()
                        return err
                }
        }

        return tx.Commit(context.Background())
}

var total int64 = 0
const count = 10000
for i := 0; i < count; i++ {
        start := time.Now()
        if err := lockPuts(xxx, xxx); err != nil {
                panic(err)
        }
        elapsed := time.Since(start)
        fmt.Println("cost:", elapsed)
        total += elapsed.Microseconds()
        time.Sleep(1 * time.Second)
}

fmt.Println("avg:", total/count, "us")

[Resource Configuration]
NVMe disk, 3 TiKV nodes deployed with default configuration using tiup

| username: TiDBer_jYQINSnf | Original post link

What is the difference between the two put loops?

| username: jwongz | Original post link

time.Sleep(1 * time.Second)

| username: TiDBer_jYQINSnf | Original post link

Does this have a commit?

| username: jwongz | Original post link

Some commits

| username: TiDBer_jYQINSnf | Original post link

10,000 put/commit pairs like this? The first round averages 2.3 milliseconds and the second 4.5 milliseconds? Executed sequentially, with no transaction-conflict backoff either?
If the cluster has no other requests and this is purely a test, you should increase the sleep interval and the number of puts so that monitoring can collect enough data points, then check the monitoring.

| username: jwongz | Original post link

I posted the code, and it is all executed sequentially.

| username: zhaokede | Original post link

You should run more groups of tests for comparison.

| username: dba-kit | Original post link

Have you collected P99 and P95 statistics? Are there a few requests that are particularly slow? Could you share the logs so everyone can analyze them?

| username: TiDBer_jYQINSnf | Original post link

Run the loop multiple times, and break the timing down further: separately time the LockKeys, the Set calls, and the Commit to see where the main difference lies in each group. I'd expect the Set itself to take little time; the difference is probably in the lock and the commit. You are using pessimistic transactions, right? Not optimistic ones.

| username: jwongz | Original post link

Does LockKeys invalidate that? LockKeys seems to be a pessimistic-transaction mechanism, though.

| username: TiDBer_jYQINSnf | Original post link

This word should mean “pessimistic transaction.”

| username: zhaokede | Original post link

The difference in commit time between pessimistic and optimistic transactions shouldn't be that significant, should it?

| username: jwongz | Original post link

There is still some difference; optimistic transactions save around 500 microseconds.

| username: jwongz | Original post link

Without the sleep, kv_prewrite drops from 1.9 milliseconds to around 700 microseconds, and kv_commit drops from 700 microseconds to around 400 microseconds.

The append-log latency also decreases, as does the Raft Engine write latency.

| username: pingyu | Original post link

It might be due to the batch size.
Try setting MaxBatchSize to 0:

import "github.com/tikv/client-go/v2/config"
...
config.UpdateGlobal(func(c *config.Config) {
        c.TiKVClient.MaxBatchSize = 0
})

| username: jwongz | Original post link

Not very effective.

| username: pingyu | Original post link

For the case without the 1-second sleep, does setting MaxBatchSize = 0 still result in 700 us, i.e. no significant effect?

| username: jwongz | Original post link

Well, with or without the sleep, the latency doesn't drop. That doesn't seem to be the issue.

| username: jwongz | Original post link

Locking and Set are both very stable: the former within 10 microseconds and the latter within 40 microseconds. The commit time, however, increases from over 1,100 microseconds to over 2,200 microseconds.