Flink Writing Data via tikvClient is Too Slow

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: Flink 通过tikvClient 写数据太慢

| username: 表渣渣渣

[TiDB Usage Environment] Testing
[TiDB Version] v5.4.0
[Reproduction Path] Slow data writing through Flink + TiKV Client
[Encountered Problem: Phenomenon and Impact] We are currently hitting a write bottleneck in TiDB and are trying to speed up writes by going through TiKV's two-phase commit directly

I used the Flink Stream API to develop a custom sink.

The two-phase commit code is below for reference:

import java.util.Iterator;
import java.util.List;
import javax.annotation.Nonnull;
import org.tikv.common.BytePairWrapper;
import org.tikv.common.TiSession;
import org.tikv.common.util.ConcreteBackOffer;
import org.tikv.txn.TwoPhaseCommitter;

public void KVSet(TiSession session, @Nonnull List<BytePairWrapper> pairs) {
    System.out.println("Final put .size() = " + pairs.size());
    Iterator<BytePairWrapper> iterator = pairs.iterator();
    if (!iterator.hasNext()) {
        return;
    }

    // start_ts for the transaction comes from PD
    TwoPhaseCommitter twoPhaseCommitter =
            new TwoPhaseCommitter(session, session.getTimestamp().getVersion());
    BytePairWrapper primaryPair = iterator.next();

    try {
        // Phase 1: prewrite the primary key, then the remaining (secondary) keys
        twoPhaseCommitter.prewritePrimaryKey(
                ConcreteBackOffer.newCustomBackOff(2000),
                primaryPair.getKey(),
                primaryPair.getValue());

        if (iterator.hasNext()) {
            twoPhaseCommitter.prewriteSecondaryKeys(
                    primaryPair.getKey(), iterator, 2000);
        }

        // Phase 2: commit the primary key with a fresh commit_ts from PD;
        // once this succeeds the transaction is committed, and the secondary
        // locks are resolved afterwards
        twoPhaseCommitter.commitPrimaryKey(
                ConcreteBackOffer.newCustomBackOff(1000),
                primaryPair.getKey(),
                session.getTimestamp().getVersion());
        System.out.println("Two-phase commit submitted here");

    } catch (Throwable t) {
        System.out.println("Exception occurred here");
        t.printStackTrace();
    } finally {
        try {
            twoPhaseCommitter.close();
        } catch (Exception e) {
            System.out.println("Another exception occurred here");
            e.printStackTrace();
        }
    }
}
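
For anyone trying to reproduce this, a minimal sink that drives the method above could look like the sketch below. This is illustrative, not the original job: the class name, PD addresses, and flush-on-close handling are assumptions, and the 1 s time-based flush from my settings is omitted for brevity.

import java.util.ArrayList;
import java.util.List;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.sink.RichSinkFunction;
import org.tikv.common.BytePairWrapper;
import org.tikv.common.TiConfiguration;
import org.tikv.common.TiSession;

// Illustrative sketch: buffer incoming key/value pairs and flush a batch
// through KVSet once the batch-size threshold is reached.
public class TiKVTwoPhaseCommitSink extends RichSinkFunction<BytePairWrapper> {
    private static final int MAX_BATCH_SIZE = 30_000; // matches the test settings
    private transient TiSession session;
    private transient List<BytePairWrapper> buffer;

    @Override
    public void open(Configuration parameters) {
        // PD addresses are placeholders
        session = TiSession.create(TiConfiguration.createDefault("pd0:2379,pd1:2379,pd2:2379"));
        buffer = new ArrayList<>(MAX_BATCH_SIZE);
    }

    @Override
    public void invoke(BytePairWrapper value, Context context) {
        buffer.add(value);
        if (buffer.size() >= MAX_BATCH_SIZE) {
            KVSet(session, buffer); // the two-phase commit method shown above
            buffer.clear();
        }
    }

    @Override
    public void close() throws Exception {
        if (buffer != null && !buffer.isEmpty()) {
            KVSet(session, buffer); // flush the tail batch
        }
        if (session != null) {
            session.close();
        }
    }

    // KVSet(TiSession, List<BytePairWrapper>) is the method from the snippet above
}

With this shape, every batch is one transaction, so throughput is bounded by how fast a single subtask can run prewrite and commit back to back.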

Results from the write test (each row is one batch: records written, elapsed seconds):

Count   Time (s)
14127 114
255 2
1 0
13623 97
330 2
1 0
27644 194
441 3
1 0
26576 201
7309 56
511 4
1 0
25397 205
156 1
1 0
13105 98
471 3
1 0
1152 9
25517 206
359 2
1 0
54 0
20234 156
574 4
1 0
19090 153
39 0
6347 46
464 3
1 0
16867 129
280 2
1 0
23832 179
529 4
1 0
30000 228
6101 49
212 1
1 0
22112 180
1 0
20835 167
331 2
1 0
23988 193
192 1
1 0
6216 50
105 0
1 0
24433 197
68 0
1619 13
35 0
18363 148
415 3
1 0
3944 31
248 1
1 0
2351 18
411 3
1 0
17074 124
419 3
1 0
1780 14
290 2
1 0
8375 67
35 0
1 0
23907 194
296 2
1 0
23032 182
400 3
1 0
10695 86
557 4
1 0
8764 57
488 3
1 0
1451 11
279 2
1 0
15296 107
306 2
1 0
7985 57
556 4
1 0
19561 143
230 1
1 0
11910 91
6 0
59 0
17334 131
232 1
1 0
10884 76
133 0
1 0
29803 230
64 0
1 0
27620 209
283 2
1 0
26211 196
288 2
1 0
609 4
72 0
1 0
13061 100
161 1
1 0
8906 72
396 3
1 0
9419 74
346 2
1 0
12576 101
227 2
1 0
1087 8
6608 53
2 0
11010 88
473 3
1 0
18498 150
593 4
1 0
4146 33
456 3
1 0
16378 126
394 3
1 0
1547 12
45 0
13211 101
226 1
1 0
25327 191
420 3
1 0
25971 196
91 0
1 0
16258 120
48 0
2 0
20119 153
213 1
1 0
1451 11
1 0
20958 159
297 2
2 0
14308 109
39 0
2852 22
447 3
1 0
2269 17
368 2
1 0
16736 126
130 1
1 0
18719 150
71 0
1 0
16420 132
476 3
2 0

Average throughput: 130.2 records/s

Write settings: maximum batch size 30,000 records, maximum batch interval 1 s.

I also tested with a maximum batch size of 3,000 records and the same 1 s maximum batch interval; the results were similar.

My write path is a single task writing single-threaded, and its throughput is far below JDBC's, which is very disappointing :face_holding_back_tears:
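
If the bottleneck really is the single writer, one thing worth trying before giving up on this path is raising the sink parallelism so that several subtasks each run their own two-phase commits concurrently. A sketch, assuming the hypothetical sink class above; parseRecord and its encoding are placeholders:

import org.apache.flink.streaming.api.datastream.DataStream;
import org.tikv.common.BytePairWrapper;

// Sketch: fan the stream out over several sink subtasks. Each subtask
// batches and commits independently, so throughput should scale with
// parallelism as long as different subtasks never write the same key
// (conflicting prewrites would back off and retry).
public class ParallelWriteSketch {
    public static void attachSink(DataStream<String> source) {
        source
                .map(ParallelWriteSketch::parseRecord)    // placeholder encoder
                .keyBy(pair -> new String(pair.getKey())) // route each key to a fixed subtask
                .addSink(new TiKVTwoPhaseCommitSink())
                .setParallelism(4);                       // 4 concurrent writers instead of 1
    }

    private static BytePairWrapper parseRecord(String line) {
        byte[] bytes = line.getBytes();
        return new BytePairWrapper(bytes, bytes); // placeholder encoding
    }
}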

I have also summarized a mapping between TiDB column types (Column.type) and the corresponding Java data types.

Final question: when we write data through TiDB via JDBC, is it implemented underneath with TiKV kvClient put/get/delete? And does my write method need optimization?

| username: 表渣渣渣 | Original post link

Oh, by the way, I didn't use the TiKV RawKVClient; I couldn't find any information on encoding & decoding for it.
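
For reference, the raw API itself is simple; a minimal round trip looks like the sketch below (the PD address is a placeholder, and depending on the client version ByteString may come from the shaded package org.tikv.shade.com.google.protobuf). The reason there is no encode/decode documentation is that raw mode treats keys and values as opaque bytes: it knows nothing about TiDB's table/row encoding, so data written this way is not visible to TiDB SQL.

import com.google.protobuf.ByteString;
import org.tikv.common.TiConfiguration;
import org.tikv.common.TiSession;
import org.tikv.raw.RawKVClient;

// Minimal RawKVClient sketch: raw mode stores opaque byte strings, so any
// encoding/decoding scheme is entirely up to the application.
public class RawKVDemo {
    public static void main(String[] args) throws Exception {
        TiConfiguration conf = TiConfiguration.createRawDefault("pd0:2379"); // placeholder PD address
        try (TiSession session = TiSession.create(conf);
             RawKVClient client = session.createRawClient()) {
            client.put(ByteString.copyFromUtf8("k1"), ByteString.copyFromUtf8("v1"));
            client.delete(ByteString.copyFromUtf8("k1"));
        }
    }
}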

| username: Billmay表妹 | Original post link

Please note the following points:

  1. Ensure that your TiKV cluster is correctly configured and deployed, and that your Java client is actually connected to it.
  2. Make sure the version of the TiKV Java client used in your code is compatible with your TiKV cluster version; the compatibility matrix is in the TiKV Java client's GitHub repository.
  3. Be aware that two-phase commit carries a performance cost, so you need to test and tune it for your specific workload.

Additionally, when you run into TiDB write performance issues, start with the TiDB-related dashboards, such as TiKV-FastTune. If the problem is on the storage side, the TiKV-FastTune dashboard lists the likely causes of write slowdowns, each with a corresponding chart.

| username: Billmay表妹 | Original post link

What is your resource configuration like?

| username: 表渣渣渣 | Original post link

Test environment: three machines, each with 16 GB of memory and four cores, deploying 3 TiDB, 3 TiKV, and 3 PD instances. Flink runs locally (16 GB, 12 cores), but the write operator uses a parallelism of 1.

| username: 表渣渣渣 | Original post link

I haven't studied the TiKV-FastTune dashboard yet; I need to take a look.

| username: 表渣渣渣 | Original post link

It might be possible to optimize Flink writes to TiDB by building a flink-connector-tidb, similar to the connector Doris provides.

| username: 表渣渣渣 | Original post link

It suddenly occurred to me that Flink here operates on TiKV single-threaded, using only local resources, whereas JDBC does not: it goes through TiDB, which uses the whole cluster's resources to perform the writes. So JDBC's write speed is effectively mine multiplied by the total number of writer threads across the cluster, compared with a single client driving TiKV. This explanation seems to make more sense.
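
For what it's worth, this also bears on my earlier question: as far as I understand, when writing through JDBC, TiDB's SQL layer encodes the rows into KV pairs and drives the two-phase commit itself on the server side, rather than calling a client-side kvClient put/get/delete. The batched JDBC write I was comparing against looks roughly like this (host, table, and credentials are placeholders; rewriteBatchedStatements=true is a MySQL Connector/J option that rewrites the batch into multi-row INSERTs):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

// Sketch of the batched JDBC write path used for comparison. TiDB speaks
// the MySQL protocol, so the standard MySQL driver applies.
public class JdbcBatchDemo {
    public static void main(String[] args) throws Exception {
        String url = "jdbc:mysql://tidb-host:4000/test?rewriteBatchedStatements=true";
        try (Connection conn = DriverManager.getConnection(url, "root", "")) {
            conn.setAutoCommit(false);
            try (PreparedStatement ps =
                         conn.prepareStatement("INSERT INTO t (k, v) VALUES (?, ?)")) {
                for (int i = 0; i < 30_000; i++) { // same batch size as the test
                    ps.setInt(1, i);
                    ps.setString(2, "value-" + i);
                    ps.addBatch();
                }
                ps.executeBatch();
            }
            conn.commit(); // TiDB encodes the rows and runs 2PC server-side
        }
    }
}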