Error Writing with TiSpark via Datasource API: org.tikv.shade.io.grpc.StatusRuntimeException: UNIMPLEMENTED

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: TiSpark通过datasource api写入,报错org.tikv.shade.io.grpc.StatusRuntimeException: UNIMPLEMENTED

| username: TiDBer_hJ6mZgS4

TiSpark jar: tispark-assembly-3.1_2.12-3.3.0.jar
Spark version: 3.1.1
I write to the table through the datasource API with the following code:

df.write().format("tidb")
        .option("database", database)
        .option("table", table)
        .option("replace", "true")
        .mode("append")
        .save();

Writes succeed for some tables but fail for others with the following error:

org.tikv.common.exception.TiBatchWriteException: Execution exception met.
	at org.tikv.txn.TwoPhaseCommitter.doPrewriteSecondaryKeys(TwoPhaseCommitter.java:308)
	at org.tikv.txn.TwoPhaseCommitter.prewriteSecondaryKeys(TwoPhaseCommitter.java:259)
	at com.pingcap.tispark.utils.TwoPhaseCommitHepler.$anonfun$prewriteSecondaryKeyByExecutors$1(TwoPhaseCommitHepler.scala:102)
	at com.pingcap.tispark.utils.TwoPhaseCommitHepler.$anonfun$prewriteSecondaryKeyByExecutors$1$adapted(TwoPhaseCommitHepler.scala:90)
	at org.apache.spark.rdd.RDD.$anonfun$foreachPartition$2(RDD.scala:1020)
	at org.apache.spark.rdd.RDD.$anonfun$foreachPartition$2$adapted(RDD.scala:1020)
	at org.apache.spark.SparkContext.$anonfun$runJob$5(SparkContext.scala:2242)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
	at org.apache.spark.scheduler.Task.run(Task.scala:131)
	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:497)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1439)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:500)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:750)
Caused by: java.util.concurrent.ExecutionException: org.tikv.common.exception.TiBatchWriteException: prewrite secondary key error
	at java.util.concurrent.FutureTask.report(FutureTask.java:122)
	at java.util.concurrent.FutureTask.get(FutureTask.java:192)
	at org.tikv.txn.TwoPhaseCommitter.doPrewriteSecondaryKeys(TwoPhaseCommitter.java:285)
	... 14 more
Caused by: org.tikv.common.exception.TiBatchWriteException: prewrite secondary key error
	at org.tikv.txn.TwoPhaseCommitter.doPrewriteSecondaryKeySingleBatchWithRetry(TwoPhaseCommitter.java:426)
	at org.tikv.txn.TwoPhaseCommitter.doPrewriteSecondaryKeysInBatchesWithRetry(TwoPhaseCommitter.java:357)
	at org.tikv.txn.TwoPhaseCommitter.retryPrewriteBatch(TwoPhaseCommitter.java:390)
	at org.tikv.txn.TwoPhaseCommitter.doPrewriteSecondaryKeySingleBatchWithRetry(TwoPhaseCommitter.java:439)
	at org.tikv.txn.TwoPhaseCommitter.doPrewriteSecondaryKeysInBatchesWithRetry(TwoPhaseCommitter.java:357)
	at org.tikv.txn.TwoPhaseCommitter.lambda$doPrewriteSecondaryKeys$0(TwoPhaseCommitter.java:292)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	... 3 more
Caused by: org.tikv.common.exception.GrpcException: org.tikv.shade.io.grpc.StatusRuntimeException: UNIMPLEMENTED
	at org.tikv.common.policy.RetryPolicy.rethrowNotRecoverableException(RetryPolicy.java:70)
	at org.tikv.common.policy.RetryPolicy.callWithRetry(RetryPolicy.java:94)
	at org.tikv.common.AbstractGRPCClient.callWithRetry(AbstractGRPCClient.java:88)
	at org.tikv.common.region.RegionStoreClient.prewrite(RegionStoreClient.java:486)
	at org.tikv.common.region.RegionStoreClient.prewrite(RegionStoreClient.java:435)
	at org.tikv.txn.TxnKVClient.prewrite(TxnKVClient.java:104)
	at org.tikv.txn.TwoPhaseCommitter.doPrewriteSecondaryKeySingleBatchWithRetry(TwoPhaseCommitter.java:416)
	... 11 more
Caused by: org.tikv.shade.io.grpc.StatusRuntimeException: UNIMPLEMENTED
	at org.tikv.shade.io.grpc.stub.ClientCalls.toStatusRuntimeException(ClientCalls.java:287)
	at org.tikv.shade.io.grpc.stub.ClientCalls.getUnchecked(ClientCalls.java:268)
	at org.tikv.shade.io.grpc.stub.ClientCalls.blockingUnaryCall(ClientCalls.java:175)
	at org.tikv.common.AbstractGRPCClient.lambda$callWithRetry$0(AbstractGRPCClient.java:91)
	at org.tikv.common.policy.RetryPolicy.callWithRetry(RetryPolicy.java:88)
	... 16 more

How should I troubleshoot this issue?

| username: xfworld

Compare the table schemas to see whether tables with simpler structures are more likely to be written successfully.

| username: TiDBer_hJ6mZgS4

By a simple table structure, do you mean a table with a small number of fields?

| username: xfworld

No. The error that appears in the logs is quite rare:
prewrite secondary key error

So which tables succeed and which fail? Please post both so they can be compared.

| username: TiDBer_hJ6mZgS4

We have multiple tasks, each of which executes SQL and writes the results into its own TiDB table. The tables have between 20 and 80 fields. The tables that report errors do not always fail; sometimes a few retries are enough to succeed. Recently, however, errors and retries have become more frequent, and yesterday two tables failed consistently and could not be written even after retries.

Also, should we start from the root cause behind the prewrite secondary key error? The call goes through
org.tikv.common.region.RegionStoreClient.prewrite(RegionStoreClient.java:486)
and
org.tikv.shade.io.grpc.stub.ClientCalls.blockingUnaryCall(ClientCalls.java:175)
where this exception is thrown:
org.tikv.shade.io.grpc.StatusRuntimeException: UNIMPLEMENTED
Could this be related to the TiKV server status or to some metric?
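
For reference, here is a minimal sketch of how the innermost "Caused by" entry of a nested exception like the one above could be pulled out in code instead of by reading the trace. The class name RootCauseDemo and both helper methods are made up for illustration and are not part of TiSpark or client-java. The exceptions in main are plain stand-ins that only mimic the nesting.

```java
import java.util.concurrent.ExecutionException;

public class RootCauseDemo {
    /** Walks the getCause() chain and returns the innermost exception. */
    static Throwable rootCause(Throwable t) {
        Throwable current = t;
        while (current.getCause() != null) {
            current = current.getCause();
        }
        return current;
    }

    /** Returns true if any exception message in the chain contains the given text. */
    static boolean chainMentions(Throwable t, String text) {
        for (Throwable c = t; c != null; c = c.getCause()) {
            String msg = c.getMessage();
            if (msg != null && msg.contains(text)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Stand-in exceptions (assumption): they only imitate the nesting in the trace.
        Throwable grpc = new RuntimeException("UNIMPLEMENTED");
        Throwable batch = new RuntimeException("prewrite secondary key error", grpc);
        Throwable top = new ExecutionException("Execution exception met.", batch);

        System.out.println("root cause: " + rootCause(top).getMessage());
        System.out.println("mentions UNIMPLEMENTED: " + chainMentions(top, "UNIMPLEMENTED"));
    }
}
```

A job wrapper could use a check like this to separate a non-retryable gRPC status from transient failures before deciding whether to retry.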

| username: xfworld

The prewrite secondary key error means the write operation failed completely.

Generally, a write involves two steps:
primary key write
secondary key write

These two steps control and manage the commit process of the distributed transaction. If the transaction fails, it is rolled back, and there won’t be an error feedback…
It seems like either the region has an issue or the TiKV node has a problem. You’ll need to investigate this yourself.
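
A rough sketch of the two-step flow described above, using a toy in-memory store rather than TiKV. The class TwoPhaseCommitSketch, its fields, and the failingKeys switch are all invented for illustration. The real logic lives in org.tikv.txn.TwoPhaseCommitter and also handles timestamps, locks, and retries, which this sketch leaves out.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class TwoPhaseCommitSketch {
    // Toy store (assumption): committed data plus prewritten, uncommitted values.
    final Map<String, String> committed = new HashMap<>();
    final Map<String, String> locks = new HashMap<>();
    // Keys whose prewrite should fail, to simulate a store-side error.
    final Set<String> failingKeys = new HashSet<>();

    void prewrite(String key, String value) {
        if (failingKeys.contains(key)) {
            throw new IllegalStateException("prewrite failed for " + key);
        }
        locks.put(key, value);
    }

    /** Returns true if the transaction committed, false if it was rolled back. */
    boolean write(String primaryKey, Map<String, String> mutations) {
        try {
            // Phase 1: prewrite the primary key first, then every secondary key.
            prewrite(primaryKey, mutations.get(primaryKey));
            for (Map.Entry<String, String> e : mutations.entrySet()) {
                if (!e.getKey().equals(primaryKey)) {
                    prewrite(e.getKey(), e.getValue());
                }
            }
        } catch (IllegalStateException ex) {
            // Roll back: drop all prewritten values; committed data is untouched.
            locks.clear();
            return false;
        }
        // Phase 2: commit the primary key, then the secondary keys.
        committed.put(primaryKey, locks.remove(primaryKey));
        for (String key : new ArrayList<>(locks.keySet())) {
            committed.put(key, locks.remove(key));
        }
        return true;
    }
}
```

In this toy model a failed secondary prewrite leaves the committed map exactly as it was, which is the property that matters here: a failed write does not touch existing data.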

| username: TiDBer_hJ6mZgS4

With the 2PC implementation of transactional writes, a secondary key prewrite exception means the new data is not written, but existing data is not affected, which is fine.

Are there any good ways to troubleshoot whether the region or TiKV has a problem?
I didn’t see any UNIMPLEMENTED-related exceptions in the TiKV logs.

| username: xfworld

You can check through the cluster’s monitoring service, Grafana.

There are many metrics, and apart from very complex performance problems, most issues are easy to spot there.

But there are no error logs in TiKV? Could it be that TiSpark is poorly compatible with version 7.1.X?

I guess we need to check the update information on TiSpark’s GitHub to find out. Recently, GitHub has been inaccessible…

| username: TiDBer_hJ6mZgS4

Thank you, it might be related to compatibility with version 7.1.
Also, the TiSpark Git repository hasn’t been updated for a long time…

| username: xfworld

Dizzy, if you want to use Spark, I suggest using version 6.5.X.

| username: TiDBer_hJ6mZgS4

Feedback: the error is caused by writing to a table that has a TiFlash replica.

The TiSpark datasource API writes through org.tikv.txn.TwoPhaseCommitter from client-java.
During the secondary key prewrite phase, its doPrewriteSecondaryKeysInBatchesWithRetry method groups the keys by region
and writes each group to every store corresponding to that region.
Because the table has a TiFlash replica, the client also tries to write to the TiFlash store. TiFlash does not support prewrite, so the gRPC request fails with org.tikv.shade.io.grpc.StatusRuntimeException: UNIMPLEMENTED.

Verification as follows:

The solution is to skip writing if store.isTiFlash() is true.
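
To illustrate the idea, here is a small self-contained sketch of filtering out TiFlash stores before sending prewrite requests. The Store class below is a hypothetical stand-in, not the client-java class, and writableStores is an invented helper. The sketch assumes a TiFlash store can be recognized by the label engine=tiflash. The actual fix would have to be made where client-java picks the target store.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class SkipTiFlashSketch {
    /** Hypothetical stand-in for a store descriptor; not the client-java class. */
    static final class Store {
        final long id;
        final Map<String, String> labels;

        Store(long id, Map<String, String> labels) {
            this.id = id;
            this.labels = labels;
        }

        // Assumption: TiFlash stores carry the label engine=tiflash.
        boolean isTiFlash() {
            return "tiflash".equals(labels.get("engine"));
        }
    }

    /** Keeps only the stores that should receive a prewrite request. */
    static List<Store> writableStores(List<Store> stores) {
        return stores.stream()
                .filter(s -> !s.isTiFlash())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Store> stores = Arrays.asList(
                new Store(1, Collections.<String, String>emptyMap()),
                new Store(2, Collections.singletonMap("engine", "tiflash")),
                new Store(3, Collections.<String, String>emptyMap()));

        // Only the non-TiFlash stores are targeted.
        for (Store s : writableStores(stores)) {
            System.out.println("prewrite -> store " + s.id);
        }
    }
}
```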