Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.
Original topic: TiSpark does not support batch writes to tables that use auto random as the primary key
I want to use TiSpark for ETL, batch-writing the results after a large amount of computation. However, I found that TiSpark does not support batch writes to tables that use an AUTO_RANDOM id as the primary key. Will this be supported in a future release, or is it not planned?
What version of TiDB is it? Is the table structure using clustered indexes?
What versions are TiSpark and Spark respectively?
And what does the data operation flow look like?
TiDB: 5.2.1
Table structure:
create table xxx
(
id bigint PRIMARY KEY AUTO_RANDOM(8),
uniqueKey varchar(256) null,
xxxx....
constraint idx_unique_key
unique (uniqueKey)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_general_ci;
TiSpark: 3.1-2.5.1
Spark: 3.1.3
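For reference, here is a minimal sketch of what AUTO_RANDOM(8) implies for the id column in the table above. It assumes the layout documented for TiDB at this version: a signed BIGINT with 1 sign bit, then the shard bits, then the auto-increment bits. The function names are made up for illustration and are not part of TiDB or TiSpark.

```python
# Illustrative only: models the bit layout of a signed BIGINT AUTO_RANDOM(S) value.
# Assumption: 1 sign bit, S shard bits, and the remaining bits are incremental.

TYPE_BITS = 64  # BIGINT width
SIGN_BITS = 1   # the highest bit is the sign bit


def auto_random_layout(shard_bits):
    """Return how the 64 bits are split for AUTO_RANDOM(shard_bits)."""
    incremental_bits = TYPE_BITS - SIGN_BITS - shard_bits
    return {
        "sign_bits": SIGN_BITS,
        "shard_bits": shard_bits,
        "incremental_bits": incremental_bits,
        "max_incremental": (1 << incremental_bits) - 1,
    }


def compose_id(shard, incremental, shard_bits):
    """Pack a shard value and an incremental value into one id."""
    layout = auto_random_layout(shard_bits)
    if not 0 <= shard < (1 << shard_bits):
        raise ValueError("shard does not fit in the shard bits")
    if not 0 <= incremental <= layout["max_incremental"]:
        raise ValueError("incremental part does not fit")
    return (shard << layout["incremental_bits"]) | incremental


layout = auto_random_layout(8)
print(layout["incremental_bits"])  # bits left for the incremental part
```

Both parts of the value are normally assigned by the TiDB server at insert time, so a writer that goes straight to TiKV would have to produce equivalent ids on its own.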
When TiSpark writes, it repartitions the RDD according to the estimated number of regions and then writes directly to TiKV concurrently. Writing to an AUTO_RANDOM column is likely to be a problem in this scenario. As shown in the figure, the TiSpark code runs a check before writing to a table: if the target table has an AUTO_RANDOM column, the message above is reported. This explanation does not go very deep; I will ask other experts for a more fundamental one.
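The kind of pre-write check described here can be sketched roughly as follows. This is not TiSpark's actual code: the classes, the function name, and the error text are all hypothetical, and the sketch only shows the idea of rejecting the write up front when the target table has an AUTO_RANDOM column.

```python
# Hypothetical model of a pre-write check; not taken from the TiSpark source.
from dataclasses import dataclass
from typing import List


@dataclass
class Column:
    name: str
    auto_random: bool = False  # True if the column is declared AUTO_RANDOM


@dataclass
class Table:
    name: str
    columns: List[Column]


class UnsupportedWriteError(Exception):
    """Raised when the target table cannot be batch-written (hypothetical)."""


def check_writable(table):
    """Reject the write before any data is sent if an AUTO_RANDOM column exists."""
    offenders = [c.name for c in table.columns if c.auto_random]
    if offenders:
        raise UnsupportedWriteError(
            "table %s has AUTO_RANDOM column(s) %s; batch write is not supported"
            % (table.name, ", ".join(offenders))
        )


# Mirrors the table in this thread: id is AUTO_RANDOM, uniqueKey is not.
target = Table("xxx", [Column("id", auto_random=True), Column("uniqueKey")])
try:
    check_writable(target)
except UnsupportedWriteError as err:
    print(err)
```

Note that the check looks only at the table schema, so it fails even when the dataframe being written does not contain the AUTO_RANDOM column.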
Upon investigation, TiSpark 3.0.1 already supports the aforementioned auto_random. You can test it out.
Reference: tispark/CHANGELOG.md at 4e0860ad2d7dd46c5af6e2486197bceff863b183 · pingcap/tispark · GitHub
Isn’t this a new feature of TiSpark 2.3.11?
I looked at this change, and it should throw an error indicating that writing to the auto_random column is not supported.
Okay, thank you. I’ll check it.
It supports auto_random, but it does not support writing to it.
The write is issued from a submitted Spark job, not through SQL. The auto random column is not specified, and the call looks something like:
dataframe.write()
...
The dataframe does not contain the auto random column. So does this mean that batch writing with Spark to a table that has an auto random column is currently not supported? The link provided is for TiDB and does not go through TiSpark.
Confirmed: it does not support writing, only reading.