Compatibility Issues Between TiSpark and TiDB Versions

This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: TiSpark跟TiDB版本兼容性问题

| username: TiDBer_bHyusvtE

[TiDB Usage Environment] Production Environment
[TiDB Version] 7.1.3
[Spark Version] 3.1.3
[TiSpark Version] Still using the old tispark-assembly-3.1-2.5.2.jar
[Encountered Problem: Problem Phenomenon and Impact] After upgrading TiDB to version 7, PySpark can no longer connect to TiDB.
[Problem Screenshot] The official website does not list a TiSpark version that corresponds to TiDB 7.x. How can this be solved?

| username: tidb狂热爱好者 | Original post link

If you use TiSpark, upgrade the version to 6.5.

| username: ShawnYan | Original post link

Was it working well before? Is the PD information configured correctly?

Besides, just use TiKV/TiFlash directly.

| username: Demo二棉裤 | Original post link

TiSpark is no longer maintained, but the latest TiSpark 3.1.x can still read data from TiDB 7.1 and above. Since it is unmaintained, it is still recommended to find an alternative solution.

| username: Demo二棉裤 | Original post link

Additionally, based on the error message, did you write the correct address for your PD, and is the network policy working?

| username: WalterWj | Original post link

This error seems unrelated to the version; it looks like Spark cannot connect to PD.

| username: Jellybean | Original post link

TiSpark is a thin layer for Spark to access the TiKV cluster. When Spark starts, it registers the cluster’s PD address through TiSpark to obtain metadata information such as the cluster’s databases and tables.

Based on the error message posted by the original poster, Spark is unable to register with PD when it starts. Therefore, you can troubleshoot and confirm the following:

  1. Is the current cluster accessible?
  2. Are the IP address and port of PD correct and reachable?
  3. Is the network access from the YARN cluster, including Spark driver and other Spark cluster machines, to the target TiDB cluster normal?

Check these first.
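
As a rough way to run the checks above, here is a minimal Python sketch that tests plain TCP reachability of each PD address. It is an illustration only: the function names and the example addresses (`pd0:2379` and so on) are hypothetical, and a successful TCP connection only shows that the port is open, not that PD is healthy. Run it from the Spark driver host and from the YARN worker nodes, since each of them needs its own network path to PD.

```python
import socket


def parse_address(addr):
    """Split a 'host:port' string into (host, port). Hypothetical helper."""
    host, _, port = addr.strip().rpartition(":")
    if not host or not port.isdigit():
        raise ValueError(f"expected host:port, got {addr!r}")
    return host, int(port)


def check_pd_addresses(pd_addresses, timeout=3.0):
    """Try a TCP connection to every address in a comma-separated list.

    Returns a dict mapping 'host:port' to True (connected) or False.
    """
    results = {}
    for addr in pd_addresses.split(","):
        host, port = parse_address(addr)
        key = f"{host}:{port}"
        try:
            # create_connection resolves the host name and opens a TCP socket.
            with socket.create_connection((host, port), timeout=timeout):
                results[key] = True
        except OSError:
            # Covers refused connections, timeouts, and DNS failures.
            results[key] = False
    return results
```

For example, `check_pd_addresses("pd0:2379,pd1:2379,pd2:2379")` returns one entry per address. Any `False` entry points to a wrong address, a PD node that is down, or a network policy blocking that host.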

| username: Billmay表妹 | Original post link

Try using the latest 3.2.x version.

| username: juecong | Original post link

TiDB version 7.1.1, used in a production environment with Spark 3.3.x and 3.1.x. TiSpark uses tispark-assembly-3.3_2.12-3.1.3.jar. It is currently in normal use without any issues. I suggest you check the PD connection problem.

| username: yytest | Original post link

Try upgrading the version; an upgrade might fix the bug.

| username: TiDBer_QYr0vohO | Original post link

Please check this out:

| username: TiDBer_HUfcQIJx | Original post link

Try upgrading the version.

| username: TiDBer_rvITcue9 | Original post link

Try upgrading the version.

| username: yytest | Original post link

  • Check TiSpark Version: Ensure that the TiSpark version you are using is compatible with TiDB 7.0. Typically, the TiDB official website will provide information on TiSpark versions that are compatible with the new version of TiDB.
  • Update Dependencies: If you are using an older version of TiSpark, you may need to update to a version that is compatible with TiDB 7.0. This may involve uninstalling the old version of TiSpark and installing the new version.
  • Check Configuration Parameters: Check your PySpark configuration parameters to ensure they meet the requirements of TiDB 7.0. Specifically, the spark.tispark.pd.addresses parameter should point to the PD (Placement Driver) addresses of the TiDB cluster.
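
To make the configuration point concrete, here is a hedged PySpark sketch. Only `spark.tispark.pd.addresses` comes from this thread. The other keys (`spark.sql.extensions` and the `tidb_catalog` entries) are the settings I recall from the TiSpark documentation for Spark 3.x, so verify them against the documentation for your TiSpark version. The helper names and the PD addresses are hypothetical, and the TiSpark assembly jar is assumed to be on the Spark classpath already (for example via `spark.jars`).

```python
def tispark_conf(pd_addresses):
    """Build the Spark settings TiSpark is assumed to need (hypothetical helper).

    pd_addresses is a comma-separated 'host:port' list of PD nodes.
    """
    return {
        "spark.sql.extensions": "org.apache.spark.sql.TiExtensions",
        "spark.tispark.pd.addresses": pd_addresses,
        # Catalog settings: assumed to be required on Spark 3.x, verify for your version.
        "spark.sql.catalog.tidb_catalog": "org.apache.spark.sql.catalyst.catalog.TiCatalog",
        "spark.sql.catalog.tidb_catalog.pd.addresses": pd_addresses,
    }


def build_session(pd_addresses, app_name="tispark-check"):
    """Create a SparkSession with the settings above applied."""
    # Imported lazily so tispark_conf can be inspected without pyspark installed.
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.appName(app_name)
    for key, value in tispark_conf(pd_addresses).items():
        builder = builder.config(key, value)
    return builder.getOrCreate()
```

A call such as `build_session("pd0:2379,pd1:2379,pd2:2379")` would then try to reach PD when TiSpark first loads catalog metadata, which is where an unreachable or mistyped PD address usually surfaces.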