Error When Connecting EMR Spark to TiDB Cloud Instance

This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: emr spark连接 TiDB Cloud 的 instance 报错

| username: TiDBer_xue91f7A

[TiDB Usage Environment] /Test/ Poc
[TiDB Version] 6.6.0
[Reproduction Path] Operations performed that led to the issue
Connecting AWS EMR Spark to a TiDB Cloud instance, following the setup instructions, with this configuration and launch command:
spark.sql.extensions org.apache.spark.sql.TiExtensions
spark.tispark.pd.addresses ${your_pd_address}
spark.sql.catalog.tidb_catalog org.apache.spark.sql.catalyst.catalog.TiCatalog
spark.sql.catalog.tidb_catalog.pd.addresses ${your_pd_address}
spark-shell --jars tispark-assembly-{version}.jar
[Encountered Issue: Issue Phenomenon and Impact]
Error PDClient: failed to get member from PD server

| username: srstack | Original post link

Does connecting with mysql -h{host} -P{port} -u{username} -p{password} fail as well?
The error looks like a serverless cluster issue.

I created a serverless cluster in the Singapore region myself, and it accepts connections now.
Could you try again?

| username: srstack | Original post link

Oh, TiSpark. Serverless probably doesn’t support being used as a backend for Spark.

| username: TiDBer_xue91f7A | Original post link

I am connecting from Spark. Does serverless not support TiSpark?

| username: Kongdom | Original post link

Does the MySQL client connect successfully?

| username: Jellybean | Original post link

Please check whether the machines that submit the Spark jobs can reach the TiDB cluster over the network.
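
A minimal sketch of such a check, to be run on the node that submits the Spark job. It only tests whether a TCP connection can be opened; it says nothing about authentication or TLS. The function names are made up for illustration, and the hosts in the commented example are placeholders. Ports 4000 and 2379 are the default TiDB and PD ports, and a given cluster may use others.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port opens within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False


def probe(endpoints: dict[str, tuple[str, int]]) -> dict[str, bool]:
    """Check each named (host, port) pair and report whether it is reachable."""
    return {name: can_connect(host, port) for name, (host, port) in endpoints.items()}


# Example call (placeholder hosts, default ports assumed):
# probe({"tidb": ("<tidb-host>", 4000), "pd": ("<pd-host>", 2379)})
```

If the TiDB endpoint is reachable but no PD endpoint is, the MySQL client will work while TiSpark fails, because TiSpark needs to reach PD.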

| username: TiDBer_xue91f7A | Original post link

From a node of the EMR cluster, I can connect to the TiDB Cloud instance with the MySQL client.

However, in the Spark shell on the EMR cluster, after loading the required parameters, it reports the error “PDClient: failed to get member from pd server.”

I noticed that the connection information provided by the TiDB Cloud instance does not include PD-related information. Does this mean that the PD of the TiDB Cloud instance is not exposed externally?

| username: srstack | Original post link

Yes, serverless only exposes the TiDB connection address, so TiSpark should not be usable with it.
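
Since only the SQL endpoint is exposed, one possible fallback (not proposed in this thread and not tested against TiDB Cloud) is Spark's generic JDBC data source over the MySQL protocol. The sketch below only builds the options; the helper name is hypothetical, it assumes the MySQL Connector/J 8.x jar is on the Spark classpath, and the `sslMode` value is an assumption that should be adjusted to what the cluster requires.

```python
def tidb_jdbc_options(host: str, port: int, database: str, table: str,
                      user: str, password: str) -> dict[str, str]:
    """Build the option map for Spark's JDBC data source (illustrative helper)."""
    # sslMode is a Connector/J 8.x connection property; the value is an assumption.
    url = f"jdbc:mysql://{host}:{port}/{database}?sslMode=VERIFY_IDENTITY"
    return {
        "url": url,
        "dbtable": table,
        "user": user,
        "password": password,
        "driver": "com.mysql.cj.jdbc.Driver",  # Connector/J 8.x driver class
    }


# Usage inside a PySpark session (placeholders, not real values):
# opts = tidb_jdbc_options("<tidb-host>", 4000, "test", "my_table", "<user>", "<password>")
# df = spark.read.format("jdbc").options(**opts).load()
```

Reads through this path go through the TiDB server like any other SQL client, so PD is not needed, but TiSpark's direct access to the storage layer is not available either.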

| username: dba远航 | Original post link

The connection to PD failed.

| username: ShawnYan | Original post link

TiSpark relies on PD, and it seems the PD address cannot be obtained from TiDB Cloud.
