Livy TiSpark Reading TiDB Data: Cannot Find Catalog Plugin Class for Catalog 'tidb_catalog': org.apache.spark.sql.catalyst.catalog.TiCatalog

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: livy tispark读取tidb数据Cannot find catalog plugin class for catalog ‘tidb_catalog’: org.apache.spark.sql.catalyst.catalog.TiCatalog

| username: TiDBer_9ASYe5gK

  1. We are using Livy to read TiDB data, and it reports “Error: java.util.concurrent.ExecutionException: java.lang.RuntimeException: org.apache.spark.SparkException: Cannot find catalog plugin class for catalog ‘tidb_catalog’: org.apache.spark.sql.catalyst.catalog.TiCatalog”.
    Related versions: Spark is spark-3.0.2-bin-hadoop2.7 and TiSpark is tispark-assembly-3.0_2.12-3.0.2.jar.

  2. Configuration (see the classpath sketch after the error output below):
    spark.sql.catalog.tidb_catalog org.apache.spark.sql.catalyst.catalog.TiCatalog
    spark.sql.catalog.tidb_catalog.pd.addresses x
    spark.sql.extensions org.apache.spark.sql.TiExtensions
    spark.tispark.pd.addresses x

  3. Connecting through the Livy JDBC endpoint with ./beeline -u 'jdbc:hive2://dx-pipe-pt277-pm:10001', a plain statement such as “show databases;” works normally,
    but referencing the catalog, for example “use tidb_catalog.tidb_conan_rock_mission”,
    reports the following error:
    0: jdbc:hive2://dx-pipe-pt277-pm:10001> use tidb_catalog.tidb_conan_rock_mission;

Error: java.util.concurrent.ExecutionException: java.lang.RuntimeException: org.apache.spark.SparkException: Cannot find catalog plugin class for catalog ‘tidb_catalog’: org.apache.spark.sql.catalyst.catalog.TiCatalog

at org.apache.spark.sql.connector.catalog.Catalogs$.load(Catalogs.scala:66)

at org.apache.spark.sql.connector.catalog.CatalogManager.$anonfun$catalog$1(CatalogManager.scala:52)

at scala.collection.mutable.HashMap.getOrElseUpdate(HashMap.scala:86)

at org.apache.spark.sql.connector.catalog.CatalogManager.catalog(CatalogManager.scala:52)

at org.apache.spark.sql.connector.catalog.LookupCatalog$CatalogAndNamespace$.unapply(LookupCatalog.scala:92)

at org.apache.spark.sql.catalyst.analysis.ResolveCatalogs$$anonfun$apply$1.applyOrElse(ResolveCatalogs.scala:209)

at org.apache.spark.sql.catalyst.analysis.ResolveCatalogs$$anonfun$apply$1.applyOrElse(ResolveCatalogs.scala:34)

at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.$anonfun$resolveOperatorsDown$2(AnalysisHelper.scala:108)

at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:73)

at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.$anonfun$resolveOperatorsDown$1(AnalysisHelper.scala:108)

at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.allowInvokingTransformsInAnalyzer(AnalysisHelper.scala:221)

at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperatorsDown(AnalysisHelper.scala:106)

at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperatorsDown$(AnalysisHelper.scala:104)

at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveOperatorsDown(LogicalPlan.scala:29)

at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperators(AnalysisHelper.scala:73)

at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperators$(AnalysisHelper.scala:72)

at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveOperators(LogicalPlan.scala:29)

at org.apache.spark.sql.catalyst.analysis.ResolveCatalogs.apply(ResolveCatalogs.scala:34)

at org.apache.spark.sql.catalyst.analysis.ResolveCatalogs.apply(ResolveCatalogs.scala:29)

at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$2(RuleExecutor.scala:216)

at scala.collection.LinearSeqOptimized.foldLeft(LinearSeqOptimized.scala:126)

at scala.collection.LinearSeqOptimized.foldLeft$(LinearSeqOptimized.scala:122)

at scala.collection.immutable.List.foldLeft(List.scala:91)

at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$1(RuleExecutor.scala:213)

at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$1$adapted(RuleExecutor.scala:205)

at scala.collection.immutable.List.foreach(List.scala:431)

at org.apache.spark.sql.catalyst.rules.RuleExecutor.execute(RuleExecutor.scala:205)

at org.apache.spark.sql.catalyst.analysis.Analyzer.org$apache$spark$sql$catalyst$analysis$Analyzer$$executeSameContext(Analyzer.scala:196)

at org.apache.spark.sql.catalyst.analysis.Analyzer.execute(Analyzer.scala:190)

at org.apache.spark.sql.catalyst.analysis.Analyzer.execute(Analyzer.scala:155)

at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$executeAndTrack$1(RuleExecutor.scala:183)

at org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:88)

at org.apache.spark.sql.catalyst.rules.RuleExecutor.executeAndTrack(RuleExecutor.scala:183)

at org.apache.spark.sql.catalyst.analysis.Analyzer.$anonfun$executeAndCheck$1(Analyzer.scala:174)

at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.markInAnalyzer(AnalysisHelper.scala:228)

at org.apache.spark.sql.catalyst.analysis.Analyzer.executeAndCheck(Analyzer.scala:173)

at org.apache.spark.sql.execution.QueryExecution.$anonfun$analyzed$1(QueryExecution.scala:73)

at org.apache.spark.sql.catalyst.QueryPlanningTracker.measurePhase(QueryPlanningTracker.scala:111)

at org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$1(QueryExecution.scala:143)

at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:772)

at org.apache.spark.sql.execution.QueryExecution.executePhase(QueryExecution.scala:143)

at org.apache.spark.sql.execution.QueryExecution.analyzed$lzycompute(QueryExecution.scala:73)

at org.apache.spark.sql.execution.QueryExecution.analyzed(QueryExecution.scala:71)

at org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:63)

at org.apache.spark.sql.Dataset$.$anonfun$ofRows$2(Dataset.scala:98)

at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:772)

at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:96)

at org.apache.spark.sql.SparkSession.$anonfun$sql$1(SparkSession.scala:615)

at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:772)

at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:610)

at org.apache.livy.thriftserver.session.SqlJob.executeSql(SqlJob.java:72)

at org.apache.livy.thriftserver.session.SqlJob.call(SqlJob.java:62)

at org.apache.livy.thriftserver.session.SqlJob.call(SqlJob.java:33)

at org.apache.livy.rsc.driver.JobWrapper.call(JobWrapper.java:64)

at org.apache.livy.rsc.driver.JobWrapper.call(JobWrapper.java:31)

at java.util.concurrent.FutureTask.run(FutureTask.java:266)

at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)

at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)

at java.lang.Thread.run(Thread.java:748) (state=,code=0)

0: jdbc:hive2://dx-pipe-pt277-pm:10001>
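
For reference, Spark raises “Cannot find catalog plugin class” when it cannot load the named catalog class, which usually means the TiSpark assembly jar is not on the classpath of the Spark application that Livy launches for the Thrift session, rather than a problem with the catalog settings themselves. A minimal configuration sketch (the jar path and PD address below are placeholders, not values from the original post):

    spark.jars                                   /path/to/tispark-assembly-3.0_2.12-3.0.2.jar
    spark.sql.extensions                         org.apache.spark.sql.TiExtensions
    spark.sql.catalog.tidb_catalog               org.apache.spark.sql.catalyst.catalog.TiCatalog
    spark.sql.catalog.tidb_catalog.pd.addresses  <pd-host>:2379
    spark.tispark.pd.addresses                   <pd-host>:2379

If the jar is visible to the driver and executors of the Livy-launched session, the tidb_catalog settings above should resolve.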

| username: ShawnYan | Original post link

Does it work normally when you use it like this, or does it also report an error?

| username: TiDBer_9ASYe5gK | Original post link

This will cause an error.

| username: TiDBer_9ASYe5gK | Original post link

0: jdbc:hive2://dx-pipe-pt277-pm:10001> select * from tidb_catalog.tidb_conan_rock_mission.user_mission limit 1;

Error: java.util.concurrent.ExecutionException: java.lang.RuntimeException: org.apache.spark.SparkException: Cannot find catalog plugin class for catalog ‘tidb_catalog’: org.apache.spark.sql.catalyst.catalog.TiCatalog

at org.apache.spark.sql.connector.catalog.Catalogs$.load(Catalogs.scala:66)

at org.apache.spark.sql.connector.catalog.CatalogManager.$anonfun$catalog$1(CatalogManager.scala:52)

at scala.collection.mutable.HashMap.getOrElseUpdate(HashMap.scala:86)

at org.apache.spark.sql.connector.catalog.CatalogManager.catalog(CatalogManager.scala:52)

at org.apache.spark.sql.connector.catalog.LookupCatalog$CatalogAndIdentifier$.unapply(LookupCatalog.scala:128)

at org.apache.spark.sql.connector.catalog.LookupCatalog$SessionCatalogAndIdentifier$.unapply(LookupCatalog.scala:63)

at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.org$apache$spark$sql$catalyst$analysis$Analyzer$ResolveRelations$$lookupRelation(Analyzer.scala:1172)

at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$$anonfun$apply$10.applyOrElse(Analyzer.scala:1135)

at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$$anonfun$apply$10.applyOrElse(Analyzer.scala:1102)

at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.$anonfun$resolveOperatorsUp$3(AnalysisHelper.scala:90)

at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:73)

at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.$anonfun$resolveOperatorsUp$1(AnalysisHelper.scala:90)

at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.allowInvokingTransformsInAnalyzer(AnalysisHelper.scala:221)

at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperatorsUp(AnalysisHelper.scala:86)

at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperatorsUp$(AnalysisHelper.scala:84)

at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveOperatorsUp(LogicalPlan.scala:29)

at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.$anonfun$resolveOperatorsUp$2(AnalysisHelper.scala:87)

at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$mapChildren$1(TreeNode.scala:407)

at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:243)

at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:405)

at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:358)

at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.$anonfun$resolveOperatorsUp$1(AnalysisHelper.scala:87)

at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.allowInvokingTransformsInAnalyzer(AnalysisHelper.scala:221)

at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperatorsUp(AnalysisHelper.scala:86)

at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperatorsUp$(AnalysisHelper.scala:84)

at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveOperatorsUp(LogicalPlan.scala:29)

at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.$anonfun$resolveOperatorsUp$2(AnalysisHelper.scala:87)

at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$mapChildren$1(TreeNode.scala:407)

at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:243)

at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:405)

at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:358)

at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.$anonfun$resolveOperatorsUp$1(AnalysisHelper.scala:87)

at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.allowInvokingTransformsInAnalyzer(AnalysisHelper.scala:221)

at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperatorsUp(AnalysisHelper.scala:86)

at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperatorsUp$(AnalysisHelper.scala:84)

at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveOperatorsUp(LogicalPlan.scala:29)

at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.$anonfun$resolveOperatorsUp$2(AnalysisHelper.scala:87)

at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$mapChildren$1(TreeNode.scala:407)

at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:243)

at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:405)

at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:358)

at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.$anonfun$resolveOperatorsUp$1(AnalysisHelper.scala:87)

at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.allowInvokingTransformsInAnalyzer(AnalysisHelper.scala:221)

at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperatorsUp(AnalysisHelper.scala:86)

at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperatorsUp$(AnalysisHelper.scala:84)

at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveOperatorsUp(LogicalPlan.scala:29)

at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.apply(Analyzer.scala:1102)

at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.apply(Analyzer.scala:1070)

at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$2(RuleExecutor.scala:216)

at scala.collection.LinearSeqOptimized.foldLeft(LinearSeqOptimized.scala:126)

at scala.collection.LinearSeqOptimized.foldLeft$(LinearSeqOptimized.scala:122)

at scala.collection.immutable.List.foldLeft(List.scala:91)

at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$1(RuleExecutor.scala:213)

at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$1$adapted(RuleExecutor.scala:205)

at scala.collection.immutable.List.foreach(List.scala:431)

at org.apache.spark.sql.catalyst.rules.RuleExecutor.execute(RuleExecutor.scala:205)

at org.apache.spark.sql.catalyst.analysis.Analyzer.org$apache$spark$sql$catalyst$analysis$Analyzer$$executeSameContext(Analyzer.scala:196)

at org.apache.spark.sql.catalyst.analysis.Analyzer.execute(Analyzer.scala:190)

at org.apache.spark.sql.catalyst.analysis.Analyzer.execute(Analyzer.scala:155)

at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$executeAndTrack$1(RuleExecutor.scala:183)

at org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:88)

at org.apache.spark.sql.catalyst.rules.RuleExecutor.executeAndTrack(RuleExecutor.scala:183)

at org.apache.spark.sql.catalyst.analysis.Analyzer.$anonfun$executeAndCheck$1(Analyzer.scala:174)

at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.markInAnalyzer(AnalysisHelper.scala:228)

at org.apache.spark.sql.catalyst.analysis.Analyzer.executeAndCheck(Analyzer.scala:173)

at org.apache.spark.sql.execution.QueryExecution.$anonfun$analyzed$1(QueryExecution.scala:73)

at org.apache.spark.sql.catalyst.QueryPlanningTracker.measurePhase(QueryPlanningTracker.scala:111)

at org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$1(QueryExecution.scala:143)

at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:772)

at org.apache.spark.sql.execution.QueryExecution.executePhase(QueryExecution.scala:143)

at org.apache.spark.sql.execution.QueryExecution.analyzed$lzycompute(QueryExecution.scala:73)

at org.apache.spark.sql.execution.QueryExecution.analyzed(QueryExecution.scala:71)

at org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:63)

at org.apache.spark.sql.Dataset$.$anonfun$ofRows$2(Dataset.scala:98)

at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:772)

at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:96)

at org.apache.spark.sql.SparkSession.$anonfun$sql$1(SparkSession.scala:615)

at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:772)

at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:610)

at org.apache.livy.thriftserver.session.SqlJob.executeSql(SqlJob.java:72)

at org.apache.livy.thriftserver.session.SqlJob.call(SqlJob.java:62)

at org.apache.livy.thriftserver.session.SqlJob.call(SqlJob.java:33)

at org.apache.livy.rsc.driver.JobWrapper.call(JobWrapper.java:64)

at org.apache.livy.rsc.driver.JobWrapper.call(JobWrapper.java:31)

at java.util.concurrent.FutureTask.run(FutureTask.java:266)

at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)

at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)

at java.lang.Thread.run(Thread.java:748) (state=,code=0)

0: jdbc:hive2://dx-pipe-pt277-pm:10001>

| username: TiDBer_9ASYe5gK | Original post link

Can I add you on WeChat?

| username: shiyuhang0 | Original post link

This error looks somewhat similar. Try placing the TiSpark jar under the lib directory and see if that resolves the issue: How to run Spark SQL Thrift Server in local mode and connect to Delta using JDBC - Stack Overflow
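
One way to act on that suggestion for a Livy-launched Thrift session is to make the assembly jar part of Spark's own classpath. A minimal sketch, assuming hypothetical paths and that the Livy server can be restarted:

    # Hypothetical locations; adjust to your installation.
    cp /path/to/tispark-assembly-3.0_2.12-3.0.2.jar "$SPARK_HOME/jars/"
    # Alternatively, point spark.jars at the file in spark-defaults.conf:
    #   spark.jars /path/to/tispark-assembly-3.0_2.12-3.0.2.jar
    # Then restart the Livy server so new Thrift sessions pick up the jar.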

| username: TiDBer_9ASYe5gK | Original post link

Not resolved.

| username: TiDBer_9ASYe5gK | Original post link

0: jdbc:hive2://dx-pipe-ptxxx-pm:10001> use tidb_catalog.tidb_conan_rock_mission;

Error: java.util.concurrent.ExecutionException: java.lang.RuntimeException: org.apache.spark.SparkException: Cannot find catalog plugin class for catalog ‘tidb_catalog’: org.apache.spark.sql.catalyst.catalog.TiCatalog

| username: TiDBer_9ASYe5gK | Original post link

Can any expert help solve this? The same setup works fine for us on Spark 2.

| username: jansu-dev | Original post link

I’m not very familiar with Spark and TiSpark, but two suggestions:

  1. The master branch seems to have fixed some catalog-related issues, so you could try using the TiSpark master branch.
  2. If the issue still isn’t resolved, you can ask the TiSpark maintainers on GitHub: Issues · pingcap/tispark · GitHub
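
As a sanity check before going back through Livy, the same catalog configuration can be exercised directly with the spark-sql CLI. A sketch, assuming a placeholder jar path and PD address:

    ./bin/spark-sql \
      --jars /path/to/tispark-assembly-3.0_2.12-3.0.2.jar \
      --conf spark.sql.extensions=org.apache.spark.sql.TiExtensions \
      --conf spark.sql.catalog.tidb_catalog=org.apache.spark.sql.catalyst.catalog.TiCatalog \
      --conf spark.sql.catalog.tidb_catalog.pd.addresses=<pd-host>:2379 \
      --conf spark.tispark.pd.addresses=<pd-host>:2379

If “use tidb_catalog; show databases;” works here but still fails through beeline, the problem lies in how the Livy-launched session is configured rather than in TiSpark itself.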