Data Inconsistency Between Reading TiDB Through TiSpark and Reading TiDB Directly

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: Tispark读取TiDB与TiDB数据源数据不一致问题

| username: 罗啰萝丶

[TiDB Usage Environment] Test
[TiDB Version] 7.0.1
[TiSpark Version] 3.1.5
[Reproduction Path] With 7000+ tables already in the database, create a new table in TiDB, then count the tables from each side:
spark-shell

  1. Run spark.sql("use tidb_catalog.xx") to select the database whose tables will be counted.
    This reports the following warnings:
    WARN HiveConf: HiveConf of name hive.stats.jdbc.timeout does not exist
    WARN HiveConf: HiveConf of name hive.stats.retries.wait does not exist
    WARN Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 2.3.0
    WARN setMetaStoreSchemaVersion called but recording version is disabled
    WARN Failed to get database global_temp, returning NoSuchObjectException
  2. Run spark.sql("show tables").count()

tidb client

  1. show tables

[Encountered Problem: Problem Phenomenon and Impact] The number of tables returned by querying TiDB directly differs from the number returned through spark-shell.
Once the number of tables in TiDB passes a certain threshold, the count returned by Spark SQL decreases as more tables are added to TiDB.

-------------------Divider--------------------
After switching to a JDBC connection to TiDB, the table counts are consistent.
So I suspect this is a TiSpark issue. Could it be caused by some TiSpark configuration item?
[Resource Configuration]
[Attachment: Screenshot/Log/Monitoring]
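
To narrow this down, it may help to see which table names are missing rather than only comparing counts. The following is a minimal Python sketch of that comparison, not something taken from TiSpark itself: `diff_table_names` is a hypothetical helper, the sample names are made up, and lowercasing the names before comparing is an assumption about how the cluster treats table-name case.

```python
from typing import Dict, Iterable, List, Union


def diff_table_names(
    tispark_names: Iterable[str], jdbc_names: Iterable[str]
) -> Dict[str, Union[int, List[str]]]:
    """Report which table names appear in only one of the two listings."""
    # Assumption: table names are compared case-insensitively.
    tispark = {name.lower() for name in tispark_names}
    jdbc = {name.lower() for name in jdbc_names}
    return {
        "tispark_count": len(tispark),
        "jdbc_count": len(jdbc),
        "missing_in_tispark": sorted(jdbc - tispark),
        "missing_in_jdbc": sorted(tispark - jdbc),
    }


# In a pyspark session the two listings could be collected roughly like this
# (untested against this cluster; jdbc_url and credentials are placeholders):
#   tispark_names = [r["tableName"] for r in spark.sql("show tables").collect()]
#   jdbc_names = [r[0] for r in spark.read.format("jdbc")
#                 .option("url", jdbc_url)
#                 .option("query", "SELECT table_name FROM information_schema.tables "
#                                  "WHERE table_schema = 'xx'")
#                 .load().collect()]

if __name__ == "__main__":
    # Hypothetical sample data standing in for the two real listings.
    report = diff_table_names(["t1", "t2"], ["t1", "t2", "t3"])
    print(report["missing_in_tispark"])
```

If the names missing on the TiSpark side turn out to be the oldest or the newest tables, that pattern would be worth including in a bug report.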

| username: tidb狂热爱好者

Why not just use TiFlash?

| username: 有猫万事足

The warning says that the global_temp database cannot be found. Could it be that the tables under this database are missing?

Do you have any other log information?

| username: TiDBer_aKu9dgpb

Ask the official team or check the GitHub issues. We have also run into data inconsistency problems.

| username: dba远航

This is probably an issue with the Spark support.

| username: DBAER

Is TiSpark intended only for special scenarios?

| username: 罗啰萝丶

The tables are not under that database.

There is no other log information. The main problem is that what TiSpark reads is inconsistent with what is in TiDB.

| username: zhaokede

Is the data inside the tables inconsistent?

| username: 罗啰萝丶

The list of tables under the specified database is inconsistent; I haven’t checked the row counts of the tables.

| username: WalterWj

Going forward, TiSpark is generally no longer recommended.
The first warning appears because it fetches data from Hive by default.

| username: oceanzhang

This might be a bug.

| username: QH琉璃

Waiting for an expert’s answer.

| username: TiDBer_JUi6UvZm

7.0 is out, let’s use TiFlash.