Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.
Original topic: TiDB 5.0.4 MySQL查询与TiDB结果不一致
To improve efficiency, please provide the following information. A clear problem description can help resolve the issue faster:
[Overview]
With consistent data between MySQL and TiDB, the same SQL query returns different results in MySQL and TiDB.
[Phenomenon]
select git_mr.I_MR_ID
from table1 join table2 on table1.I_MR_ID = table2.I_MR_ID
where
1=1
and table2.D_UPDATED_AT >= '2023-06-01' and table2.D_UPDATED_AT <= '2023-06-11'
and table1.CH_USER_NAME != table2.CH_OPEN_USER_NAME
and CH_NOTE_TYPE="statsxxx"
and CH_USER_NAME not in ('aaa', 'bbbb.fe')
and CH_NOTE like 'cccc' -- After removing this condition, the query results are consistent.
and CH_STATE = "xxxxx"
and table2.CH_GROUP NOT LIKE "%.%"
group by table2.I_MR_ID;
[TiDB Version]
-
TiDB version: 5.0.4
-
MySQL version: 5.7.32
-
Additional note: The preliminary diagnosis is that case-insensitive
is causing the issue.
After removing one of the where conditions, CH_NOTE like 'cccc'
, the query results in MySQL and TiDB are consistent.
Is this real-time data or stock data? First, determine if the data is being updated.
If the data in MySQL and TiDB is consistent, but the same SQL returns different results in MySQL and TiDB, it may be due to the following reasons:
-
SQL Syntax Incompatibility: There are some differences in SQL syntax between MySQL and TiDB. For example, TiDB does not support some MySQL syntax, or TiDB’s implementation of the syntax is different from MySQL. If the SQL syntax is incompatible, it may lead to different results when executing the same SQL statement in MySQL and TiDB.
-
Data Type Incompatibility: There are some differences in data types between MySQL and TiDB. For example, TiDB supports more data types, or TiDB’s implementation of data types is different from MySQL. If incompatible data types are used in the SQL statement, it may lead to different results when executing the same SQL statement in MySQL and TiDB.
-
Database Parameter Inconsistency: There are some differences in database parameters between MySQL and TiDB. For example, TiDB’s default character set and collation are different from MySQL, or TiDB’s default isolation level is different from MySQL. If the database parameters are inconsistent, it may lead to different results when executing the same SQL statement in MySQL and TiDB.
-
Database Data Inconsistency: Although the data in MySQL and TiDB is consistent, there may be cases of data inconsistency in some situations. For example, there may be delays in data synchronization between MySQL and TiDB, or errors may occur during the data synchronization process. If the database data is inconsistent, it may lead to different results when executing the same SQL statement in MySQL and TiDB.
To determine the specific reason, you can troubleshoot step by step. First, check if the SQL syntax and data types are compatible, then check if the database parameters are consistent, and finally check if the database data is consistent. If you cannot determine the reason, you can provide more information, such as SQL statements, table structure, data types, database parameters, etc., to better help you solve the problem.
DM real-time synchronization to TiDB, verified multiple times, data is consistent.
Could it be that the SQL execution client set autocommit=0, causing it to query snapshot data?
It is recommended to check what characteristics those missing rows have.
I found that the collations of TiDB and MySQL are inconsistent. (The default 5.0 version is incompatible with some MySQL character sets and collations)
Follow-up question: If using the rolling upgrade method, will the parameters be adjusted?
If the cluster uses the master-slave switch upgrade method, does the new_collation_enabled
parameter need to be adjusted?
The case-insensitive
issue is related to case sensitivity, right? Does it have anything to do with like
?
Currently, it appears that CH_NOTE like 'cccc'
and CH_NOTE like 'CCCC'
yield inconsistent results after changing the case.
Conclusion feedback:
When new_collations_enabled_on_first_bootstrap
= false, TiDB’s utf8mb4 character set and collation set are utf8mb4
and utf8mb4_bin
respectively. The utf8mb4_bin
collation set is case-sensitive.