TiDB SQL Query Data Incorrect, Conditions Not Filtering

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: tidb sql查询数据不正确,条件不过滤。

| username: leoones

[Overview] Scenario + Problem Summary
There is an issue with querying data in TiDB, as shown by the comparison of the following two SQL statements:
The first SQL statement can retrieve records where the id is 271535 and next_check_time is greater than 1664360160.

The correct query result is as follows:

The table structure is as follows:
CREATE TABLE temp_order_job (
id bigint(20) NOT NULL /*T![auto_rand] AUTO_RANDOM(4) */,
seller_id varchar(100) NOT NULL DEFAULT ‘’,
region varchar(20) NOT NULL DEFAULT ‘’ COMMENT ‘’,
marketplace_ids varchar(500) NOT NULL DEFAULT ‘’ COMMENT ‘’,
start_timestamp int(10) NOT NULL DEFAULT ‘0’ COMMENT ‘’,
end_timestamp int(10) NOT NULL DEFAULT ‘0’ COMMENT ‘’,
request_status int(10) NOT NULL DEFAULT ‘0’ COMMENT ‘’,
api_type varchar(20) NOT NULL DEFAULT ‘sp’ COMMENT ‘’,
gmt_modified datetime NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP COMMENT ‘’,
gmt_create datetime NOT NULL DEFAULT CURRENT_TIMESTAMP COMMENT ‘’,
next_check_time int(10) NOT NULL COMMENT ‘’,
PRIMARY KEY (id) /*T![clustered_index] CLUSTERED */,
KEY idx-request_status-next_check_time (request_status,next_check_time)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin /*T![auto_rand_base] AUTO_RANDOM_BASE=1260001 */

[Business Impact]
Incorrect data query

[TiDB Version]
V5.4.2

| username: Kongdom | Original post link

It can’t be such a coincidence, have you done a BR restore?

You can test the first sentence, and also try adding the ID condition in the subquery.

| username: tracy0984 | Original post link

Are both of these query results reproducible? Could it be that the data was modified before and after the execution of the two SQL statements? This next_check_time column is a timestamp, so it should be meaningful, right? Could it be that the status of this row was modified at a certain time?

| username: Kongdom | Original post link

Seeing the name with “temp,” it can’t really be a temporary table that changes frequently, right?

| username: leoones | Original post link

The update is not that fast. I repeatedly checked the phenomenon yesterday and today, and it is the same.

| username: Kongdom | Original post link

Is this phenomenon still present if you add the ID condition in parentheses?

Have you ever performed a BR restore?

When this situation occurred before, excluding bugs and SQL writing issues, there were mainly two reasons:

  1. Restoring data to a non-empty database, causing data retrieval confusion.
  2. Other threads frequently updating table data.
| username: tracy0984 | Original post link

In the execution result of this SQL, can you see the value of request_status for the row where id=271535? Is it 1 or 2? Should we confirm which query condition is causing the request_status value to be different in the results of the two SQL queries above?

| username: tidb狂热爱好者 | Original post link

The one above has “order by id” while the one below does not have “order by”.

| username: leoones | Original post link

The result values are as follows

The correct values are as follows

| username: leoones | Original post link

I don’t quite understand. The data below already has id=271535, so why do we still need to order by? We need to verify why the values of the two SQL queries are different. Is this a bug or a usage issue?

| username: tracy0984 | Original post link

How about checking like this?

SELECT id, next_check_time, request_status 
FROM temp_order_job 
WHERE request_status = 1 
AND next_check_time < 1664360160 
AND id = 271535;

There shouldn’t be two databases with this table in TiDB, right? I noticed that the two SQL statements above didn’t specify the database name for the table…

| username: leoones | Original post link

In March, I used BR to migrate to this database. At that time, the target was a new cluster with no business data.

| username: leoones | Original post link

I specified the database for the query, and there is only this one table in the cluster…

| username: leoones | Original post link

You can’t find it because the correct value for id 271535 is request_status = 2.

| username: leoones | Original post link

The image is not visible. Please provide the text you need translated.

| username: tracy0984 | Original post link

Take a look at the query results, which one has a problem?

| username: leoones | Original post link

The results are as follows:

| username: tracy0984 | Original post link

Does this return any results?
If not, it looks like an issue with using limit and order by together…
select * from (select * from (select id, next_check_time, request_status From temp_order_job WHERE request_status = 1 AND next_check_time < 1664360160 order by id) t limit 50) tt where tt.id = 271535;
– You can consider this, should we execute the following SQL to see? Adjust the limit to query all data…
select * from (select * from (select id, next_check_time, request_status From temp_order_job WHERE request_status = 1 AND next_check_time < 1664360160 order by id) t limit 37000) tt where tt.id = 271535;

| username: leoones | Original post link

The first query has no results, it should be an issue with using order by and limit together.

The second and third queries have the same results.

| username: leoones | Original post link

Executing the subquery in the first SQL statement alone will retrieve the data with ID 271535. :sweat: