Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.
Original topic: tidb sql查询数据不正确,条件不过滤。
[Overview] Scenario + Problem Summary
There is an issue with querying data in TiDB, as shown by the comparison of the following two SQL statements:
The first SQL statement can retrieve records where the id is 271535 and next_check_time is greater than 1664360160.
The correct query result is as follows:
The table structure is as follows:
CREATE TABLE temp_order_job
(
id
bigint(20) NOT NULL /*T![auto_rand] AUTO_RANDOM(4) */,
seller_id
varchar(100) NOT NULL DEFAULT ‘’,
region
varchar(20) NOT NULL DEFAULT ‘’ COMMENT ‘’,
marketplace_ids
varchar(500) NOT NULL DEFAULT ‘’ COMMENT ‘’,
start_timestamp
int(10) NOT NULL DEFAULT ‘0’ COMMENT ‘’,
end_timestamp
int(10) NOT NULL DEFAULT ‘0’ COMMENT ‘’,
request_status
int(10) NOT NULL DEFAULT ‘0’ COMMENT ‘’,
api_type
varchar(20) NOT NULL DEFAULT ‘sp’ COMMENT ‘’,
gmt_modified
datetime NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP COMMENT ‘’,
gmt_create
datetime NOT NULL DEFAULT CURRENT_TIMESTAMP COMMENT ‘’,
next_check_time
int(10) NOT NULL COMMENT ‘’,
PRIMARY KEY (id
) /*T![clustered_index] CLUSTERED */,
KEY idx-request_status-next_check_time
(request_status
,next_check_time
)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin /*T![auto_rand_base] AUTO_RANDOM_BASE=1260001 */
[Business Impact]
Incorrect data query
[TiDB Version]
V5.4.2
It can’t be such a coincidence, have you done a BR restore?
You can test the first sentence, and also try adding the ID condition in the subquery.
Are both of these query results reproducible? Could it be that the data was modified before and after the execution of the two SQL statements? This next_check_time column is a timestamp, so it should be meaningful, right? Could it be that the status of this row was modified at a certain time?
Seeing the name with “temp,” it can’t really be a temporary table that changes frequently, right?
The update is not that fast. I repeatedly checked the phenomenon yesterday and today, and it is the same.
Is this phenomenon still present if you add the ID condition in parentheses?
Have you ever performed a BR restore?
When this situation occurred before, excluding bugs and SQL writing issues, there were mainly two reasons:
- Restoring data to a non-empty database, causing data retrieval confusion.
- Other threads frequently updating table data.
In the execution result of this SQL, can you see the value of request_status for the row where id=271535? Is it 1 or 2? Should we confirm which query condition is causing the request_status value to be different in the results of the two SQL queries above?
The one above has “order by id” while the one below does not have “order by”.
The result values are as follows
The correct values are as follows
I don’t quite understand. The data below already has id=271535, so why do we still need to order by? We need to verify why the values of the two SQL queries are different. Is this a bug or a usage issue?
How about checking like this?
SELECT id, next_check_time, request_status
FROM temp_order_job
WHERE request_status = 1
AND next_check_time < 1664360160
AND id = 271535;
There shouldn’t be two databases with this table in TiDB, right? I noticed that the two SQL statements above didn’t specify the database name for the table…
In March, I used BR to migrate to this database. At that time, the target was a new cluster with no business data.
I specified the database for the query, and there is only this one table in the cluster…
You can’t find it because the correct value for id 271535 is request_status = 2.
The image is not visible. Please provide the text you need translated.
Take a look at the query results, which one has a problem?
The results are as follows:
Does this return any results?
If not, it looks like an issue with using limit and order by together…
select * from (select * from (select id, next_check_time, request_status From temp_order_job WHERE request_status = 1 AND next_check_time < 1664360160 order by id) t limit 50) tt where tt.id = 271535;
– You can consider this, should we execute the following SQL to see? Adjust the limit to query all data…
select * from (select * from (select id, next_check_time, request_status From temp_order_job WHERE request_status = 1 AND next_check_time < 1664360160 order by id) t limit 37000) tt where tt.id = 271535;
The first query has no results, it should be an issue with using order by and limit together.
The second and third queries have the same results.
Executing the subquery in the first SQL statement alone will retrieve the data with ID 271535.