Regarding the Issue of Unstable Query Result Sets After Sorting

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: 关于排序后查询结果集不稳定的问题

| username: hey-hoho

To improve efficiency, please provide the following information. Clear problem descriptions can be resolved faster:
[TiDB Usage Environment]
Test environment

[Overview] Scenario + Problem Overview
The following table structure and test data are provided:

DROP TABLE IF EXISTS `test1`;
CREATE TABLE `test1`  (
  `a` int(11) NULL DEFAULT NULL,
  `b` int(11) NULL DEFAULT NULL,
  `c` int(6) NULL DEFAULT NULL
) ENGINE = InnoDB CHARACTER SET = utf8mb4 COLLATE = utf8mb4_bin;
INSERT INTO `test1` VALUES (1, 1, 0);
INSERT INTO `test1` VALUES (2, 2, NULL);
INSERT INTO `test1` VALUES (3, 3, 0);
INSERT INTO `test1` VALUES (4, 4, 0);
INSERT INTO `test1` VALUES (5, 5, 0);
INSERT INTO `test1` VALUES (6, 6, 0);

DROP TABLE IF EXISTS `test2`;
CREATE TABLE `test2`  (
  `a` bigint(20) NULL DEFAULT NULL,
  `b` int(11) NULL DEFAULT NULL
) ENGINE = InnoDB CHARACTER SET = utf8mb4 COLLATE = utf8mb4_bin;
INSERT INTO `test2` VALUES (34, 3);
INSERT INTO `test2` VALUES (53, 5);
INSERT INTO `test2` VALUES (38, 39);
INSERT INTO `test2` VALUES (22, 2);
INSERT INTO `test2` VALUES (87, 51);
INSERT INTO `test2` VALUES (45, 67);
INSERT INTO `test2` VALUES (88, 4);
INSERT INTO `test2` VALUES (567, 786);
INSERT INTO `test2` VALUES (67, 678);
INSERT INTO `test2` VALUES (234563, 65);
INSERT INTO `test2` VALUES (546, 8);

Query SQL (result order is unstable):

SELECT * FROM (
select test1.*
 ,(select max(test2.a) from test2 where test2.b=test1.a) as p
from test1 
) t
order by c

Sorting query based on the c field of test1 (with duplicate and null values) and including a subquery from test2, the result set order is unstable.

However, if the subquery is removed, the result set order is stable. Further investigation found that removing the max function in the subquery also stabilizes the order.

The above script can reproduce this issue.

[Problem]

  1. Why does using a function in the subquery cause the result set to be unstable?
  2. For single table queries, if the sorting field has duplicate values, what rule determines the final output order?

[Business Impact]
Unstable query results

[TiDB Version]
5.2.2


If the question is about performance optimization or fault troubleshooting, please download the script and run it. Please select all and copy-paste the terminal output results for upload.

| username: ddhe9527 | Original post link

Tested under v5.4.0, the sorting is stable.

| username: Hacker007 | Original post link

I haven’t noticed, the query results don’t change each time.

| username: Hacker007 | Original post link

However, the order of results is inconsistent across different versions.

| username: hey-hoho | Original post link

Execute it a few more times. I encountered a situation where the order was different after executing it more than 20 times in version 5.4:

| username: hey-hoho | Original post link

Currently tested versions 522, 531, 541, and 600, none of which can guarantee a fixed order.

| username: Hacker007 | Original post link

Yes, there are changes, but the result is also correct. It is different from MySQL, and it can also be resolved by adding a sorting field.

| username: cs58_dba | Original post link

Is it enough to upgrade to the latest version of 5.4?

| username: hey-hoho | Original post link

5.4 doesn’t work either, tested 6.0 and it doesn’t work either.

| username: 小王同学Plus | Original post link

I tested version 6.1 and encountered some issues. I’ll confirm them first and then reply.

| username: TiDBer_wTKU9jv6 | Original post link

Except for null, column c is all 0. The sorting of identical values in order by is inherently uncertain, and this situation can also occur in MySQL. Just add another field after order by.

| username: TiDBer_wTKU9jv6 | Original post link

When encountering the same order by value, the sorting result is unstable. To reduce randomness, the order by value should be kept as unique as possible. If uniqueness cannot be guaranteed, continue adding fields until the combination of order by fields is unique, only then will the result be unique.

| username: 西伯利亚狼 | Original post link

:+1:

| username: tiancaiamao | Original post link

  1. Why does using a function in a subquery lead to an unstable result set?

This question is incorrect. It’s not the use of a function in the subquery that causes the result set to be unstable, but rather the query itself does not guarantee stability. The only guarantee this SQL provides is that the result is ordered by column C, and the content of the result set is consistent each time. Everything else is undefined behavior and depends on implementation details. When the subquery does not use the max function, it uses an Apply operator, which is a less efficient implementation that happens to give you an ordered result. However, when using max, the Apply operator is optimized into a HashJoin, which is a more efficient concurrent implementation. The side effect of efficient concurrency is that the returned result becomes unstable.

  1. For single-table queries, if there are duplicate values in the sorting field, what rules determine the final output order?

Undefined behavior. If you expect to get an ordered result when there are duplicate values in the sorting field, you should use multiple columns to sort. This way, the SQL written will be universally applicable across different implementations.

| username: ShawnYan | Original post link

Got it.


| username: hey-hoho | Original post link

This topic was automatically closed 60 days after the last reply. No new replies are allowed.