Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.
Original topic: union all返回结果顺序不确定
For example:

In other common databases, the result set of union all connecting multiple select statements is arranged in the order in which the select statements appear, but in TiDB it is unordered. This should not be related to the characteristics of distributed databases. It is recommended to improve this.
You can add an order by clause. Without an order by clause, there is no guarantee of sorting. The database returning results in this way is not an issue; different database implementations may exhibit different behaviors.
There is no need for sorting here, just arrange the result set according to the select order.
There is no sorting in the SQL semantics.
I have also encountered this, but it does not affect usage.
No, the semantics of UNION ALL
is to output the result set of SELECT a
and SELECT b
together. Most databases follow the order of a
first and then b
, which is related to the order of the SELECT
statements. For example, MySQL 8:
or Oracle:
The result is predictable.
This is normal because data needs to be aggregated from multiple nodes, and there may be discrepancies in the order of the returned data. You need to establish your own sorting rules.
This kind of SQL should not involve data aggregation across multiple nodes.
The issue of order is specifically mentioned in https://dev.mysql.com/doc/refman/5.7/en/union.html#union-order-by-limit
It seems to have been removed in https://dev.mysql.com/doc/refman/8.0/en/union.html, possibly because MySQL 8.0 has changed its logic.
Currently, TiDB is still adapting to MySQL 8.0, so we can check the situation after the release of version 7.5.
There is no multi-node aggregation; all aggregation is done by the same TiDB server.
The logic of union and union all is different. Union requires deduplication, meaning the data needs to be aggregated and hashed or sorted, while union all does not.
This is a characteristic of distributed multi-node data retrieval.
It’s not a distributed characteristic because it doesn’t involve sorting. Just like in my example, it doesn’t involve fetching values from TiKV.
Yes, that’s how it was originally.
Look at my examples of 4 1 3 4 1 3, they are not sorted.
In SQL semantics, only the use of the ORDER BY clause guarantees the ordered output of result sets. In single-node databases, since data is stored on a single server, the results of multiple executions are often stable without data reorganization. Some databases (especially the MySQL InnoDB storage engine) will output result sets in the order of primary keys or indexes. TiDB is a distributed database, where data is stored on multiple servers, and the TiDB layer does not cache data pages. Therefore, the order of result sets for SQL statements without ORDER BY is easily perceived as unstable. To output result sets in order, you need to explicitly add the fields to be sorted to the ORDER BY clause, which conforms to SQL semantics.
Unstable Result Set | PingCAP Documentation Center
It has nothing to do with order by, and it does not involve sorting the result set here.