The order of results returned by UNION ALL is not guaranteed

Note:
This topic has been translated from a Chinese forum by GPT and might contain errors.

Original topic: union all返回结果顺序不确定

| username: zhanggame1

For example:
image
In other common databases, the result set of union all connecting multiple select statements is arranged in the order in which the select statements appear, but in TiDB it is unordered. This should not be related to the characteristics of distributed databases. It is recommended to improve this.

| username: 小龙虾爱大龙虾 | Original post link

You can add an order by clause. Without an order by clause, there is no guarantee of sorting. The database returning results in this way is not an issue; different database implementations may exhibit different behaviors.

| username: zhanggame1 | Original post link

There is no need for sorting here, just arrange the result set according to the select order.

| username: 小龙虾爱大龙虾 | Original post link

There is no sorting in the SQL semantics.

| username: 像风一样的男子 | Original post link

I have also encountered this, but it does not affect usage.

| username: zhanggame1 | Original post link

No, the semantics of UNION ALL is to output the result set of SELECT a and SELECT b together. Most databases follow the order of a first and then b, which is related to the order of the SELECT statements. For example, MySQL 8:


or Oracle:
image

The result is predictable.

| username: 随缘天空 | Original post link

This is normal because data needs to be aggregated from multiple nodes, and there may be discrepancies in the order of the returned data. You need to establish your own sorting rules.

| username: forever | Original post link

This kind of SQL should not involve data aggregation across multiple nodes.

| username: yiduoyunQ | Original post link

The issue of order is specifically mentioned in https://dev.mysql.com/doc/refman/5.7/en/union.html#union-order-by-limit

It seems to have been removed in https://dev.mysql.com/doc/refman/8.0/en/union.html, possibly because MySQL 8.0 has changed its logic.

Currently, TiDB is still adapting to MySQL 8.0, so we can check the situation after the release of version 7.5.

| username: zhanggame1 | Original post link

There is no multi-node aggregation; all aggregation is done by the same TiDB server.

| username: zhanggame1 | Original post link

The logic of union and union all is different. Union requires deduplication, meaning the data needs to be aggregated and hashed or sorted, while union all does not.

| username: dba远航 | Original post link

This is a characteristic of distributed multi-node data retrieval.

| username: zhanggame1 | Original post link

It’s not a distributed characteristic because it doesn’t involve sorting. Just like in my example, it doesn’t involve fetching values from TiKV.

| username: ShawnYan | Original post link

Yes, that’s how it was originally.

| username: zhanggame1 | Original post link

Look at my examples of 4 1 3 4 1 3, they are not sorted.

| username: 春风十里 | Original post link

In SQL semantics, only the use of the ORDER BY clause guarantees the ordered output of result sets. In single-node databases, since data is stored on a single server, the results of multiple executions are often stable without data reorganization. Some databases (especially the MySQL InnoDB storage engine) will output result sets in the order of primary keys or indexes. TiDB is a distributed database, where data is stored on multiple servers, and the TiDB layer does not cache data pages. Therefore, the order of result sets for SQL statements without ORDER BY is easily perceived as unstable. To output result sets in order, you need to explicitly add the fields to be sorted to the ORDER BY clause, which conforms to SQL semantics.

Unstable Result Set | PingCAP Documentation Center

| username: zhanggame1 | Original post link

It has nothing to do with order by, and it does not involve sorting the result set here.