You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues-all@impala.apache.org by "Daniel Becker (Jira)" <ji...@apache.org> on 2022/11/23 12:36:00 UTC

[jira] [Updated] (IMPALA-4741) ORDER BY behavior with UNION is incorrect

     [ https://issues.apache.org/jira/browse/IMPALA-4741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Becker updated IMPALA-4741:
----------------------------------
    Target Version: Impala 4.3.0  (was: Impala 4.2.0)

> ORDER BY behavior with UNION is incorrect
> -----------------------------------------
>
>                 Key: IMPALA-4741
>                 URL: https://issues.apache.org/jira/browse/IMPALA-4741
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Frontend
>    Affects Versions: Impala 2.8.0
>            Reporter: Greg Rahn
>            Priority: Critical
>              Labels: correctness, incompatibility, ramp-up, sql-language, tpc-ds
>         Attachments: query36a.sql, query49.sql
>
>
> When a query uses the UNION, EXCEPT, or INTERSECT operators, the ORDER BY clause must be specified at the end of the statement and the results of the combined queries are sorted.  ORDER BY clauses are not allowed in individual branches unless the branch is enclosed by parentheses.
> There are two bugs currently:
> # An ORDER BY is allowed in a branch of a UNION that is not enclosed in parentheses
> # The final ORDER BY of a UNION is attached to the nearest branch when it should be sorting the combined results of the UNION(s)
> For example, this is not valid syntax but is allowed in Impala
> {code}
> select * from t1 order by 1
> union all
> select * from t2
> {code}
> And for queries like this, the ORDER BY should order the unioned result, not just the nearest branch which is the current behavior.
> {code}
> select * from t1
> union all
> select * from t2
> order by 1
> {code}
> If one wants ordering within a branch, the query block must be enclosed by parentheses like such:
> {code}
> (select * from t1 order by 1)
> union all
> (select * from t2 order by 2)
> {code}
> Here is an example where incorrect results are returned.
> Impala
> {code}
> [impalad:21000] > select r_regionkey, r_name from region union all select r_regionkey, r_name from region order by 1 limit 2;
> +-------------+-------------+
> | r_regionkey | r_name      |
> +-------------+-------------+
> | 0           | AFRICA      |
> | 1           | AMERICA     |
> | 2           | ASIA        |
> | 3           | EUROPE      |
> | 4           | MIDDLE EAST |
> | 0           | AFRICA      |
> | 1           | AMERICA     |
> +-------------+-------------+
> Fetched 7 row(s) in 0.12s
> {code}
> PostgreSQL
> {code}
> tpch=# select r_regionkey, r_name from region union all select r_regionkey, r_name from region order by 1 limit 2;
>  r_regionkey |          r_name
> -------------+---------------------------
>            0 | AFRICA
>            0 | AFRICA
> (2 rows) 
> {code}
> see also https://cloud.google.com/spanner/docs/query-syntax#syntax_5



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-all-unsubscribe@impala.apache.org
For additional commands, e-mail: issues-all-help@impala.apache.org