You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@calcite.apache.org by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2023/06/20 18:02:00 UTC

[jira] [Updated] (CALCITE-5775) Null direction emulation broken for complex expressions on some dialects

     [ https://issues.apache.org/jira/browse/CALCITE-5775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated CALCITE-5775:
------------------------------------
    Labels: pull-request-available  (was: )

> Null direction emulation broken for complex expressions on some dialects
> ------------------------------------------------------------------------
>
>                 Key: CALCITE-5775
>                 URL: https://issues.apache.org/jira/browse/CALCITE-5775
>             Project: Calcite
>          Issue Type: Bug
>          Components: core
>            Reporter: Will Noble
>            Priority: Minor
>              Labels: pull-request-available
>
> This is a problem in BigQuery, and may be a problem in other dialects as well. Consider the following piece of valid BQ SQL:
> {code:sql}
> SELECT REPEAT(first_name, 2),
>        COUNT(id)
> FROM looker_test.users
> GROUP BY REPEAT(first_name, 2)
> ORDER BY 1
> {code}
> Now consider a version where the {{ORDER BY}} clause is changed to this:
> {code:sql}
> ORDER BY REPEAT(first_name, 2)
> {code}
> This is logically the same query, because the expression in the {{ORDER}} clause is the same as the one in the {{SELECT}} / {{GROUP}} clauses. BigQuery is sophisticated enough to match the select to the group expression in both queries, but cannot match either with the order expression. It gives this error: _ORDER BY clause expression references column first_name which is neither grouped nor aggregated_.
> So, when sorting by complex expressions in BQ, Calcite relies on either:
> * No null direction emulation required. 
> * sorting by alias or ordinal, which is a problem with current null direction emulation because it adds an extra complex sort expression, or
> * having a query that just happens to also have the underlying field in the {{GROUP BY}} clause by itself, which seems to actually happen pretty often in my testing, but obviously shouldn't be a constraint.
> As I wrote that, I realized this may be easily fixable for BQ since it added support for {{NULLS FIRST}} / {{LAST}} in 2020 and it seems Calcite has not caught up yet. Consider this rel node:
> {code}
> LogicalSort(sort0=[$0], dir0=[ASC])
>   LogicalAggregate(group=[{0}], cent=[COUNT($1)])
>     LogicalProject($f0=[CASE(IS NULL($4), 0, 1)], MGR=[$3])
>       JdbcTableScan(table=[[JDBC_SCOTT, EMP]])
> {code}
> Calcite would convert it to this in BigQuery due to null direction emulation:
> {code:sql}
> SELECT CASE WHEN HIREDATE IS NULL THEN 0 ELSE 1 END AS `$f0`, COUNT(MGR) AS cent
> FROM SCOTT.EMP
> GROUP BY CASE WHEN HIREDATE IS NULL THEN 0 ELSE 1 END
> ORDER BY CASE WHEN HIREDATE IS NULL THEN 0 ELSE 1 END IS NULL, 1
> {code}
> Which of course triggers the problem described above. This may be a problem for MSSQL as well since it doesn't support {{NULLS LAST}}. The fix for BQ, at least, may be to just support {{NULLS LAST}} and sort by ordinal.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)