You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@calcite.apache.org by "Will Noble (Jira)" <ji...@apache.org> on 2023/06/08 22:43:00 UTC

[jira] [Comment Edited] (CALCITE-5767) Calcite's MSSQL dialect should not give GROUPING special treatment when emulating NULL direction

    [ https://issues.apache.org/jira/browse/CALCITE-5767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17730760#comment-17730760 ] 

Will Noble edited comment on CALCITE-5767 at 6/8/23 10:42 PM:
--------------------------------------------------------------

I updated the description and summary, but kept the verbose prose description because it's really a complicated problem with seemingly no ideal solution. Really, there are 4 potential "problems" that could be blamed for this. The first 3 are MSSQL's seemingly non-standard design decisions outlined in the ticket description (each of these is not necessarily something that needs changing, but if any one of these were different, there would be no problem). The fourth potential target for blame is the fact that {{RelBuilder}} uses a "default" null direction by default instead of {{NullDirection.UNSPECIFIED}} which, if used, would also eliminate this weird issue, and allow us to return {{null}} from {{emulateNullDirection}} and not have to include an effectively constant expression in the {{ORDER BY}} list. So, in a sense the problem lies with {{RelBuilder}} because it's forcing us to produce a seemingly less efficient and slightly more confusing SQL query.

But, changing the default null direction in {{RelBuilder}} from NULLS-high to unspecified would be a major breaking change, so I'm assuming that's a non-starter.


was (Author: wnoble):
I updated the description and summary, but kept the verbose prose description because it's really a complicated problem with seemingly no ideal solution. Really, there are 4 potential "problems" that could be blamed for this. The first 3 are MSSQL's seemingly non-standard design decisions outlined in the ticket description (each of these is not necessarily something that needs changing, but if any one of these were different, there would be no problem). The fourth potential target for blame is the fact that {{RelBuilder}} uses a "default" null direction by default instead of {{NullDirection.UNSPECIFIED}} which, if used, would also eliminate this weird issue, and allow us to return {{null}} from {{emulateNullDirection}} and not have to include an effectively constant expression in the {{ORDER BY}} list.

But, changing the default null direction in {{RelBuilder}} from NULLS-high to unspecified would be a major breaking change, so I'm assuming that's a non-starter.

> Calcite's MSSQL dialect should not give GROUPING special treatment when emulating NULL direction
> ------------------------------------------------------------------------------------------------
>
>                 Key: CALCITE-5767
>                 URL: https://issues.apache.org/jira/browse/CALCITE-5767
>             Project: Calcite
>          Issue Type: Bug
>          Components: core
>            Reporter: Will Noble
>            Assignee: Will Noble
>            Priority: Minor
>              Labels: pull-request-available
>
> {{MssqlSqlDialect.emulateNullDirection}} has [special logic for {{GROUPING}} calls|https://github.com/apache/calcite/blob/814ae6ec09e72544ba010f2591e06020c55b162b/core/src/main/java/org/apache/calcite/sql/dialect/MssqlSqlDialect.java#L95]. This seems to be an optimization attempt since {{GROUPING}} is known to never return {{{}NULL{}}}, and therefore needs no null direction emulation. However, this causes problems because:
> 1. MSSQL does not have the same default null direction as Calcite's {{RelBuilder}} (and most other dialects).
> 2. MSSQL does not support the common {{NULLS FIRST}} or {{NULLS LAST}} syntax.
> 3. MSSQL does not allow sorting on the same field twice, even though there is no theoretical issue with this.
> Each of these properties must be present for the problem to occur, so it's a bit niche and specific to MSSQL. Seems like the best solution is to simply eliminate the special-case treatment for {{GROUPING}} in {{{}emulateNullDirection{}}}, which is currently creating problems due to property #3.
> {*}More in-depth explanation{*}:
> In {{{}RelBuilder.collation{}}}, we use the "default null direction" to insert rex nodes as sorting expressions, but this is only the default null direction for NULLS-high dialects, i.e. *not* MSSQL. This is a problem because MSSQL has special-case logic for emulating null direction of GROUPING calls, whereby it effectively duplicates the expression. Really, {{MssqlSqlDialect.emulateNullDirection}} probably should've been returning {{null}} instead, signalling to callers that no null-direction emulation is necessary because {{GROUPING}} never returns {{{}NULL{}}}, but this causes another problem due to property #2 above when the null direction is non-default as is caused simply by using {{RelBuilder.collation}} as described above (it should be noted that this method takes rex nodes instead of {{RelFieldCollation}} object, so there is no way to specify null direction) because the non-default null direction is not expanded into a {{CASE}} expression (MSSQL does not support {{NULLS FIRST}} or {{LAST}} syntax).
> Here's a test illustrating the problem:
> Input SQL (default dialect)
> {code:xml}
> select "product_class_id", "brand_name", GROUPING("brand_name")
> from "product"
> group by GROUPING SETS (("product_class_id", "brand_name"), ("product_class_id"))
> order by 3, 2, 1
> {code}
> Current behavior for unparsing as MSSQL (incorrect because it orders by the same column twice; {{GROUPING([brand_name])}} and {{{}3{}}}, which will fail if you try to actually run this against a real MSSQL database, even though it seems like it shouldn't):
> {code:xml}
> SELECT [product_class_id], [brand_name], GROUPING([brand_name])
> FROM [foodmart].[product]
> GROUP BY GROUPING SETS(([product_class_id], [brand_name]), [product_class_id])
> ORDER BY
>          GROUPING([brand_name]),
>          3,
>          CASE WHEN [brand_name] IS NULL THEN 1 ELSE 0 END,
>          [brand_name],
>          CASE WHEN [product_class_id] IS NULL THEN 1 ELSE 0 END,
>          [product_class_id]
> {code}
> Behavior where {{MssqlSqlDialect.emulateNullDirection}} simply returns {{null}} for {{GROUPING}} expressions (incorrect because it uses {{NULLS LAST}} syntax):
> {code:xml}
> ...
> ORDER BY
>          3 NULLS LAST,
>          CASE WHEN [brand_name] IS NULL THEN 1 ELSE 0 END,
>          [brand_name],
>          CASE WHEN [product_class_id] IS NULL THEN 1 ELSE 0 END,
>          [product_class_id]
> {code}
> Acceptable behavior (although the first {{{}ORDER BY{}}}-clause is effectively ordering by a constant, this will at least run and produce the correct results):
> {code:xml}
> ...
> ORDER BY
>          CASE WHEN GROUPING([brand_name]) IS NULL THEN 0 ELSE 1 END,
>          3,
>          CASE WHEN [brand_name] IS NULL THEN 1 ELSE 0 END,
>          [brand_name],
>          CASE WHEN [product_class_id] IS NULL THEN 1 ELSE 0 END,
>          [product_class_id]
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)