You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@calcite.apache.org by "Will Noble (Jira)" <ji...@apache.org> on 2023/06/28 22:58:00 UTC

[jira] [Updated] (CALCITE-5808) Rel-to-Sql conversion should better support grouping or sorting by references or ordinals

     [ https://issues.apache.org/jira/browse/CALCITE-5808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Will Noble updated CALCITE-5808:
--------------------------------
    Description: 
While exploring solutions to CALCITE-5724 I produced [this draft commit|https://github.com/apache/calcite/commit/d644ecee44ffd7927a62be96329cd5456f545de2] to explore what the Rel-to-Sql conversion process would look like if it were heavily biased toward using ordinals in the {{GROUP BY}} and {{ORDER BY}} clauses. It works for most common queries, but I gave up with basically 2 problems yet to be solved:

# Grouping with {{ROLLUP}}, {{CUBE}}, or {{GROUPING SETS}} does not allow ordinals (at least [according to this presto ticket|https://github.com/prestodb/presto/issues/9522]; I didn't double-check any standards or dialects), so my draft solution would have to distinguish between these cases and a simple group-by, and only use ordinals in the simple case.
# Window functions with an {{ORDER BY}} clause also use ordinals in my draft. It seems likely that ordinals are also disallowed in this context, but I'm not sure. Looking at the results in {{RelToSqlConverterTest.testConvertWindowToSql()}}, it certainly looks like the query is incorrect even if ordinals can hypothetically go in a window's {{ORDER BY}}.

So, I've decided to stop working on this for now, but wanted to preserve my draft change and start a discussion on this ticket. Here are some thoughts:

# It seems like we can reach a happy middle ground on sorting by using ordinals whenever the dialect allows *and* the expression returned by {{field()}} is anything other than a {{SqlIdentifier}} (i.e. a named column reference). Existing behavior as of writing is to only use ordinals when the expression is a {{SqlCall}}, but we should use it for literals as well. That would solve CALCITE-5724, but still use named references for "simple" collations, which seems to be the case for all window functions.
# When it comes to grouping, things are more complicated. As of writing, Calcite tends to group by expressions, but some dialects (e.g. BigQuery) can get easily confused by this, even when the expression in the {{SELECT}} list perfectly matches that in the {{GROUP BY}} clause. With our Calcite integration, we need customized cleanup logic to rewrite the {{GROUP BY}} clauses in terms of aliases / ordinals in the {{SELECT}} list whenever possible, as a band-aid for this problem. We want to get rid of this band-aid and upstream a proper solution, and I thought using ordinals could be it, but the problems with {{GROUPING SETS}} et al still needs to be solved, and Calcite seems to have poor support for grouping by ordinals in general (see the changes I had to make to {{RelToSqlConverter.generateGroupList}} in my draft) which should be improved.

  was:
While exploring solutions to CALCITE-5724 I produced [this draft PR|https://github.com/apache/calcite/commit/d644ecee44ffd7927a62be96329cd5456f545de2] to explore what the Rel-to-Sql conversion process would look like if it were heavily biased toward using ordinals in the {{GROUP BY}} and {{ORDER BY}} clauses. It works for most common queries, but I gave up with basically 2 problems yet to be solved:

# Grouping with {{ROLLUP}}, {{CUBE}}, or {{GROUPING SETS}} does not allow ordinals (at least [according to this presto ticket|https://github.com/prestodb/presto/issues/9522]; I didn't double-check any standards or dialects), so my draft solution would have to distinguish between these cases and a simple group-by, and only use ordinals in the simple case.
# Window functions with an {{ORDER BY}} clause also use ordinals in my draft. It seems likely that ordinals are also disallowed in this context, but I'm not sure. Looking at the results in {{RelToSqlConverterTest.testConvertWindowToSql()}}, it certainly looks like the query is incorrect even if ordinals can hypothetically go in a window's {{ORDER BY}}.

So, I've decided to stop working on this for now, but wanted to preserve my draft change and start a discussion on this ticket. Here are some thoughts:

# It seems like we can reach a happy middle ground on sorting by using ordinals whenever the dialect allows *and* the expression returned by {{field()}} is anything other than a {{SqlIdentifier}} (i.e. a named column reference). Existing behavior as of writing is to only use ordinals when the expression is a {{SqlCall}}, but we should use it for literals as well. That would solve CALCITE-5724, but still use named references for "simple" collations, which seems to be the case for all window functions.
# When it comes to grouping, things are more complicated. As of writing, Calcite tends to group by expressions, but some dialects (e.g. BigQuery) can get easily confused by this, even when the expression in the {{SELECT}} list perfectly matches that in the {{GROUP BY}} clause. With our Calcite integration, we need customized cleanup logic to rewrite the {{GROUP BY}} clauses in terms of aliases / ordinals in the {{SELECT}} list whenever possible, as a band-aid for this problem. We want to get rid of this band-aid and upstream a proper solution, and I thought using ordinals could be it, but the problems with {{GROUPING SETS}} et al still needs to be solved, and Calcite seems to have poor support for grouping by ordinals in general (see the changes I had to make to {{RelToSqlConverter.generateGroupList}} in my draft) which should be improved.


> Rel-to-Sql conversion should better support grouping or sorting by references or ordinals
> -----------------------------------------------------------------------------------------
>
>                 Key: CALCITE-5808
>                 URL: https://issues.apache.org/jira/browse/CALCITE-5808
>             Project: Calcite
>          Issue Type: Improvement
>            Reporter: Will Noble
>            Priority: Major
>
> While exploring solutions to CALCITE-5724 I produced [this draft commit|https://github.com/apache/calcite/commit/d644ecee44ffd7927a62be96329cd5456f545de2] to explore what the Rel-to-Sql conversion process would look like if it were heavily biased toward using ordinals in the {{GROUP BY}} and {{ORDER BY}} clauses. It works for most common queries, but I gave up with basically 2 problems yet to be solved:
> # Grouping with {{ROLLUP}}, {{CUBE}}, or {{GROUPING SETS}} does not allow ordinals (at least [according to this presto ticket|https://github.com/prestodb/presto/issues/9522]; I didn't double-check any standards or dialects), so my draft solution would have to distinguish between these cases and a simple group-by, and only use ordinals in the simple case.
> # Window functions with an {{ORDER BY}} clause also use ordinals in my draft. It seems likely that ordinals are also disallowed in this context, but I'm not sure. Looking at the results in {{RelToSqlConverterTest.testConvertWindowToSql()}}, it certainly looks like the query is incorrect even if ordinals can hypothetically go in a window's {{ORDER BY}}.
> So, I've decided to stop working on this for now, but wanted to preserve my draft change and start a discussion on this ticket. Here are some thoughts:
> # It seems like we can reach a happy middle ground on sorting by using ordinals whenever the dialect allows *and* the expression returned by {{field()}} is anything other than a {{SqlIdentifier}} (i.e. a named column reference). Existing behavior as of writing is to only use ordinals when the expression is a {{SqlCall}}, but we should use it for literals as well. That would solve CALCITE-5724, but still use named references for "simple" collations, which seems to be the case for all window functions.
> # When it comes to grouping, things are more complicated. As of writing, Calcite tends to group by expressions, but some dialects (e.g. BigQuery) can get easily confused by this, even when the expression in the {{SELECT}} list perfectly matches that in the {{GROUP BY}} clause. With our Calcite integration, we need customized cleanup logic to rewrite the {{GROUP BY}} clauses in terms of aliases / ordinals in the {{SELECT}} list whenever possible, as a band-aid for this problem. We want to get rid of this band-aid and upstream a proper solution, and I thought using ordinals could be it, but the problems with {{GROUPING SETS}} et al still needs to be solved, and Calcite seems to have poor support for grouping by ordinals in general (see the changes I had to make to {{RelToSqlConverter.generateGroupList}} in my draft) which should be improved.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)