You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@calcite.apache.org by "Julian Hyde (Jira)" <ji...@apache.org> on 2023/06/27 01:45:00 UTC

[jira] [Commented] (CALCITE-5801) Document that Aggregate grouping keys are emitted from lowest to highest index

    [ https://issues.apache.org/jira/browse/CALCITE-5801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17737410#comment-17737410 ] 

Julian Hyde commented on CALCITE-5801:
--------------------------------------

I agree that this is a rather subtle aspect of Calcite's behavior and would benefit from better documentation.

(It is a necessary for Calcite to work this way. "\{{GROUP BY a, b}}", "\{{GROUP BY b, a}}" and "\{{GROUP BY a, b, a}}" all have identical effect in SQL. Calcite's representation of these should be, and is, identical.)

For a while I have been working on CALCITE-5802, a feature request to make it easier to translate complex aggregate expressions such as "1 + SUM(x + 2)". It is somewhat related to your problem, in that it deals with expressions that are too complex to express in a single relational operation, and in particular deals with the {{SELECT}} and {{GROUP BY}} clauses simultaneously.

> Document that Aggregate grouping keys are emitted from lowest to highest index
> ------------------------------------------------------------------------------
>
>                 Key: CALCITE-5801
>                 URL: https://issues.apache.org/jira/browse/CALCITE-5801
>             Project: Calcite
>          Issue Type: Improvement
>          Components: core
>    Affects Versions: 1.34.0
>            Reporter: Chris Rice
>            Priority: Minor
>
> Consider query:
> {code}
> SELECT b, a FROM t GROUP BY b, a
> {code}
> where the schema for table {{t}} emits {{a}} in column 0 and {{b}} in column 1.
> You might be tempted to build this relation using:
> {code}
> relBuilder.scan("t");
> var groupKey = relBuilder.groupKey(1, 0);
> var aggregate = relBuilder.aggregate(groupKey).build();
> {code}
> but this will emit a plan which emits `a, b` rather than `b, a`:
> {code}
> LogicalAggregate(group=[{0, 1}])
>   LogicalTableScan(table=[[t]])
> {code}
> The reason being that {{Aggregate}} represents its {{groupSet}} as a {{BitSet}} of columns which are implicitly always ordered from lowest to highest _index_.  Or, in other words, the line {{relBuilder.groupKey(1, 0)}} always creates the same grouping key as {{relBuilder.groupKey(0, 1)}}.
> It should be documented somewhere that Aggregates expect their incoming grouping columns to be sorted in order of lowest to highest index in the incoming schema.  Possible candidates for places for this documentation include:
> * On {{RelBuilder#groupKey()}}
> * On {{RelBuilder#aggregate()}}
> * On {{Aggregate}} itself
> This is not just a hypothetical problem - the substrait-java project in fact does not properly handle converting aggregates whose grouping key order is not already sorted from lowest to highest index - See [SubstraitRelNodeConverter#visit(Aggregate)|https://github.com/substrait-io/substrait-java/blob/main/isthmus/src/main/java/io/substrait/isthmus/SubstraitRelNodeConverter.java#L208].  Of course, this is a separate bug with substrait-java, but perhaps it could have been avoided (and future such problems could still be avoided) with slightly more documentation.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)