You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@calcite.apache.org by "Fabian Hueske (JIRA)" <ji...@apache.org> on 2018/03/16 23:03:01 UTC

[jira] [Comment Edited] (CALCITE-2216) Improve extensibility of AggregateReduceFunctionsRule

    [ https://issues.apache.org/jira/browse/CALCITE-2216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16403084#comment-16403084 ] 

Fabian Hueske edited comment on CALCITE-2216 at 3/16/18 11:02 PM:
------------------------------------------------------------------

These fields are not really hidden but just fields emitted from the window aggregation operator (as requested by a query ({{SELECT TUMBLE_START(...) ...)}}. All time fields are explicit in Flink's SQL implementation.

In some sense, these fields are grouping columns (window start and end can define a window), but since they do not exist in the input schema we cannot reference them as such because grouping columns are represented as bitset. We want to avoid adding a calc before the aggregate that conceptually evaluates the window function (TUMBLE/HOP/SESSION) and its helper functions (TUMBLE_START, etc.), since this would make the translation to Flink's operators more difficult. Moreover, it is not really possible to evaluate a SESSION function in a Calc and lead to a conceptual mismatch.

We could represent the fields as agg calls, but IMO that's a conceptual mismatch and work around as well.

I'm aware that Flink's approach is not very elegant, but all alternatives that I'm aware of have drawbacks as well.

 


was (Author: fhueske):
These fields are not really hidden but just fields emitted from the window aggregation operator. All time fields are explicit in Flink's SQL implementation.

In some sense, these fields are grouping columns (window start and end can define a window), but since they do not exist in the input schema we cannot reference them as such because grouping columns are represented as bitset. We want to avoid adding a calc before the aggregate that conceptually evaluates the window function (TUMBLE/HOP/SESSION) and its helper functions (TUMBLE_START, etc.), since this would make the translation to Flink's operators more difficult. Moreover, it is not really possible to evaluate a SESSION function in a Calc and lead to a conceptual mismatch.

We could represent the fields as agg calls, but IMO that's a conceptual mismatch and work around as well.

I'm aware that Flink's approach is not very elegant, but all alternatives that I'm aware of have drawbacks as well.

 

> Improve extensibility of AggregateReduceFunctionsRule
> -----------------------------------------------------
>
>                 Key: CALCITE-2216
>                 URL: https://issues.apache.org/jira/browse/CALCITE-2216
>             Project: Calcite
>          Issue Type: Improvement
>          Components: core
>    Affects Versions: 1.15.0
>            Reporter: Fabian Hueske
>            Assignee: Julian Hyde
>            Priority: Minor
>
> I'm proposing to improve the extensibility of {{AggregateReduceFunctionsRule}}. The purpose of the rule is to decompose complex aggregation functions like {{VAR_POP}} and {{STDDEV_SAMP}} into {{COUNT}} and {{SUM}} functions and compute the original functions in a subsequent Calc operator.
> Right now, the rule class provides a {{protected}} method that can be overridden to create an {{Aggregate}} with the updated aggregate calls.
> We are using the rule in Flink and have a special {{Aggregate}} Rel for group-windowed aggregations ({{GROUP BY TUMBLE/HOP/SESSION}}). Our implementation requires to forward some additional fields from the {{Aggregate}} for window properties like {{TUMBLE_START}} or {{HOP_END}}. In the current form, we cannot extend the rule, because these fields are striped off by the {{Calc}} node that is automatically added by the rule.
> I'm proposing to also move the code to create the {{Calc}} into a {{protected}} method just like the code to create the new {{Aggregate}}.
> I know, this is a fairly Flink-specific issue, but the code changes are minimal (no change in functionality) and it would help us, because we would not need to copy the rule and maintain it in Flink.
> I'll open a PR for this. Looking forward to your comments.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)