Posted to issues@calcite.apache.org by "Haisheng Yuan (JIRA)" <ji...@apache.org> on 2019/05/22 19:54:00 UTC

[jira] [Commented] (CALCITE-2648) Output collation of EnumerableWindow is not consistent with its implementation

    [ https://issues.apache.org/jira/browse/CALCITE-2648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16846193#comment-16846193 ] 

Haisheng Yuan commented on CALCITE-2648:
----------------------------------------

Whether EnumerableWindow should preserve the order is highly dependent on the runtime implementation. In Postgres/GPDB, the window operator is sort-based, so the optimizer assumes that it preserves the order of its input. In Calcite, I suppose it uses a hash map for window partitioning?
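To illustrate the sort-based approach mentioned for Postgres/GPDB, here is a minimal Java sketch. The class and record names are hypothetical, and this is not code from Calcite, Postgres, or GPDB. The window operator computes COUNT(*) OVER (PARTITION BY key) over input that is already sorted on the partition key. Each partition is then a contiguous run, and rows stream out in exactly the input order. That is why a planner for such an engine can treat the window as order-preserving.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical sketch of a sort-based window operator (not Calcite, Postgres, or GPDB code). */
public class SortBasedWindow {
    record Row(int key, int value) {}

    /** An input row extended with COUNT(*) OVER (PARTITION BY key). */
    record Out(int key, int value, int count) {}

    /**
     * Streams over input that is already sorted by key, so each partition is a contiguous run.
     * Rows are emitted in exactly the input order, so the input collation is preserved.
     */
    static List<Out> countOverPartition(List<Row> sortedByKey) {
        List<Out> out = new ArrayList<>();
        int start = 0;
        while (start < sortedByKey.size()) {
            int key = sortedByKey.get(start).key();
            int end = start;
            // Find the end of the current run of equal keys (one partition).
            while (end < sortedByKey.size() && sortedByKey.get(end).key() == key) {
                end++;
            }
            for (int i = start; i < end; i++) {
                out.add(new Out(key, sortedByKey.get(i).value(), end - start));
            }
            start = end;
        }
        return out;
    }

    public static void main(String[] args) {
        List<Row> input = new ArrayList<>(List.of(new Row(35, 1), new Row(20, 2), new Row(35, 3)));
        // In this sketch, an explicit sort on the partition key is placed below the window.
        input.sort(Comparator.comparingInt(Row::key));
        countOverPartition(input).forEach(System.out::println);
    }
}
```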
{quote}
We should not necessarily preserve order, if doing so would be expensive (and/or more complicated). 
{quote}
I don't agree with this. If we don't preserve order, we will lose a lot of optimization opportunities. For example:
{code:sql}
select row_number() over (partition by a order by b,c), row_number() over (partition by a order by b) from foo;
{code}
We need only one sort (on a, b, c) to compute both windows, but the Calcite plan evaluates each window separately, which is wasteful.
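Here is a minimal Java sketch of the single-sort idea. It uses hypothetical names and assumes a, b, c are integer columns, since the table foo is not described further. Sorting once on (a, b, c) also satisfies the (a, b) requirement of the second window, because (a, b) is a prefix of (a, b, c). A single pass over the sorted rows can therefore assign both row numbers.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical sketch: evaluate two ROW_NUMBER windows over a single sort (not Calcite code). */
public class SharedSortWindows {
    record Row(int a, int b, int c) {}

    /**
     * rn1 = ROW_NUMBER() OVER (PARTITION BY a ORDER BY b, c)
     * rn2 = ROW_NUMBER() OVER (PARTITION BY a ORDER BY b)
     */
    record Out(Row row, long rn1, long rn2) {}

    static final Comparator<Row> BY_A_B_C =
        Comparator.comparingInt(Row::a).thenComparingInt(Row::b).thenComparingInt(Row::c);
    static final Comparator<Row> BY_A_B =
        Comparator.comparingInt(Row::a).thenComparingInt(Row::b);

    /** True if the rows are non-decreasing under cmp, i.e. they satisfy that collation. */
    static boolean isSortedBy(List<Row> rows, Comparator<Row> cmp) {
        for (int i = 1; i < rows.size(); i++) {
            if (cmp.compare(rows.get(i - 1), rows.get(i)) > 0) {
                return false;
            }
        }
        return true;
    }

    static List<Out> evaluate(List<Row> input) {
        // One sort on (a, b, c); its prefix (a, b) also satisfies the second window.
        List<Row> sorted = new ArrayList<>(input);
        sorted.sort(BY_A_B_C);
        List<Out> out = new ArrayList<>();
        long rn = 0;
        for (int i = 0; i < sorted.size(); i++) {
            // Restart numbering at each new partition (new value of a).
            if (i == 0 || sorted.get(i).a() != sorted.get(i - 1).a()) {
                rn = 0;
            }
            rn++;
            // ROW_NUMBER numbers rows tied on b in an unspecified order, so numbering
            // them in (a, b, c) order is a valid result for rn2 as well.
            out.add(new Out(sorted.get(i), rn, rn));
        }
        return out;
    }
}
```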

> Output collation of EnumerableWindow is not consistent with its implementation
> ------------------------------------------------------------------------------
>
>                 Key: CALCITE-2648
>                 URL: https://issues.apache.org/jira/browse/CALCITE-2648
>             Project: Calcite
>          Issue Type: Bug
>    Affects Versions: 1.17.0
>            Reporter: Hongze Zhang
>            Priority: Major
>              Labels: pull-request-available
>         Attachments: postgresql_96_doesnt_care_to_keep_collation_for_project_over_expression.png
>
>          Time Spent: 40m
>  Remaining Estimate: 0h
>
> Here is a case:
> {code:sql}
> select x, COUNT(*) OVER (PARTITION BY x) from (values (20), (35)) as t(x) ORDER BY x
> {code}
> Final plan:
> {code:java}
> EnumerableWindow(window#0=[window(partition {0} order by [] range between UNBOUNDED PRECEDING and UNBOUNDED FOLLOWING aggs [COUNT()])])
>   EnumerableValues(tuples=[[{ 20 }, { 35 }]])
> {code}
> Output rows:
> {code:java}
> X  |EXPR$1 |
> ---|-------|
> 35 |1      |
> 20 |1      |
> {code}
> EnumerableWindow is supposed to preserve its input collation; as a result, the EnumerableSort is removed from the final plan. However, the implementation of EnumerableWindow generates unordered output when a PARTITION BY clause is used.
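This sketch assumes, as the comment above suspects, that partitions are collected in a hash map. The hypothetical Java code below is not Calcite's generated code, and it shows how hash-based collection breaks the collation. The input (20), (35) is sorted on x, but rows come out in the map's iteration order. On OpenJDK, java.util.HashMap happens to iterate 35 before 20 for these keys, matching the reported output, although HashMap iteration order is unspecified. An insertion-ordered LinkedHashMap keeps 20 before 35.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

/** Hypothetical sketch of hash-partitioned COUNT(*) OVER (PARTITION BY x); not Calcite code. */
public class HashPartitionOrder {
    record Out(int x, int count) {}

    static List<Out> countOverPartition(List<Integer> input,
                                        Supplier<Map<Integer, List<Integer>>> newMap) {
        // Collect the rows of each partition under its key.
        Map<Integer, List<Integer>> partitions = newMap.get();
        for (int x : input) {
            partitions.computeIfAbsent(x, k -> new ArrayList<>()).add(x);
        }
        // Emit partition by partition: the output order is the map's iteration order.
        List<Out> out = new ArrayList<>();
        for (List<Integer> rows : partitions.values()) {
            for (int x : rows) {
                out.add(new Out(x, rows.size()));
            }
        }
        return out;
    }

    /** Extracts the x column, to inspect the output order. */
    static List<Integer> xValues(List<Out> rows) {
        return rows.stream().map(Out::x).toList();
    }

    public static void main(String[] args) {
        List<Integer> input = List.of(20, 35); // the VALUES from the issue, already sorted by x
        System.out.println("HashMap:       " + xValues(countOverPartition(input, HashMap::new)));
        System.out.println("LinkedHashMap: " + xValues(countOverPartition(input, LinkedHashMap::new)));
    }
}
```

Keeping first-seen partition order only preserves an input that is sorted on the partition key. If the input were sorted on some other column, grouping rows by partition would still reorder them.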



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)