Posted to issues@calcite.apache.org by "Maryann Xue (JIRA)" <ji...@apache.org> on 2015/07/28 00:20:05 UTC
[jira] [Comment Edited] (CALCITE-818) Multiple collation traits get wiped out when creating subset, thus cause unnecessary sort

    [ https://issues.apache.org/jira/browse/CALCITE-818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14643488#comment-14643488 ] 

Maryann Xue edited comment on CALCITE-818 at 7/27/15 10:19 PM:
---------------------------------------------------------------

The optimizer requires the output rel (the final plan) to have the same trait set, apart from convention, as the input rel (the rel created by SqlToRelConverter). Here the input rel is LogicalProject(input=LogicalValues(...), projects=...). For the first case, "select p0+p1 ...", the LogicalProject has collation trait "[0]", inferred from its child, so the output rel must have that collation as well. However, the collation trait of the child of the XXXProject is lost during subset creation, and that is why there is an extra EnumerableSort.
CALCITE-793 fixes the above situation for unordered queries (queries that don't have an order-by) by clearing the collation trait of the input rel for such queries, but I don't think it fixes the real problem. If we add an order-by to these queries (select p0+p1 from ... order by p0+p1), or if we are doing a merge-join and one of the tables has multiple collation traits, we might still see a sort that could have been avoided.
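
The distinction between the unordered and the ordered case can be sketched with a toy model. The Java below is not Calcite code: the names `RootTraitSketch`, `requiredCollation` and `needsSort` are hypothetical, a collation is reduced to a list of column indexes, and the collation that survives in the subset is passed in as a plain list. It only illustrates why clearing the root collation for unordered queries hides the extra sort without preventing it once an order-by is present.

```java
import java.util.Collections;
import java.util.List;

/** Toy model only; none of these names are Calcite APIs. */
public class RootTraitSketch {

    /**
     * Collation required of the final plan. With the CALCITE-793 behavior
     * described above, an unordered query requires no collation; otherwise
     * the collation of the input rel is carried over unchanged.
     */
    public static List<Integer> requiredCollation(List<Integer> inputRelCollation,
                                                  boolean hasOrderBy,
                                                  boolean clearWhenUnordered) {
        if (clearWhenUnordered && !hasOrderBy) {
            return Collections.emptyList();
        }
        return inputRelCollation;
    }

    /** A sort is needed when the required keys are not a prefix of the delivered keys. */
    public static boolean needsSort(List<Integer> required, List<Integer> delivered) {
        if (required.size() > delivered.size()) {
            return true;
        }
        return !delivered.subList(0, required.size()).equals(required);
    }

    public static void main(String[] args) {
        List<Integer> inputRelCollation = Collections.singletonList(0); // "[0]"
        List<Integer> delivered = Collections.emptyList(); // collation lost in the subset

        // Unordered query, without the CALCITE-793 change.
        System.out.println(needsSort(
            requiredCollation(inputRelCollation, false, false), delivered));
        // Unordered query, with the change.
        System.out.println(needsSort(
            requiredCollation(inputRelCollation, false, true), delivered));
        // Same query with an order-by, with the change.
        System.out.println(needsSort(
            requiredCollation(inputRelCollation, true, true), delivered));
    }
}
```

In this model only the second call reports that no sort is needed; the order-by case still asks for one because the subset no longer advertises any collation.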


was (Author: maryannxue):
The optimizer asks for the same trait set (except convention) of the output rel (the final plan) as that of the input rel (the rel created by SqlToRelConverter). Here the input rel is LogicalProject(input=LogicalValues(...), projects=...), and for the first case "select p0+p1 ..." the LogicalProject has collation trait "[0]", inferred from its child, so the output rel also has to have that collation while the collation trait of the child of the XXXProject is lost during subset creation, and that's why there is an extra EnumerableSort.
CALCITE-793 fixes this in that it clears the collation trait of the input rel if there isn't an order-by in the sql, but it does not fix the real problem I think, say if we are doing a merge-join and one of the tables has multiple collation traits, we might see a sort that could have been avoided.

> Multiple collation traits get wiped out when creating subset, thus cause unnecessary sort
> -----------------------------------------------------------------------------------------
>
>                 Key: CALCITE-818
>                 URL: https://issues.apache.org/jira/browse/CALCITE-818
>             Project: Calcite
>          Issue Type: Bug
>    Affects Versions: 1.4.0-incubating
>            Reporter: Maryann Xue
>            Assignee: Julian Hyde
>            Priority: Minor
>
> "select p1 from (values (2, 1)) as t(p0, p1)"
> or
> "select p0+p1 from (values (2, 1)) as t(p0, p1)"
> would return a plan (with VolcanoPlanner) like:
> {code}
> EnumerableSort(...)
>   EnumerableCalc(...)
>     EnumerableValues(...)
> {code}
> This happened because multiple collation traits were inferred from the LogicalValues rel, namely [[0,1], [1]], and the LogicalProject would get a corresponding collation trait based on its project expressions. During optimization, however, the multiple collation traits were simplified to empty when a subset was created for the LogicalValues rel, which left EnumerableCalc unable to infer its collation from that input.
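
As a rough illustration of the description above, here is a toy model of the inference for the "select p1" query. It is not Calcite code: `CollationSketch`, `project` and `simplifyForSubset` are hypothetical names, a collation is just a list of column indexes, and `simplifyForSubset` is a stand-in that assumes the reported behavior (more than one collation collapses to none) rather than reproducing the planner's actual logic.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Toy model only; none of these names are Calcite APIs. */
public class CollationSketch {

    /**
     * Maps input collations through a projection made of plain column
     * references. projects[i] is the input column read by output column i.
     * Only the leading keys of a collation that are still projected survive.
     */
    public static List<List<Integer>> project(List<List<Integer>> inputCollations,
                                              int[] projects) {
        List<List<Integer>> result = new ArrayList<>();
        for (List<Integer> collation : inputCollations) {
            List<Integer> mapped = new ArrayList<>();
            for (int inputColumn : collation) {
                int outputColumn = indexOf(projects, inputColumn);
                if (outputColumn < 0) {
                    break; // key not projected: the rest of this collation is lost
                }
                mapped.add(outputColumn);
            }
            if (!mapped.isEmpty() && !result.contains(mapped)) {
                result.add(mapped);
            }
        }
        return result;
    }

    /** Stand-in for the reported behavior: more than one collation collapses to none. */
    public static List<List<Integer>> simplifyForSubset(List<List<Integer>> collations) {
        return collations.size() > 1
            ? Collections.<List<Integer>>emptyList()
            : collations;
    }

    private static int indexOf(int[] values, int wanted) {
        for (int i = 0; i < values.length; i++) {
            if (values[i] == wanted) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Collations reported for the LogicalValues rel: [[0, 1], [1]].
        List<List<Integer>> valuesCollations =
            Arrays.asList(Arrays.asList(0, 1), Arrays.asList(1));
        int[] selectP1 = {1}; // "select p1": output column 0 reads input column 1

        // Inference on the logical rel, with both collations available.
        System.out.println(project(valuesCollations, selectP1));
        // Inference after the subset has dropped the collations.
        System.out.println(project(simplifyForSubset(valuesCollations), selectP1));
    }
}
```

With both collations available, the projection keeps a collation on its single output column; once the subset has collapsed them, nothing is left to infer from, which matches the extra EnumerableSort in the plan above.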



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)