Posted to issues@calcite.apache.org by "Julian Hyde (Jira)" <ji...@apache.org> on 2022/03/18 19:37:00 UTC
[jira] [Commented] (CALCITE-5051) UNION query plan prevents projection push down

    [ https://issues.apache.org/jira/browse/CALCITE-5051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17509048#comment-17509048 ] 

Julian Hyde commented on CALCITE-5051:
--------------------------------------

A bug logged not too long ago said that we should be able to push projects through UNION. But IIRC that wasn't safe, because a narrower projection causes more rows to become duplicates. Can you find that bug and make sure that it doesn't apply here?
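
To illustrate the duplicate-row concern, here is a minimal sketch in plain Python (not Calcite code). The helpers `union_distinct` and `project` and the sample rows are made up for illustration. It compares projecting above a distinct UNION with projecting below it:

```python
def union_distinct(*inputs):
    """Model of SQL UNION: concatenate inputs, drop duplicate rows, keep first-seen order."""
    seen, out = set(), []
    for rows in inputs:
        for row in rows:
            if row not in seen:
                seen.add(row)
                out.append(row)
    return out


def project(rows, *cols):
    """Model of a projection: keep only the given column positions, without deduplicating."""
    return [tuple(row[c] for c in cols) for row in rows]


# Hypothetical two-column inputs; the rows of `a` differ only in the second column.
a = [(1, "x"), (1, "y")]
b = [(2, "z")]

# Project above the UNION: both rows with Id 1 survive the UNION, so Id 1 appears twice.
project_above = project(union_distinct(a, b), 0)

# Project pushed below the UNION: the narrower rows collide and the UNION removes one.
project_below = union_distinct(project(a, 0), project(b, 0))

print(project_above)
print(project_below)
```

The two results differ in row count, so this rewrite is only safe when the UNION keeps duplicates (UNION ALL) or when the projected columns are known to be unique.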

Can you explain how you are able to convert "EnumerableUnion(all=[true])" to "EnumerableUnion(all=[false])"?
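
For reference, the "all" flag changes the result whenever the inputs overlap or contain repeats. A minimal sketch in plain Python, with hypothetical Id values and made-up helper names (`union_all`, `distinct`), not Calcite code:

```python
def union_all(left, right):
    """Model of UNION ALL (all=[true]): plain concatenation, duplicates kept."""
    return list(left) + list(right)


def distinct(rows):
    """Drop duplicate rows while keeping first-seen order."""
    return list(dict.fromkeys(rows))


# Hypothetical Id values: one branch from t1, one from the t2/t3 join.
t1_ids = [(1,), (2,)]
join_ids = [(2,), (3,), (3,)]

all_true = union_all(t1_ids, join_ids)             # what all=[true] would return
all_false = distinct(union_all(t1_ids, join_ids))  # what all=[false] would return

print(all_true)
print(all_false)
```

The two plans agree only if the branches are disjoint and each branch is already free of duplicates.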

Can you identify which commit (or JIRA case) caused the issue you are seeing?

I am supportive of this change. I just want to make sure we have done our due diligence.

> UNION query plan prevents projection push down
> ----------------------------------------------
>
>                 Key: CALCITE-5051
>                 URL: https://issues.apache.org/jira/browse/CALCITE-5051
>             Project: Calcite
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 1.29.0
>            Reporter: Zachary Gramana
>            Priority: Major
>
> As a user with a custom Calcite adapter that does push down, I should be able to run a UNION query of statements containing joins and still get the benefit of projection push down.
> Given a query such as:
> {code:sql}
> SELECT Id
>   FROM MySchema.t1
> UNION
> SELECT t3.Id
>   FROM MySchema.t2
>   JOIN MySchema.t3 ON (t3.Id = t2.t3_Id)
> {code}
> I expect a resulting query plan that looks like:
> {code:lua}
> EnumerableUnion(all=[true])
>   MyEnumerableConverter
>     MyProject(Id=[$0])
>       MyTableScan(table=[[MySchema, t1]])
>   EnumerableCalc(expr#0..1=[{inputs}], Id=[$t1])
>     EnumerableMergeJoin(condition=[=($0, $1)], joinType=[inner])
>       EnumerableSort(sort0=[$0], dir0=[ASC])
>         EnumerableCalc(expr#0..100=[{inputs}], expr#101=[CAST($t1):BIGINT NOT NULL], t3_Id0=[$t101])
>           MyEnumerableConverter
>             MyTableScan(table=[[MySchema, t2]])
>       EnumerableSort(sort0=[$0], dir0=[ASC])
>         MyEnumerableConverter
>           MyProject(Id=[$0])
>             MyTableScan(table=[[MySchema, t3]])
> {code}
> But instead I observed:
> {code:java}
> EnumerableUnion(all=[false])
>   MyEnumerableConverter
>     MyProject(Id=[$0])
>       MyTableScan(table=[[MySchema, t1]])
>   EnumerableCalc(expr#0..251=[{inputs}], Id=[$t102])
>     EnumerableMergeJoin(condition=[=($101, $102)], joinType=[inner])
>       EnumerableSort(sort0=[$101], dir0=[ASC])
>         EnumerableCalc(expr#0..100=[{inputs}], expr#101=[CAST($t1):BIGINT NOT NULL], proj#0..101=[{exprs}])
>           MyEnumerableConverter
>             MyTableScan(table=[[MySchema, t2]])
>       EnumerableSort(sort0=[$0], dir0=[ASC])
>         MyEnumerableConverter
>           MyTableScan(table=[[MySchema, t3]])
> {code}
> Note that:
>  # The {{EnumerableCalc}} node applied to the {{EnumerableMergeJoin}} goes from taking the expected 2 input fields ({{expr#0..1}}) to taking 252 input fields ({{expr#0..251}})
>  # The {{MyProject}} node expected to be applied to {{MyTableScan(table=[[MySchema, t3]])}} is missing from the observed plan
>  # Issue was observed after upgrading from 1.24 to 1.29, so may affect one or more intervening releases
>  # PR containing reproducing unit test: https://github.com/apache/calcite/pull/2747


