You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@calcite.apache.org by "Anton Haidai (JIRA)" <ji...@apache.org> on 2018/11/12 15:00:01 UTC

[jira] [Updated] (CALCITE-2666) JoinPushThroughJoinRule can't reach an optimal plan in some 3+ joins cases

     [ https://issues.apache.org/jira/browse/CALCITE-2666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Anton Haidai updated CALCITE-2666:
----------------------------------
    Description: 
For example, the input query is the following:
{code:java}
SELECT *
FROM X
INNER JOIN A
ON X.id = A.id
INNER JOIN Y
ON X.id = Y.id
INNER JOIN Z
ON X.id = Z.id
{code}
According to the cost model used, it would be beneficial to push the "A" scan to the right node of the top join (grouping X, Y, Z scans in two bottom joins in any order). But this state is never reached, "A" scan could be pushed only one join up, but never two joins up.
h2. Cause

According to my debugging, the cause of the issue is the following.

As far as the optimal state could hypothetically be achieved only by JoinPushThroughJoinRule.RIGHT, lets review only the behavior of this rule (while JoinPushThroughJoinRule.LEFT is also affected by the issue described). After each transformation, JoinPushThroughJoinRule.RIGHT not only swaps right nodes of joins, but also adds an additional project node on top of transformed joins.

The rule expects the following input structure:
{code:java}
operand(LogicalJoin.class,
    operand(LogicalJoin.class, any()),
    operand(RelNode.class, any())
)
{code}
But after applying the rule to two bottom joins, there will be an additional project between  these joins and the top join, so the middle join is no longer the left input of the top join and the rule can't match and produce the optimal result. See the attachment for a visual representation of this explanation:

!calcite.join.pushdown.issue.png|thumbnail!
 

  was:
For example, the input query is the following:

 
{code:java}
SELECT *
FROM X
INNER JOIN A
ON X.id = A.id
INNER JOIN Y
ON X.id = Y.id
INNER JOIN Z
ON X.id = Z.id
{code}
According to the cost model used, it would be beneficial to push the "A" scan to the right node of the top join (grouping X, Y, Z scans in two bottom joins in any order). But this state is never reached, "A" scan could be pushed only one join up, but never two joins up.

 
h2. Cause

According to my debugging, the cause of the issue is the following.

As far as the optimal state could hypothetically be achieved only by JoinPushThroughJoinRule.RIGHT, lets review only the behavior of this rule (while

JoinPushThroughJoinRule.LEFT is also affected by the issue described). After each transformation, JoinPushThroughJoinRule.RIGHT not only swaps right nodes of joins, but also adds an additional project node on top of transformed joins.

The rule expects the following input structure:
{code:java}
operand(LogicalJoin.class,
    operand(LogicalJoin.class, any()),
    operand(RelNode.class, any())
)
{code}
But after applying the rule to two bottom joins, there will be an additional project between  these joins and the top join, so the middle join is no longer the left input of the top join and the rule can't match and produce the optimal result. See the attachment for a visual representation of this explanation:

!calcite.join.pushdown.issue.png!

 

 
h2.  

 


> JoinPushThroughJoinRule can't reach an optimal plan in some 3+ joins cases
> --------------------------------------------------------------------------
>
>                 Key: CALCITE-2666
>                 URL: https://issues.apache.org/jira/browse/CALCITE-2666
>             Project: Calcite
>          Issue Type: Bug
>          Components: core
>            Reporter: Anton Haidai
>            Assignee: Julian Hyde
>            Priority: Major
>         Attachments: calcite.join.pushdown.issue.png
>
>
> For example, the input query is the following:
> {code:java}
> SELECT *
> FROM X
> INNER JOIN A
> ON X.id = A.id
> INNER JOIN Y
> ON X.id = Y.id
> INNER JOIN Z
> ON X.id = Z.id
> {code}
> According to the cost model used, it would be beneficial to push the "A" scan to the right node of the top join (grouping X, Y, Z scans in two bottom joins in any order). But this state is never reached, "A" scan could be pushed only one join up, but never two joins up.
> h2. Cause
> According to my debugging, the cause of the issue is the following.
> As far as the optimal state could hypothetically be achieved only by JoinPushThroughJoinRule.RIGHT, lets review only the behavior of this rule (while JoinPushThroughJoinRule.LEFT is also affected by the issue described). After each transformation, JoinPushThroughJoinRule.RIGHT not only swaps right nodes of joins, but also adds an additional project node on top of transformed joins.
> The rule expects the following input structure:
> {code:java}
> operand(LogicalJoin.class,
>     operand(LogicalJoin.class, any()),
>     operand(RelNode.class, any())
> )
> {code}
> But after applying the rule to two bottom joins, there will be an additional project between  these joins and the top join, so the middle join is no longer the left input of the top join and the rule can't match and produce the optimal result. See the attachment for a visual representation of this explanation:
> !calcite.join.pushdown.issue.png|thumbnail!
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)