You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@calcite.apache.org by "Danny Chan (JIRA)" <ji...@apache.org> on 2019/04/17 10:21:00 UTC

[jira] [Issue Comment Deleted] (CALCITE-2948) SqlToRelConverter generates complicated logical plan for in subquery with non-equi condition

     [ https://issues.apache.org/jira/browse/CALCITE-2948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Danny Chan updated CALCITE-2948:
--------------------------------
    Comment: was deleted

(was: Hi [~hyuan] [~zabetak] [~julianhyde] [~vgarg]  I have applied a [PR#1169|[https://github.com/apache/calcite/pull/1169]]

The main diff it to add 2 rules in post-decorrelation phrase: SemiJoinRule.PROJECT and SemiJoinRemoveRule.PROJECT.
The original plan after decorrelation is:
{code:java}
LogicalProject(DEPTNO=[$7])
  LogicalJoin(condition=[AND(=($0, $10), =($7, $9))], joinType=[inner])
    LogicalTableScan(table=[[CATALOG, SALES, EMP]])
    LogicalAggregate(group=[{0, 1}])
      LogicalProject(DEPTNO=[$7], EMPNO0=[$9])
        LogicalJoin(condition=[=($0, +($9, 1))], joinType=[inner])
          LogicalTableScan(table=[[CATALOG, SALES, EMP]])
          LogicalProject(EMPNO=[$0])
            LogicalTableScan(table=[[CATALOG, SALES, EMP]]){code}
The plan is very same with what SubQueryRemoveRule produces. To reduce the right agg of the above join, we can use rule SemiJoinRule.PROJECT[1], then plan will change to:
{code:java}
LogicalProject(DEPTNO=[$7])
  SemiJoin(condition=[AND(=($0, $10), =($7, $9))], joinType=[inner])
    LogicalTableScan(table=[[CATALOG, SALES, EMP]])
    LogicalProject(DEPTNO=[$7], EMPNO0=[$9])
      LogicalJoin(condition=[=($0, +($9, 1))], joinType=[inner])
        LogicalTableScan(table=[[CATALOG, SALES, EMP]])
        LogicalProject(EMPNO=[$0])
          LogicalTableScan(table=[[CATALOG, SALES, EMP]]){code}
Add after applying SemiJoinRemoveRule.PROJECT will get the final plan:
{code:java}
LogicalProject(DEPTNO=[$7])
  LogicalProject(EMPNO=[$0], ENAME=[$1], JOB=[$2], MGR=[$3], HIREDATE=[$4], SAL=[$5], COMM=[$6], DEPTNO=[$7], SLACKER=[$8])
    LogicalJoin(condition=[=($0, +($9, 1))], joinType=[inner])
      LogicalTableScan(table=[[CATALOG, SALES, EMP]])
      LogicalProject(EMPNO=[$0])
        LogicalTableScan(table=[[CATALOG, SALES, EMP]]){code}
There are 2 points that i want to discuss:
1. The new added rule SemiJoinRemoveRule.PROJECT and SemiJoinRemoveRule.Join only match both the joins left sides as a TableScan, which i think could be more common, e.g. a RelNode, but i only decide if the 2 are equals with each other with the node's digest, i'm not very confident if i change to RelNode it would be still correct.
2. In order to remove the above join, the join keys of it must all be unique, or the expanded join rows of same keys would be lost. But now when we use SubqueryRemoveRule or RelDecorrelator to decorrelate the in subquery, this info is still ignored when we build a join(the join rows would swell), so i ignore the uniqueniess either as default for SemiJoinRemoveRule, but i still add a flag in it to let user control the behavior.
[1] https://github.com/apache/calcite/blob/b03cdc486cf5c7232bbc6fa9b5f02f564e9601c3/core/src/main/java/org/apache/calcite/rel/rules/SemiJoinRule.java#L121)

> SqlToRelConverter generates complicated logical plan for in subquery with non-equi condition
> --------------------------------------------------------------------------------------------
>
>                 Key: CALCITE-2948
>                 URL: https://issues.apache.org/jira/browse/CALCITE-2948
>             Project: Calcite
>          Issue Type: Bug
>          Components: core
>            Reporter: Haisheng Yuan
>            Assignee: Danny Chan
>            Priority: Major
>              Labels: pull-request-available, sub-query
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> Repro:
> Add the following test to SqlToRelConverterTest.java.
> {code:java}
> @Test public void testSubQueryIN() {
>     final String sql = "select deptno\n"
>         + "from EMP e\n"
>         + "where deptno in (select deptno\n"
>         + "from EMP where empno=e.empno+1)";
>     sql(sql).ok();
>   }
> {code}
> Plan:
> {code:java}
> LogicalProject(DEPTNO=[$7])
>   LogicalJoin(condition=[AND(=($0, $10), =($7, $9))], joinType=[inner])
>     LogicalTableScan(table=[[CATALOG, SALES, EMP]])
>     LogicalAggregate(group=[{0, 1}])
>       LogicalProject(DEPTNO=[$7], EMPNO0=[$9])
>         LogicalJoin(condition=[=($0, +($9, 1))], joinType=[inner])
>           LogicalTableScan(table=[[CATALOG, SALES, EMP]])
>           LogicalProject(EMPNO=[$0])
>             LogicalTableScan(table=[[CATALOG, SALES, EMP]])
> {code}
> One join would suffice.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)