You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@calcite.apache.org by "Vineet Garg (JIRA)" <ji...@apache.org> on 2017/06/02 22:00:06 UTC

[jira] [Commented] (CALCITE-1494) Inefficient plan for correlated sub-queries

    [ https://issues.apache.org/jira/browse/CALCITE-1494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16035498#comment-16035498 ] 

Vineet Garg commented on CALCITE-1494:
--------------------------------------

Hi [~julianhyde]
Do you remember why were this optimization done only for equality predicates? e.g. {{findCorrelationEquivalent}} here [LINK to diff | https://git1-us-west.apache.org/repos/asf?p=calcite.git;a=blobdiff;f=core/src/main/java/org/apache/calcite/sql2rel/RelDecorrelator.java;h=18871e1500785f244c0a06b06f42f469ae4ad040;hp=0e6bd6ac8a17944495a3dc867b5ac71697c74352;hb=73e437fe;hpb=052f854594f92fd52a142c53998b83d70302075b] skips if Rex Node is not of {{EQUAL}} type. We end up with value generator (join with outer query) for queries which have non-equality correlated predicates e.g. {{select sal from emp where empno IN (select deptno from dept where emp.job <> dept.name}}. Should it be safe to optimize this and not generate value generator for non-equality predicates as well?

> Inefficient plan for correlated sub-queries
> -------------------------------------------
>
>                 Key: CALCITE-1494
>                 URL: https://issues.apache.org/jira/browse/CALCITE-1494
>             Project: Calcite
>          Issue Type: Improvement
>          Components: core
>            Reporter: Vineet Garg
>            Assignee: Julian Hyde
>              Labels: sub-query
>             Fix For: 1.12.0
>
>
> For co-related queries such as 
> {noformat} select sal from emp where empno IN (select deptno from dept where emp.job = dept.name) {noformat}
> Calcite generates following plan (SubqueryRemove Rule + Decorrelation) 
> {noformat}
> LogicalProject(SAL=[$5])
>   LogicalProject(EMPNO=[$0], ENAME=[$1], JOB=[$2], MGR=[$3], HIREDATE=[$4], SAL=[$5], COMM=[$6], DEPTNO=[$7], SLACKER=[$8])
>     LogicalJoin(condition=[AND(=($2, $10), =($0, $9))], joinType=[inner])
>       LogicalTableScan(table=[[CATALOG, SALES, EMP]])
>       LogicalAggregate(group=[{0, 1}])
>         LogicalProject(DEPTNO=[$0], JOB=[$1])
>           LogicalProject(DEPTNO=[$0], JOB=[$2])
>             LogicalJoin(condition=[=($2, $1)], joinType=[inner])
>               LogicalTableScan(table=[[CATALOG, SALES, DEPT]])
>               LogicalAggregate(group=[{0}])
>                 LogicalProject(JOB=[$2])
>                   LogicalTableScan(table=[[CATALOG, SALES, EMP]])
> {noformat}
> As you can notice there is a outer table scan (EMP in this case) to retrieve all distinct values for co-related column (EMP.JOB here), which is then joined with inner table (DEPT). 
> I am not sure why is this step required. After this join Calcite is anyway doing group by to generate all distinct values for correlated and result column (DEPTNO, JOB) which is then joined with outer table. 
> I think the scan + join of outer table with inner table to generate co-rrelated values is un-necessary and is not required.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)