You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@calcite.apache.org by "Vineet Garg (JIRA)" <ji...@apache.org> on 2019/01/30 22:41:00 UTC
[jira] [Commented] (CALCITE-2329) Enhance SubQueryRemoveRule to
rewrite IN operator with the constant from the left side more optimally
[ https://issues.apache.org/jira/browse/CALCITE-2329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16756630#comment-16756630 ]
Vineet Garg commented on CALCITE-2329:
--------------------------------------
Sorry to bring up the old issues but I just noticed it. Following plan which is posted in the jira for IN doesn't look correct to me
{noformat}
LogicalProject(SAL=[$5])
LogicalProject(EMPNO=[$0], ENAME=[$1], JOB=[$2], MGR=[$3], HIREDATE=[$4], SAL=[$5], COMM=[$6], DEPTNO=[$7], SLACKER=[$8])
LogicalJoin(condition=[=(2, $9)], joinType=[inner])
LogicalTableScan(table=[[CATALOG, SALES, EMP]])
LogicalProject(DEPTNO=[$0]) <==========missing LogicalAggregate
LogicalTableScan(table=[[CATALOG, SALES, DEPT]])
{noformat}
This plan is missing LogicalAggregate on the right side (on subquery). This plan as it will produce incorrect result. The correct plan will have a LogicalAggregate (group by deptno) which should further be optimized to be {{group by <constant>}} due to deptno=2 filter. As a result this plan should have actually been equivalent to plan with EXISTS in the first place.
> Enhance SubQueryRemoveRule to rewrite IN operator with the constant from the left side more optimally
> -----------------------------------------------------------------------------------------------------
>
> Key: CALCITE-2329
> URL: https://issues.apache.org/jira/browse/CALCITE-2329
> Project: Calcite
> Issue Type: Improvement
> Reporter: Volodymyr Vysotskyi
> Assignee: Julian Hyde
> Priority: Major
> Fix For: 1.17.0
>
>
> Currently, for queries like this:
> {code:sql}
> select sal from emp where 2 IN (select deptno from dept)
> {code}
> SubQueryRemoveRule rules expand query plan in such a way:
> {noformat}
> LogicalProject(SAL=[$5])
> LogicalProject(EMPNO=[$0], ENAME=[$1], JOB=[$2], MGR=[$3], HIREDATE=[$4], SAL=[$5], COMM=[$6], DEPTNO=[$7], SLACKER=[$8])
> LogicalJoin(condition=[=(2, $9)], joinType=[inner])
> LogicalTableScan(table=[[CATALOG, SALES, EMP]])
> LogicalProject(DEPTNO=[$0])
> LogicalTableScan(table=[[CATALOG, SALES, DEPT]])
> {noformat}
> Since join condition is actually a filter condition, it will be pushed into the filter during further planning stages and join with the true condition is left.
> But these types of the queries may be rewritten in the same way as EXISTS queries:
> {code:sql}
> select sal from emp where EXISTS (select deptno from dept where deptno=2)
> {code}
> with the more optimal plan:
> {noformat}
> LogicalProject(SAL=[$5])
> LogicalProject(EMPNO=[$0], ENAME=[$1], JOB=[$2], MGR=[$3], HIREDATE=[$4], SAL=[$5], COMM=[$6], DEPTNO=[$7], SLACKER=[$8])
> LogicalJoin(condition=[true], joinType=[inner])
> LogicalTableScan(table=[[CATALOG, SALES, EMP]])
> LogicalAggregate(group=[{0}])
> LogicalProject(i=[true])
> LogicalFilter(condition=[=($0, 2)])
> LogicalTableScan(table=[[CATALOG, SALES, DEPT]])
> {noformat}
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)