You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@calcite.apache.org by "Julian Hyde (Jira)" <ji...@apache.org> on 2023/04/22 19:00:00 UTC

[jira] [Commented] (CALCITE-5669) Remove trivial correlates from the query plan

    [ https://issues.apache.org/jira/browse/CALCITE-5669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17715312#comment-17715312 ] 

Julian Hyde commented on CALCITE-5669:
--------------------------------------

Generally I think it's worthwhile to expand the "correlating expression" as much as possible. (We may already do this; I haven't checked.) In this case, don't correlate on "{{e.empno}}" but instead correlate on "{{e.empno = null}}". Then that is an expression that can be more easily constant-reduced.

(One counterpoint to that: in another case (CALCITE-5644) I am dealing correlated sub-query that has "{{t.a is not distinct from u.b}}", which is expanded to "{{t.a is null and u.b is null or t.a = t.b}}", and somehow "{{t.a is null}}" ends up being computed as a separate boolean column in the left input of the {{Correlate}} operator. Thus the correlation condition is split between two operators, and it is very hard to perform further simplification. The answer in this case is probably to keep {{is not distinct from}} intact, but even so, it emphasizes the importance of choosing the right grain for correlation variables.)

But even with the best intentions we may end up with trivial {{Correlate}} operators. Eliminating variables that are not used, and converting {{Correlate}} operators with no variables into {{Join}} operators, seem to be simple and efficient rules. So, my vote would be to do that solution.

> Remove trivial correlates from the query plan
> ---------------------------------------------
>
>                 Key: CALCITE-5669
>                 URL: https://issues.apache.org/jira/browse/CALCITE-5669
>             Project: Calcite
>          Issue Type: Improvement
>          Components: core
>            Reporter: Stamatis Zampetakis
>            Assignee: Stamatis Zampetakis
>            Priority: Major
>              Labels: pull-request-available
>
> Consider the following query correlated query.
> {code:sql}
> select * from emp as e where exists (select 1 from dept as d where e.empno = null)
> {code}
> The query basically returns an empty result because {{e.empno = null}} is always false.
> The plan for the query after applying the sub-query remove rule is shown below:
> {noformat}
> LogicalProject(EMPNO=[$0], ENAME=[$1], JOB=[$2], MGR=[$3], HIREDATE=[$4], SAL=[$5], COMM=[$6], DEPTNO=[$7], SLACKER=[$8])
>   LogicalProject(EMPNO=[$0], ENAME=[$1], JOB=[$2], MGR=[$3], HIREDATE=[$4], SAL=[$5], COMM=[$6], DEPTNO=[$7], SLACKER=[$8])
>     LogicalCorrelate(correlation=[$cor0], joinType=[inner], requiredColumns=[{0}])
>       LogicalTableScan(table=[[CATALOG, SALES, EMP]])
>       LogicalAggregate(group=[{0}])
>         LogicalProject(i=[true])
>           LogicalFilter(condition=[=($cor0.EMPNO, null)])
>             LogicalTableScan(table=[[CATALOG, SALES, DEPT]])
> {noformat}
> After applying the reduce expressions rule the filter with the correlated condition will become false and the resulting plan would be the following.
> {noformat}
> LogicalProject(EMPNO=[$0], ENAME=[$1], JOB=[$2], MGR=[$3], HIREDATE=[$4], SAL=[$5], COMM=[$6], DEPTNO=[$7], SLACKER=[$8])
>   LogicalProject(EMPNO=[$0], ENAME=[$1], JOB=[$2], MGR=[$3], HIREDATE=[$4], SAL=[$5], COMM=[$6], DEPTNO=[$7], SLACKER=[$8])
>     LogicalCorrelate(correlation=[$cor0], joinType=[inner], requiredColumns=[{0}])
>       LogicalTableScan(table=[[CATALOG, SALES, EMP]])
>       LogicalAggregate(group=[{0}])
>         LogicalProject(i=[true])
>           LogicalValues(tuples=[[]])
> {noformat}
> Observe that now we have a {{LogicalCorrelate}} but there is no real correlation in the plan since the correlation variable on the right side disappeared. Depending on how rules are applied and which rules are used similar "trivial" correlates may appear.
> The goal of this ticket is to provide the means to get rid of them.
> One option would be to add a new rule (e.g., {{CorrelateToJoinRule}}) which detects that a correlate does not have correlations in the right side and turn the correlation to a join; then we could employ other existing rules (such as PruneEmptyRules) for joins and remove the newly created join altogether.
> Another option, would be to introduce new pruning rule(s) for correlate (similar to those for joins) that will remove the correlate when its input is an empty values expression.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)