You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@calcite.apache.org by "Ruben Q L (Jira)" <ji...@apache.org> on 2023/06/20 18:11:00 UTC

[jira] [Updated] (CALCITE-5789) Query with two nested subqueries where the inner-most references the outer-most table returns wrong result

     [ https://issues.apache.org/jira/browse/CALCITE-5789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ruben Q L updated CALCITE-5789:
-------------------------------
    Description: 
Problem can be reproduced with the following query (to be added e.g. in sub-query.iq):
{code:sql}
select deptno from dept d1 where exists (
 select 1 from dept d2 where d2.deptno = d1.deptno and exists (
  select 1 from dept d3 where d3.deptno = d2.deptno and d3.dname = d1.dname));
{code}

The problem appears with at least two nested subqueries, when the inner most references the outermost table, in our case {{and d3.dname = d1.dname}} (if we remove this expression, the problem does not occur).

When the above query is processed, the following plan is generated (notice how the top-most projection contains two fields instead of one):
{noformat}
// Plan2 (Decorrelator output)
LogicalProject(DEPTNO=[$0], DEPTNO0=[$4])
  LogicalJoin(condition=[AND(=($0, $2), =($1, $3))], joinType=[inner])
    LogicalProject(DEPTNO=[$0], DNAME=[$1])
      LogicalTableScan(table=[[scott, DEPT]])
    LogicalProject(DEPTNO3=[$0], DNAME0=[$3], DEPTNO0=[$4], $f3=[true])
      LogicalJoin(condition=[true], joinType=[inner])
        LogicalTableScan(table=[[scott, DEPT]])
        LogicalProject(DNAME=[$1], DEPTNO=[$0], $f2=[true])
          LogicalFilter(condition=[IS NOT NULL($1)])
            LogicalTableScan(table=[[scott, DEPT]])
{noformat}

Even if this minimalist test does not fail, it leads to a situation where the {{RelRoot#validatedRowType}} and the {{RelRoot#rel#rowType}} don't match, which can lead to unforeseeable consequences (for the record, I have seen more complex queries which do fail at execution time because of this issue).

The culprit that generates the unexpected extra field in the final projection is RelDecorrelator, however I think the decorrelator is not to blame here, because the input that reaches it is already wrong:
{noformat}
Plan1 (SubQueryRemoveRule output, Decorrelator input)
LogicalProject(DEPTNO=[$0])
  LogicalProject(DEPTNO=[$0], DNAME=[$1])
    LogicalCorrelate(correlation=[$cor0], joinType=[inner], requiredColumns=[{0, 1}])
      LogicalProject(DEPTNO=[$0], DNAME=[$1])
        LogicalTableScan(table=[[scott, DEPT]])
      LogicalAggregate(group=[{0}])
        LogicalProject(i=[true])
          LogicalProject(DEPTNO=[$0], DNAME=[$1], LOC=[$2])
            LogicalFilter(condition=[=($0, $cor0.DEPTNO)])
              LogicalJoin(condition=[true], joinType=[inner], variablesSet=[[$cor1, $cor0]])
                LogicalTableScan(table=[[scott, DEPT]])
                LogicalAggregate(group=[{0}])
                  LogicalProject(i=[true])
                    LogicalFilter(condition=[AND(=($0, $cor1.DEPTNO), =($1, $cor0.DNAME))])
                      LogicalTableScan(table=[[scott, DEPT]])
{noformat}


 


  was:tbd


> Query with two nested subqueries where the inner-most references the outer-most table returns wrong result
> ----------------------------------------------------------------------------------------------------------
>
>                 Key: CALCITE-5789
>                 URL: https://issues.apache.org/jira/browse/CALCITE-5789
>             Project: Calcite
>          Issue Type: Bug
>          Components: core
>            Reporter: Ruben Q L
>            Priority: Major
>
> Problem can be reproduced with the following query (to be added e.g. in sub-query.iq):
> {code:sql}
> select deptno from dept d1 where exists (
>  select 1 from dept d2 where d2.deptno = d1.deptno and exists (
>   select 1 from dept d3 where d3.deptno = d2.deptno and d3.dname = d1.dname));
> {code}
> The problem appears with at least two nested subqueries, when the inner most references the outermost table, in our case {{and d3.dname = d1.dname}} (if we remove this expression, the problem does not occur).
> When the above query is processed, the following plan is generated (notice how the top-most projection contains two fields instead of one):
> {noformat}
> // Plan2 (Decorrelator output)
> LogicalProject(DEPTNO=[$0], DEPTNO0=[$4])
>   LogicalJoin(condition=[AND(=($0, $2), =($1, $3))], joinType=[inner])
>     LogicalProject(DEPTNO=[$0], DNAME=[$1])
>       LogicalTableScan(table=[[scott, DEPT]])
>     LogicalProject(DEPTNO3=[$0], DNAME0=[$3], DEPTNO0=[$4], $f3=[true])
>       LogicalJoin(condition=[true], joinType=[inner])
>         LogicalTableScan(table=[[scott, DEPT]])
>         LogicalProject(DNAME=[$1], DEPTNO=[$0], $f2=[true])
>           LogicalFilter(condition=[IS NOT NULL($1)])
>             LogicalTableScan(table=[[scott, DEPT]])
> {noformat}
> Even if this minimalist test does not fail, it leads to a situation where the {{RelRoot#validatedRowType}} and the {{RelRoot#rel#rowType}} don't match, which can lead to unforeseeable consequences (for the record, I have seen more complex queries which do fail at execution time because of this issue).
> The culprit that generates the unexpected extra field in the final projection is RelDecorrelator, however I think the decorrelator is not to blame here, because the input that reaches it is already wrong:
> {noformat}
> Plan1 (SubQueryRemoveRule output, Decorrelator input)
> LogicalProject(DEPTNO=[$0])
>   LogicalProject(DEPTNO=[$0], DNAME=[$1])
>     LogicalCorrelate(correlation=[$cor0], joinType=[inner], requiredColumns=[{0, 1}])
>       LogicalProject(DEPTNO=[$0], DNAME=[$1])
>         LogicalTableScan(table=[[scott, DEPT]])
>       LogicalAggregate(group=[{0}])
>         LogicalProject(i=[true])
>           LogicalProject(DEPTNO=[$0], DNAME=[$1], LOC=[$2])
>             LogicalFilter(condition=[=($0, $cor0.DEPTNO)])
>               LogicalJoin(condition=[true], joinType=[inner], variablesSet=[[$cor1, $cor0]])
>                 LogicalTableScan(table=[[scott, DEPT]])
>                 LogicalAggregate(group=[{0}])
>                   LogicalProject(i=[true])
>                     LogicalFilter(condition=[AND(=($0, $cor1.DEPTNO), =($1, $cor0.DNAME))])
>                       LogicalTableScan(table=[[scott, DEPT]])
> {noformat}
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)