You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@calcite.apache.org by "Julian Hyde (Jira)" <ji...@apache.org> on 2023/04/21 00:20:00 UTC
[jira] [Commented] (CALCITE-5655) Intermediate table alias should be different to avoid wrong field reference lookup in subquery remove phase
[ https://issues.apache.org/jira/browse/CALCITE-5655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17714776#comment-17714776 ]
Julian Hyde commented on CALCITE-5655:
--------------------------------------
[~Runking], Thanks for this fix; nice work.
I saw you changed the commit message and Jira subject. For what it's worth, I preferred the original subject, because it described the problem encountered by the SQL user; the new subject describes the cause - i.e. how you changed the code to fix the problem - but in my opinion, it's better to lead with the 'symptoms' rather than the 'cause'.
> Intermediate table alias should be different to avoid wrong field reference lookup in subquery remove phase
> -----------------------------------------------------------------------------------------------------------
>
> Key: CALCITE-5655
> URL: https://issues.apache.org/jira/browse/CALCITE-5655
> Project: Calcite
> Issue Type: Bug
> Components: core
> Affects Versions: 1.34.0
> Reporter: Runkang He
> Assignee: Runkang He
> Priority: Major
> Labels: pull-request-available
> Fix For: 1.35.0
>
> Time Spent: 2h 20m
> Remaining Estimate: 0h
>
> When the query contains multiple IN/SOME sub-queries connected with OR predicate in WHERE clause, the result is wrong. The minimal reproducer is below:
> SQL:
> {code:sql}
> select empno from sales.empnullables
> where deptno in (
> select deptno from sales.deptnullables where name = 'dept1')
> or deptno in (
> select deptno from sales.deptnullables where name = 'dept2')
> {code}
> The Plan generated by calcite master branch: (Notice the bold part of *<>($2, 0)* in the downstream LogicalFilter)
> {code:sql}
> LogicalProject(EMPNO=[$0])
> LogicalProject(EMPNO=[$0], DEPTNO=[$1])
> LogicalFilter(condition=[OR(AND(<>($2, 0), IS NOT NULL($5), IS NOT NULL($1)), AND(***<>($2, 0)***, IS NOT NULL($9), IS NOT NULL($1)))])
> LogicalJoin(condition=[=($1, $8)], joinType=[left])
> LogicalJoin(condition=[true], joinType=[inner])
> LogicalJoin(condition=[=($1, $4)], joinType=[left])
> LogicalJoin(condition=[true], joinType=[inner])
> LogicalProject(EMPNO=[$0], DEPTNO=[$7])
> LogicalTableScan(table=[[CATALOG, SALES, EMPNULLABLES]])
> LogicalAggregate(group=[{}], c=[COUNT()], ck=[COUNT($0)])
> LogicalProject(DEPTNO=[$0])
> LogicalFilter(condition=[=($1, 'dept1')])
> LogicalTableScan(table=[[CATALOG, SALES, DEPTNULLABLES]])
> LogicalProject(DEPTNO=[$0], i=[true])
> LogicalFilter(condition=[=($1, 'dept1')])
> LogicalTableScan(table=[[CATALOG, SALES, DEPTNULLABLES]])
> LogicalAggregate(group=[{}], c=[COUNT()], ck=[COUNT($0)])
> LogicalProject(DEPTNO=[$0])
> LogicalFilter(condition=[=($1, 'dept2')])
> LogicalTableScan(table=[[CATALOG, SALES, DEPTNULLABLES]])
> LogicalProject(DEPTNO=[$0], i=[true])
> LogicalFilter(condition=[=($1, 'dept2')])
> LogicalTableScan(table=[[CATALOG, SALES, DEPTNULLABLES]])
> {code}
> The wrong part is that when build the downstream LogicalFilter for the two sub-queries, the filter for the second sub-query is AND(<>($2, 0), IS NOT NULL($9), IS NOT NULL($1)), notice that *$2 should be the second sub-query's intermediate table field ct.c(which field index is $6), but now the actual reference is the first sub-query's*, this leads to wrong plan, and wrong result.
> The root cause is that intermediate table alias is the same as the previous sub-query's, but when lookup intermediate table field, it always returns the previous one which is not belong to the current subquery.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)