You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@calcite.apache.org by "Benchao Li (Jira)" <ji...@apache.org> on 2023/05/05 02:16:00 UTC
[jira] [Resolved] (CALCITE-5680) Wrong plan for multiple IN sub-queries with only literal operands
[ https://issues.apache.org/jira/browse/CALCITE-5680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Benchao Li resolved CALCITE-5680.
---------------------------------
Fix Version/s: 1.35.0
Resolution: Fixed
Fixed via [https://github.com/apache/calcite/commit/8a2c680b963261b30323f9babbcc60be1dca305d]
[~Runking] Thanks for your contribution! And [~FrankZou] [~jiajunbernoulli] thanks for the review!
> Wrong plan for multiple IN sub-queries with only literal operands
> -----------------------------------------------------------------
>
> Key: CALCITE-5680
> URL: https://issues.apache.org/jira/browse/CALCITE-5680
> Project: Calcite
> Issue Type: Bug
> Components: core
> Affects Versions: 1.34.0
> Reporter: Runkang He
> Assignee: Runkang He
> Priority: Major
> Labels: pull-request-available
> Fix For: 1.35.0
>
>
> When the query contains multiple IN sub-queries with only literal operands, and connected with OR predicate in WHERE clause, the result is wrong. The minimal reproducer is below:
> SQL:
> {code:sql}
> select empno from sales.empnullables
> where 1 in (
> select deptno from sales.deptnullables where name = 'dept1')
> or 2 in (
> select deptno from sales.deptnullables where name = 'dept2')
> {code}
> The Plan generated by calcite master branch: (Notice the bold part of IS NULL(*$2*) in the downstream LogicalFilter)
> {code:sql}
> LogicalProject(EMPNO=[$0])
> LogicalFilter(condition=[OR(CASE(IS NULL($2), false, =($1, false), null:BOOLEAN, IS NOT NULL($1), true, false), CASE(IS NULL(*$2*), false, =($1, false), null:BOOLEAN, IS NOT NULL($1), true, false))])
> LogicalJoin(condition=[true], joinType=[left])
> LogicalJoin(condition=[true], joinType=[left])
> LogicalProject(EMPNO=[$0])
> LogicalTableScan(table=[[CATALOG, SALES, EMPNULLABLES]])
> LogicalSort(sort0=[$0], dir0=[DESC], fetch=[1])
> LogicalAggregate(group=[{0}], c=[COUNT()])
> LogicalProject(cs=[IS NOT NULL($0)])
> LogicalFilter(condition=[OR(=(1, $0), IS NULL($0))])
> LogicalProject(DEPTNO=[$0])
> LogicalFilter(condition=[=($1, 'dept1')])
> LogicalTableScan(table=[[CATALOG, SALES, DEPTNULLABLES]])
> LogicalSort(sort0=[$0], dir0=[DESC], fetch=[1])
> LogicalAggregate(group=[{0}], c=[COUNT()])
> LogicalProject(cs=[IS NOT NULL($0)])
> LogicalFilter(condition=[OR(=(2, $0), IS NULL($0))])
> LogicalProject(DEPTNO=[$0])
> LogicalFilter(condition=[=($1, 'dept2')])
> LogicalTableScan(table=[[CATALOG, SALES, DEPTNULLABLES]])
> {code}
> The wrong part is that when build the downstream LogicalFilter for the two sub-queries, the filter for the second sub-query is CASE(IS NULL($2), false, =($1, false), null:BOOLEAN, IS NOT NULL($1), true, false), notice that *$2 should be the second sub-query's intermediate table field dt.c(which field index is $4), and $1 should be the second sub-query's intermediate table field dt.cs(which field index is $3), but now the actual reference is the first sub-query's, this leads to wrong plan, and wrong result*.
> The root cause is that intermediate table's field name is the same as the previous sub-query's, and when lookup intermediate table field, it always returns the previous one which is not belong to the current subquery.
> To fix this issue, we should use both different intermediate table name and field name to identify the reference correctly. (Note that we make intermediate table name different in [Calcite-5655|https://issues.apache.org/jira/browse/CALCITE-5655])
--
This message was sent by Atlassian Jira
(v8.20.10#820010)