You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@calcite.apache.org by "Jess Balint (Jira)" <ji...@apache.org> on 2023/02/07 22:34:01 UTC
[jira] [Closed] (CALCITE-5209) Proper sub-query handling if it is used inside select list and group by
[ https://issues.apache.org/jira/browse/CALCITE-5209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jess Balint closed CALCITE-5209.
--------------------------------
Resolved in release 1.33.0 (2023-02-06)
> Proper sub-query handling if it is used inside select list and group by
> -----------------------------------------------------------------------
>
> Key: CALCITE-5209
> URL: https://issues.apache.org/jira/browse/CALCITE-5209
> Project: Calcite
> Issue Type: Bug
> Affects Versions: 1.30.0
> Reporter: Ian Bertolacci
> Assignee: Dmitry Sysolyatin
> Priority: Major
> Labels: pull-request-available
> Fix For: 1.33.0
>
> Time Spent: 1.5h
> Remaining Estimate: 0h
>
> SqlRelConverter.getSubQuery uses reference comparing in order to find SubQuery instance.
> But it does not work in case when select list has column which refers to the column inside `GROUP BY` clause.
> For example:
> {code}SELECT deptno IN (1,2) FROM emp.deptno GROUP BY deptno IN (1,2);{code}
> In this case `SqlNode` inside select list and `SqlNode` inside `GROUP BY` are different SqlNode instances. The reference comparing won't work.
> The following bug is a result of this behaviour:
> For these examples assume that:
> - {{`SqlToRelConverter.Config.config().withInSubQueryThreshold(5)`}} has been set.
> - The source table has 4 columns (Column_0 through Column_3)
> - Each column is of type BIGINT NOT NULL
> The failing query is:
> {code:java}
> select
> case
> when Column_3 in (1, 2, 3, 4, 5) THEN 1
> else 0
> end
> from T1000
> group by
> case
> when Column_3 in (1, 2, 3, 4, 5) THEN 1
> else 0
> end
> {code}
> The exception is:
> {code:java}
> 3
> java.lang.ArrayIndexOutOfBoundsException: 3
> at com.google.common.collect.RegularImmutableList.get(RegularImmutableList.java:75)
> at org.apache.calcite.tools.RelBuilder.inferAlias(RelBuilder.java:2163)
> at org.apache.calcite.tools.RelBuilder.project_(RelBuilder.java:1956)
> at org.apache.calcite.tools.RelBuilder.project(RelBuilder.java:1797)
> at org.apache.calcite.tools.RelBuilder.project(RelBuilder.java:1769)
> at org.apache.calcite.tools.RelBuilder.project(RelBuilder.java:1758)
> at org.apache.calcite.sql2rel.SqlToRelConverter$Blackboard.register(SqlToRelConverter.java:4680)
> at org.apache.calcite.sql2rel.SqlToRelConverter.substituteSubQuery(SqlToRelConverter.java:1268)
> at org.apache.calcite.sql2rel.SqlToRelConverter.replaceSubQueries(SqlToRelConverter.java:1127)
> at org.apache.calcite.sql2rel.SqlToRelConverter.createAggImpl(SqlToRelConverter.java:3325)
> at org.apache.calcite.sql2rel.SqlToRelConverter.convertAgg(SqlToRelConverter.java:3192)
> at org.apache.calcite.sql2rel.SqlToRelConverter.convertSelectImpl(SqlToRelConverter.java:738)
> at org.apache.calcite.sql2rel.SqlToRelConverter.convertSelect(SqlToRelConverter.java:664)
> at org.apache.calcite.sql2rel.SqlToRelConverter.convertQueryRecursive(SqlToRelConverter.java:3589)
> at org.apache.calcite.sql2rel.SqlToRelConverter.convertQuery(SqlToRelConverter.java:589)
> at org.apache.calcite.prepare.PlannerImpl.rel(PlannerImpl.java:259)
> {code}
> The table could contain N >= 4 columns, and the use of any column at or after index 3 will also cause this exception.
> The use of any column before index 3 gives the RelNode tree:
> {code:java}
> 66:LogicalProject(EXPR$0=[$0])
> 65:LogicalJoin(condition=[=($3, $4)], joinType=[left])
> 61:LogicalProject(EXPR$0=[$0], $f0=[$1], $f1=[$2], $f10=[$2])
> 60:LogicalJoin(condition=[true], joinType=[inner])
> 56:LogicalAggregate(group=[{0}])
> 55:LogicalProject(EXPR$0=[CASE(CAST(OR(AND(IS NOT NULL($8), <>($4, 0)), AND(<($5, $4), null, <>($4, 0), IS NULL($8)))):BOOLEAN NOT NULL, 1, 0)])
> 54:LogicalJoin(condition=[=($6, $7)], joinType=[left])
> 50:LogicalProject(Column_0=[$0], Column_1=[$1], Column_2=[$2], Column_3=[$3], $f0=[$4], $f1=[$5], Column_20=[$2])
> 49:LogicalJoin(condition=[true], joinType=[inner])
> 45:TableScan(...)
> 48:LogicalAggregate(group=[{}], agg#0=[COUNT()], agg#1=[COUNT($0)])
> 47:LogicalProject(ROW_VALUE=[$0], $f1=[true])
> 46:LogicalValues(tuples=[[{ 1 }, { 2 }, { 3 }, { 4 }, { 5 }]])
> 53:LogicalAggregate(group=[{0}], agg#0=[MIN($1)])
> 52:LogicalProject(ROW_VALUE=[$0], $f1=[true])
> 51:LogicalValues(tuples=[[{ 1 }, { 2 }, { 3 }, { 4 }, { 5 }]])
> 59:LogicalAggregate(group=[{}], agg#0=[COUNT()], agg#1=[COUNT($0)])
> 58:LogicalProject(ROW_VALUE=[$0], $f1=[true])
> 57:LogicalValues(tuples=[[{ 1 }, { 2 }, { 3 }, { 4 }, { 5 }]])
> 64:LogicalAggregate(group=[{0}], agg#0=[MIN($1)])
> 63:LogicalProject(ROW_VALUE=[$0], $f1=[true])
> 62:LogicalValues(tuples=[[{ 1 }, { 2 }, { 3 }, { 4 }, { 5 }]])
> {code}
> If the number of predicates is less than the subquery threshold, there are no issues.
> Additionally, the below queries also produce RelNode trees
> Using an alias:
> {code:java}
> select
> case
> when Column_3 in (1, 2, 3, 4, 5) THEN 1
> else 0
> end as CASE_ALIAS
> from T1000
> group by
> CASE_ALIAS
> {code}
> {code:java}
> 90:LogicalAggregate(group=[{0}])
> 89:LogicalProject(CASE_ALIAS=[CASE(CAST(OR(AND(IS NOT NULL($8), <>($4, 0)), AND(<($5, $4), null, <>($4, 0), IS NULL($8)))):BOOLEAN NOT NULL, 1, 0)])
> 88:LogicalJoin(condition=[=($6, $7)], joinType=[left])
> 84:LogicalProject(Column_0=[$0], Column_1=[$1], Column_2=[$2], Column_3=[$3], $f0=[$4], $f1=[$5], Column_30=[$3])
> 83:LogicalJoin(condition=[true], joinType=[inner])
> 79:TableScan(...)
> 82:LogicalAggregate(group=[{}], agg#0=[COUNT()], agg#1=[COUNT($0)])
> 81:LogicalProject(ROW_VALUE=[$0], $f1=[true])
> 80:LogicalValues(tuples=[[{ 1 }, { 2 }, { 3 }, { 4 }, { 5 }]])
> 87:LogicalAggregate(group=[{0}], agg#0=[MIN($1)])
> 86:LogicalProject(ROW_VALUE=[$0], $f1=[true])
> 85:LogicalValues(tuples=[[{ 1 }, { 2 }, { 3 }, { 4 }, { 5 }]])
> {code}
> Not using the group-by value in the projection:
> {code:java}
> select
> count(*)
> from T1000
> group by
> case
> when Column_3 in (1, 2, 3, 4, 5) THEN 1
> else 0
> end
> {code}
> {code:java}
> 116:LogicalProject(EXPR$0=[$1])
> 115:LogicalAggregate(group=[{0}], EXPR$0=[COUNT()])
> 114:LogicalProject($f0=[CASE(CAST(OR(AND(IS NOT NULL($8), <>($4, 0)), AND(<($5, $4), null, <>($4, 0), IS NULL($8)))):BOOLEAN NOT NULL, 1, 0)])
> 113:LogicalJoin(condition=[=($6, $7)], joinType=[left])
> 109:LogicalProject(Column_0=[$0], Column_1=[$1], Column_2=[$2], Column_3=[$3], $f0=[$4], $f1=[$5], Column_30=[$3])
> 108:LogicalJoin(condition=[true], joinType=[inner])
> 104:TableScan(...)
> 107:LogicalAggregate(group=[{}], agg#0=[COUNT()], agg#1=[COUNT($0)])
> 106:LogicalProject(ROW_VALUE=[$0], $f1=[true])
> 105:LogicalValues(tuples=[[{ 1 }, { 2 }, { 3 }, { 4 }, { 5 }]])
> 112:LogicalAggregate(group=[{0}], agg#0=[MIN($1)])
> 111:LogicalProject(ROW_VALUE=[$0], $f1=[true])
> 110:LogicalValues(tuples=[[{ 1 }, { 2 }, { 3 }, { 4 }, { 5 }]])
> {code}
--
This message was sent by Atlassian Jira
(v8.20.10#820010)