You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@calcite.apache.org by "Jess Balint (Jira)" <ji...@apache.org> on 2023/02/07 22:34:01 UTC

[jira] [Closed] (CALCITE-5209) Proper sub-query handling if it is used inside select list and group by

     [ https://issues.apache.org/jira/browse/CALCITE-5209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jess Balint closed CALCITE-5209.
--------------------------------

Resolved in release 1.33.0 (2023-02-06)

> Proper sub-query handling if it is used inside select list and group by
> -----------------------------------------------------------------------
>
>                 Key: CALCITE-5209
>                 URL: https://issues.apache.org/jira/browse/CALCITE-5209
>             Project: Calcite
>          Issue Type: Bug
>    Affects Versions: 1.30.0
>            Reporter: Ian Bertolacci
>            Assignee: Dmitry Sysolyatin
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.33.0
>
>          Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> SqlRelConverter.getSubQuery uses reference comparing in order to find SubQuery instance. 
> But it does not work in case when select list has column which refers to the column inside `GROUP BY` clause.
> For example: 
> {code}SELECT deptno IN (1,2) FROM emp.deptno GROUP BY deptno IN (1,2);{code}
> In this case `SqlNode` inside select list and `SqlNode` inside `GROUP BY` are different SqlNode instances. The reference comparing won't work.
> The following bug is a result of this behaviour:
> For these examples assume that:
>  - {{`SqlToRelConverter.Config.config().withInSubQueryThreshold(5)`}} has been set.
>  - The source table has 4 columns (Column_0 through Column_3)
>  - Each column is of type BIGINT NOT NULL
> The failing query is:
> {code:java}
> select
>     case
>         when Column_3 in (1, 2, 3, 4, 5) THEN 1
>         else 0
>     end
> from T1000
> group by
>     case
>         when Column_3 in (1, 2, 3, 4, 5) THEN 1
>         else 0
>     end
> {code}
> The exception is:
> {code:java}
> 3
> java.lang.ArrayIndexOutOfBoundsException: 3
> 	at com.google.common.collect.RegularImmutableList.get(RegularImmutableList.java:75)
> 	at org.apache.calcite.tools.RelBuilder.inferAlias(RelBuilder.java:2163)
> 	at org.apache.calcite.tools.RelBuilder.project_(RelBuilder.java:1956)
> 	at org.apache.calcite.tools.RelBuilder.project(RelBuilder.java:1797)
> 	at org.apache.calcite.tools.RelBuilder.project(RelBuilder.java:1769)
> 	at org.apache.calcite.tools.RelBuilder.project(RelBuilder.java:1758)
> 	at org.apache.calcite.sql2rel.SqlToRelConverter$Blackboard.register(SqlToRelConverter.java:4680)
> 	at org.apache.calcite.sql2rel.SqlToRelConverter.substituteSubQuery(SqlToRelConverter.java:1268)
> 	at org.apache.calcite.sql2rel.SqlToRelConverter.replaceSubQueries(SqlToRelConverter.java:1127)
> 	at org.apache.calcite.sql2rel.SqlToRelConverter.createAggImpl(SqlToRelConverter.java:3325)
> 	at org.apache.calcite.sql2rel.SqlToRelConverter.convertAgg(SqlToRelConverter.java:3192)
> 	at org.apache.calcite.sql2rel.SqlToRelConverter.convertSelectImpl(SqlToRelConverter.java:738)
> 	at org.apache.calcite.sql2rel.SqlToRelConverter.convertSelect(SqlToRelConverter.java:664)
> 	at org.apache.calcite.sql2rel.SqlToRelConverter.convertQueryRecursive(SqlToRelConverter.java:3589)
> 	at org.apache.calcite.sql2rel.SqlToRelConverter.convertQuery(SqlToRelConverter.java:589)
> 	at org.apache.calcite.prepare.PlannerImpl.rel(PlannerImpl.java:259)
> {code}
> The table could contain N >= 4 columns, and the use of any column at or after index 3 will also cause this exception.
> The use of any column before index 3 gives the RelNode tree:
> {code:java}
> 66:LogicalProject(EXPR$0=[$0])
>   65:LogicalJoin(condition=[=($3, $4)], joinType=[left])
>     61:LogicalProject(EXPR$0=[$0], $f0=[$1], $f1=[$2], $f10=[$2])
>       60:LogicalJoin(condition=[true], joinType=[inner])
>         56:LogicalAggregate(group=[{0}])
>           55:LogicalProject(EXPR$0=[CASE(CAST(OR(AND(IS NOT NULL($8), <>($4, 0)), AND(<($5, $4), null, <>($4, 0), IS NULL($8)))):BOOLEAN NOT NULL, 1, 0)])
>             54:LogicalJoin(condition=[=($6, $7)], joinType=[left])
>               50:LogicalProject(Column_0=[$0], Column_1=[$1], Column_2=[$2], Column_3=[$3], $f0=[$4], $f1=[$5], Column_20=[$2])
>                 49:LogicalJoin(condition=[true], joinType=[inner])
>                   45:TableScan(...)
>                   48:LogicalAggregate(group=[{}], agg#0=[COUNT()], agg#1=[COUNT($0)])
>                     47:LogicalProject(ROW_VALUE=[$0], $f1=[true])
>                       46:LogicalValues(tuples=[[{ 1 }, { 2 }, { 3 }, { 4 }, { 5 }]])
>               53:LogicalAggregate(group=[{0}], agg#0=[MIN($1)])
>                 52:LogicalProject(ROW_VALUE=[$0], $f1=[true])
>                   51:LogicalValues(tuples=[[{ 1 }, { 2 }, { 3 }, { 4 }, { 5 }]])
>         59:LogicalAggregate(group=[{}], agg#0=[COUNT()], agg#1=[COUNT($0)])
>           58:LogicalProject(ROW_VALUE=[$0], $f1=[true])
>             57:LogicalValues(tuples=[[{ 1 }, { 2 }, { 3 }, { 4 }, { 5 }]])
>     64:LogicalAggregate(group=[{0}], agg#0=[MIN($1)])
>       63:LogicalProject(ROW_VALUE=[$0], $f1=[true])
>         62:LogicalValues(tuples=[[{ 1 }, { 2 }, { 3 }, { 4 }, { 5 }]])
> {code}
> If the number of predicates is less than the subquery threshold, there are no issues.
> Additionally, the below queries also produce RelNode trees
> Using an alias:
> {code:java}
> select
>     case
>         when Column_3 in (1, 2, 3, 4, 5) THEN 1
>         else 0
>     end as CASE_ALIAS
> from T1000
> group by
>     CASE_ALIAS
> {code}
> {code:java}
> 90:LogicalAggregate(group=[{0}])
>   89:LogicalProject(CASE_ALIAS=[CASE(CAST(OR(AND(IS NOT NULL($8), <>($4, 0)), AND(<($5, $4), null, <>($4, 0), IS NULL($8)))):BOOLEAN NOT NULL, 1, 0)])
>     88:LogicalJoin(condition=[=($6, $7)], joinType=[left])
>       84:LogicalProject(Column_0=[$0], Column_1=[$1], Column_2=[$2], Column_3=[$3], $f0=[$4], $f1=[$5], Column_30=[$3])
>         83:LogicalJoin(condition=[true], joinType=[inner])
>           79:TableScan(...)
>           82:LogicalAggregate(group=[{}], agg#0=[COUNT()], agg#1=[COUNT($0)])
>             81:LogicalProject(ROW_VALUE=[$0], $f1=[true])
>               80:LogicalValues(tuples=[[{ 1 }, { 2 }, { 3 }, { 4 }, { 5 }]])
>       87:LogicalAggregate(group=[{0}], agg#0=[MIN($1)])
>         86:LogicalProject(ROW_VALUE=[$0], $f1=[true])
>           85:LogicalValues(tuples=[[{ 1 }, { 2 }, { 3 }, { 4 }, { 5 }]])
> {code}
> Not using the group-by value in the projection:
> {code:java}
> select
>     count(*)
> from T1000
> group by
>     case
>         when Column_3 in (1, 2, 3, 4, 5) THEN 1
>         else 0
>     end
> {code}
> {code:java}
> 116:LogicalProject(EXPR$0=[$1])
>   115:LogicalAggregate(group=[{0}], EXPR$0=[COUNT()])
>     114:LogicalProject($f0=[CASE(CAST(OR(AND(IS NOT NULL($8), <>($4, 0)), AND(<($5, $4), null, <>($4, 0), IS NULL($8)))):BOOLEAN NOT NULL, 1, 0)])
>       113:LogicalJoin(condition=[=($6, $7)], joinType=[left])
>         109:LogicalProject(Column_0=[$0], Column_1=[$1], Column_2=[$2], Column_3=[$3], $f0=[$4], $f1=[$5], Column_30=[$3])
>           108:LogicalJoin(condition=[true], joinType=[inner])
>             104:TableScan(...)
>             107:LogicalAggregate(group=[{}], agg#0=[COUNT()], agg#1=[COUNT($0)])
>               106:LogicalProject(ROW_VALUE=[$0], $f1=[true])
>                 105:LogicalValues(tuples=[[{ 1 }, { 2 }, { 3 }, { 4 }, { 5 }]])
>         112:LogicalAggregate(group=[{0}], agg#0=[MIN($1)])
>           111:LogicalProject(ROW_VALUE=[$0], $f1=[true])
>             110:LogicalValues(tuples=[[{ 1 }, { 2 }, { 3 }, { 4 }, { 5 }]])
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)