Posted to dev@calcite.apache.org by Konstantin Orlov <ko...@gridgain.com> on 2021/09/17 12:43:50 UTC

Inconsistency of javadoc and actual behaviour of RelNode#getVariablesSet

Hi, folks

I have a question about org.apache.calcite.rel.RelNode#getVariablesSet.
The Javadoc says it returns the variables that are set by the current node:

  /**
   * Returns the variables that are set in this relational
   * expression but also used and therefore not available to parents of this
   * relational expression.
   *
   * <p>Note: only {@link org.apache.calcite.rel.core.Correlate} should set
   * variables.
   *
   * @return Names of variables which are set in this relational
   *   expression
   */
  Set<CorrelationId> getVariablesSet();


But I've got a plan where a node returns all variables used by its child
nodes, regardless of whether each variable is set by the current node or by
a parent node.

The original query is:

SELECT *
  FROM t1 as "outer"
 WHERE a > (
       SELECT COUNT(*)
         FROM t1 as "inner"
        WHERE "inner".a IN (
              SELECT * 
                FROM table(system_range("inner".a, "inner".b + "outer".b))
        )
 )

After SQL-to-Rel translation I get the following plan:

LogicalProject(A=[$2], B=[$3], C=[$4], D=[$5], E=[$6])
  LogicalFilter(condition=[>($2, $SCALAR_QUERY({
        LogicalAggregate(group=[{}], COUNT(*)=[COUNT()])
          LogicalFilter(condition=[IN($2, {
                LogicalProject(X=[$0])
                  LogicalTableFunctionScan(invocation=[SYSTEM_RANGE($cor0.A, +($cor0.B, $cor2.B))], rowType=[RecordType(BIGINT X)])
                })], variablesSet=[[$cor0]])
            LogicalTableScan(table=[[PUBLIC, T1]])
        }))], variablesSet=[[$cor2]])
    LogicalTableScan(table=[[PUBLIC, T1]])

Every LogicalFilter introduces its own correlation variable, and everything is
OK so far.

But then I apply SubQueryRemoveRule, and the new plan looks like this:

LogicalProject(A=[$2], B=[$3], C=[$4], D=[$5], E=[$6])
  LogicalProject(_KEY=[$0], _VAL=[$1], A=[$2], B=[$3], C=[$4], D=[$5], E=[$6])
    LogicalFilter(condition=[>($2, $7)])
      LogicalCorrelate(correlation=[$cor2], joinType=[left], requiredColumns=[{3}])
        LogicalTableScan(table=[[PUBLIC, T1]])
        LogicalAggregate(group=[{}], COUNT(*)=[COUNT()])
          LogicalProject(_KEY=[$0], _VAL=[$1], A=[$2], B=[$3], C=[$4], D=[$5], E=[$6])
            LogicalJoin(condition=[=($2, $7)], joinType=[inner])
              LogicalTableScan(table=[[PUBLIC, T1]])
              LogicalAggregate(group=[{0}])
                LogicalProject(X=[$0])
                  LogicalTableFunctionScan(invocation=[SYSTEM_RANGE($cor0.A, +($cor0.B, $cor2.B))], rowType=[RecordType(BIGINT X)])


At this point LogicalJoin.getVariablesSet() returns both the "cor0" and "cor2"
variables, which doesn't seem right.
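
To make the distinction concrete, here is a minimal sketch of "variables set
by a node" versus "variables used somewhere beneath a node". It is a toy model
and not Calcite's API: the class CorrelationScopes, its Node type and the
helper methods are all hypothetical, and subqueries are modelled as plain
child nodes for simplicity. The tree mirrors the plan before
SubQueryRemoveRule, where the inner filter sets cor0 and the outer filter
sets cor2.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** Toy model only: these types are hypothetical and are not Calcite's RelNode API. */
class CorrelationScopes {

  /** A simplified plan node. */
  static final class Node {
    final String name;
    final Set<String> variablesSet; // variables this node itself sets
    final Set<String> referenced;   // variables this node's own expressions read
    final List<Node> inputs;

    Node(String name, Set<String> variablesSet, Set<String> referenced, Node... inputs) {
      this.name = name;
      this.variablesSet = Collections.unmodifiableSet(variablesSet);
      this.referenced = Collections.unmodifiableSet(referenced);
      this.inputs = Collections.unmodifiableList(Arrays.asList(inputs));
    }
  }

  static Set<String> vars(String... names) {
    return new LinkedHashSet<>(Arrays.asList(names));
  }

  /** Every variable read anywhere in the subtree, no matter which node sets it. */
  static Set<String> usedInSubtree(Node node) {
    Set<String> used = new TreeSet<>(node.referenced);
    for (Node input : node.inputs) {
      used.addAll(usedInSubtree(input));
    }
    return used;
  }

  /** Variables read in the subtree but not set inside it, so an ancestor must set them. */
  static Set<String> freeVariables(Node node) {
    Set<String> free = new TreeSet<>(node.referenced);
    for (Node input : node.inputs) {
      free.addAll(freeVariables(input));
    }
    free.removeAll(node.variablesSet);
    return free;
  }

  /** Simplified shape of the plan before SubQueryRemoveRule (subqueries as children). */
  static Node outerFilter() {
    Node functionScan = new Node("TableFunctionScan", vars(), vars("cor0", "cor2"));
    Node innerFilter = new Node("Filter(inner)", vars("cor0"), vars(), functionScan);
    return new Node("Filter(outer)", vars("cor2"), vars(), innerFilter);
  }

  public static void main(String[] args) {
    Node outer = outerFilter();
    Node inner = outer.inputs.get(0);
    System.out.println("set by inner filter:  " + inner.variablesSet);
    System.out.println("used beneath it:      " + usedInSubtree(inner));
    System.out.println("must come from above: " + freeVariables(inner));
  }
}
```

In this model the inner filter sets only cor0, although both cor0 and cor2 are
used beneath it. The result I get from LogicalJoin.getVariablesSet() looks like
the second set rather than the first.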

Is this behaviour expected, or is it a bug?

-- 
Regards,
Konstantin Orlov



Re: Inconsistency of javadoc and actual behaviour of RelNode#getVariablesSet

Posted by Konstantin Orlov <ko...@gridgain.com>.
Hi, Julian

Thanks for your reply!

> I believe that variables can only be set in the current RelNode. (Read a row from input, set the variable, then evaluate a Rex expression or restart the right input. It’s like a ‘for’ loop.) 

Actually, this is in line with my expectations. So, I'll file a ticket for this.

-- 
Regards,
Konstantin Orlov




> On 17 Sep 2021, at 21:44, Julian Hyde <jh...@gmail.com> wrote:
> 
> The sentence "Note: only {@link org.apache.calcite.rel.core.Correlate} should set variables." is no longer true now that we have added correlated Filter and, I believe, correlated Project. Maybe we should also add correlated Join (in case the ON clause uses correlated variables).
> 
> I believe that variables can only be set in the current RelNode. (Read a row from input, set the variable, then evaluate a Rex expression or restart the right input. It’s like a ‘for’ loop.) In which case, what you are seeing is wrong. But I’m not 100% sure.
> 
> Note that, unlike *set*, variables can be *used* anywhere within a tree (generally in the right-hand input of a Correlate).
> 
> Maybe you could propose better javadoc. That is worth doing independent of any bugs that you are trying to fix.
> 
> Julian
> 


Re: Inconsistency of javadoc and actual behaviour of RelNode#getVariablesSet

Posted by Julian Hyde <jh...@gmail.com>.
The sentence "Note: only {@link org.apache.calcite.rel.core.Correlate} should set variables." is no longer true now that we have added correlated Filter and, I believe, correlated Project. Maybe we should also add correlated Join (in case the ON clause uses correlated variables).

I believe that variables can only be set in the current RelNode. (Read a row from input, set the variable, then evaluate a Rex expression or restart the right input. It’s like a ‘for’ loop.) In which case, what you are seeing is wrong. But I’m not 100% sure.

Note that, unlike *set*, variables can be *used* anywhere within a tree (generally in the right-hand input of a Correlate).
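
As a rough illustration of the 'for' loop analogy, here is a minimal sketch in plain Java. It is not Calcite's implementation: the class CorrelateLoop and its methods are hypothetical names, rows are plain integers, and the join behaves like an inner join (left rows whose right input is empty produce no output). The point is only that the variable is set in one place, the loop, while the right input is free to use it anywhere.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

/** Toy nested-loop evaluation of a correlate; hypothetical, not Calcite code. */
class CorrelateLoop {

  /** Stand-in for a right input that depends on the variable: the integers from..to. */
  static List<Integer> range(int from, int to) {
    List<Integer> out = new ArrayList<>();
    for (int i = from; i <= to; i++) {
      out.add(i);
    }
    return out;
  }

  /**
   * For each row of the left input, bind the correlation variable to that row,
   * then restart the right input, which may read the variable.
   */
  static List<String> correlate(List<Integer> left,
      Function<Integer, List<Integer>> rightInput) {
    List<String> out = new ArrayList<>();
    for (Integer cor : left) {                       // the variable is SET here, and only here
      for (Integer r : rightInput.apply(cor)) {      // the right input USES the variable
        out.add(cor + "," + r);
      }
    }
    return out;
  }

  public static void main(String[] args) {
    List<String> rows = correlate(Arrays.asList(1, 2, 3), cor -> range(1, cor));
    System.out.println(rows);
  }
}
```

The right input is re-evaluated once per left row, which is why the variable is only meaningful below the node that sets it.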

Maybe you could propose better javadoc. That is worth doing independent of any bugs that you are trying to fix.

Julian

