Posted to dev@calcite.apache.org by Konstantin Orlov <ko...@gridgain.com> on 2021/09/17 12:43:50 UTC
Inconsistency of javadoc and actual behaviour of RelNode#getVariablesSet
Hi, folks
I have a question about org.apache.calcite.rel.RelNode#getVariablesSet.
The Javadoc says it returns the variables that are set by the current node:
/**
* Returns the variables that are set in this relational
* expression but also used and therefore not available to parents of this
* relational expression.
*
* <p>Note: only {@link org.apache.calcite.rel.core.Correlate} should set
* variables.
*
* @return Names of variables which are set in this relational
* expression
*/
Set<CorrelationId> getVariablesSet();
But I have a plan where a node returns all variables used by its child nodes,
regardless of whether each variable is set by the current node or by a parent node.
The original query is:
SELECT *
FROM t1 as "outer"
WHERE a > (
    SELECT COUNT(*)
    FROM t1 as "inner"
    WHERE "inner".a IN (
        SELECT *
        FROM table(system_range("inner".a, "inner".b + "outer".b))
    )
)
After SQL-to-Rel translation, I get the following plan:
LogicalProject(A=[$2], B=[$3], C=[$4], D=[$5], E=[$6])
  LogicalFilter(condition=[>($2, $SCALAR_QUERY({
LogicalAggregate(group=[{}], COUNT(*)=[COUNT()])
  LogicalFilter(condition=[IN($2, {
LogicalProject(X=[$0])
  LogicalTableFunctionScan(invocation=[SYSTEM_RANGE($cor0.A, +($cor0.B, $cor2.B))], rowType=[RecordType(BIGINT X)])
})], variablesSet=[[$cor0]])
    LogicalTableScan(table=[[PUBLIC, T1]])
}))], variablesSet=[[$cor2]])
    LogicalTableScan(table=[[PUBLIC, T1]])
Each LogicalFilter introduces its own correlation variable, and everything is
OK so far.
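To make the Javadoc's reading concrete for this plan, here is a small Java sketch. It does not use Calcite's API: `PlanNode`, `binds`, `uses` and `variablesSetPerJavadoc` are hypothetical names, and the sub-query plans inside each filter condition are modeled as extra children. Under those assumptions, each filter's subtree uses both `cor0` and `cor2`, but intersecting that with the variable the filter itself sets gives exactly the `variablesSet` values shown in the plan.

```java
import java.util.*;

// Hypothetical toy model (not Calcite's API) of the plan after SQL-to-Rel translation.
class FirstPlanDemo {

    // Stand-in for a RelNode; sub-query plans are modeled as extra children.
    static final class PlanNode {
        final String name;
        final Set<String> binds; // variables this node itself sets
        final Set<String> uses;  // variables referenced by this node's own expressions
        final List<PlanNode> children;

        PlanNode(String name, Set<String> binds, Set<String> uses, PlanNode... children) {
            this.name = name;
            this.binds = binds;
            this.uses = uses;
            this.children = Arrays.asList(children);
        }

        // Every variable referenced anywhere in this subtree, including this node.
        Set<String> usedInSubtree() {
            Set<String> result = new TreeSet<>(uses);
            for (PlanNode child : children) {
                result.addAll(child.usedInSubtree());
            }
            return result;
        }

        // The Javadoc's reading: variables set here that are also used below.
        Set<String> variablesSetPerJavadoc() {
            Set<String> result = new TreeSet<>(binds);
            result.retainAll(usedInSubtree());
            return result;
        }
    }

    static PlanNode scan() {
        return new PlanNode("LogicalTableScan", Set.of(), Set.of());
    }

    // LogicalFilter(condition=[IN($2, ...)]) with variablesSet=[[$cor0]].
    static PlanNode innerFilter() {
        PlanNode fnScan = new PlanNode("LogicalTableFunctionScan", Set.of(), Set.of("cor0", "cor2"));
        return new PlanNode("LogicalFilter[IN]", Set.of("cor0"), Set.of(),
            scan(), new PlanNode("LogicalProject", Set.of(), Set.of(), fnScan));
    }

    // LogicalFilter(condition=[>($2, $SCALAR_QUERY(...))]) with variablesSet=[[$cor2]].
    static PlanNode outerFilter() {
        return new PlanNode("LogicalFilter[>]", Set.of("cor2"), Set.of(),
            scan(), new PlanNode("LogicalAggregate", Set.of(), Set.of(), innerFilter()));
    }

    public static void main(String[] args) {
        for (PlanNode f : List.of(outerFilter(), innerFilter())) {
            System.out.println(f.name + ": used below = " + f.usedInSubtree()
                + ", set here = " + f.variablesSetPerJavadoc());
        }
    }
}
```

Running `main` reports `[cor2]` as set by the outer filter and `[cor0]` as set by the inner one, even though both subtrees use `[cor0, cor2]`.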
But after I apply SubQueryRemoveRule, the new plan looks like this:
LogicalProject(A=[$2], B=[$3], C=[$4], D=[$5], E=[$6])
  LogicalProject(_KEY=[$0], _VAL=[$1], A=[$2], B=[$3], C=[$4], D=[$5], E=[$6])
    LogicalFilter(condition=[>($2, $7)])
      LogicalCorrelate(correlation=[$cor2], joinType=[left], requiredColumns=[{3}])
        LogicalTableScan(table=[[PUBLIC, T1]])
        LogicalAggregate(group=[{}], COUNT(*)=[COUNT()])
          LogicalProject(_KEY=[$0], _VAL=[$1], A=[$2], B=[$3], C=[$4], D=[$5], E=[$6])
            LogicalJoin(condition=[=($2, $7)], joinType=[inner])
              LogicalTableScan(table=[[PUBLIC, T1]])
              LogicalAggregate(group=[{0}])
                LogicalProject(X=[$0])
                  LogicalTableFunctionScan(invocation=[SYSTEM_RANGE($cor0.A, +($cor0.B, $cor2.B))], rowType=[RecordType(BIGINT X)])
At this point, LogicalJoin.getVariablesSet() returns both "cor0" and "cor2",
which doesn't seem right: "cor2" is set by the LogicalCorrelate above the join,
not by the join itself.
Is this behaviour expected, or is it a bug?
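A toy check of the rewritten plan, again a hypothetical sketch rather than Calcite code: it walks the tree top-down, tracks the variables already set by ancestors, and flags a node that reports one of them in its own variables set. The LogicalCorrelate is assumed to set the variable named in its `correlation=[$cor2]` attribute, the LogicalJoin's values are the ones observed above, and every other node is assumed to report nothing.

```java
import java.util.*;

// Hypothetical toy checker (not Calcite's API) for the plan after SubQueryRemoveRule.
class RewrittenPlanCheck {

    // Stand-in for a RelNode, holding only what getVariablesSet() reported.
    static final class PlanNode {
        final String name;
        final Set<String> reportedVariablesSet;
        final List<PlanNode> children;

        PlanNode(String name, Set<String> reportedVariablesSet, PlanNode... children) {
            this.name = name;
            this.reportedVariablesSet = reportedVariablesSet;
            this.children = Arrays.asList(children);
        }
    }

    // Walks top-down; flags any variable a node reports that an ancestor already set.
    static List<String> findConflicts(PlanNode node, Set<String> setByAncestors) {
        List<String> conflicts = new ArrayList<>();
        for (String v : node.reportedVariablesSet) {
            if (setByAncestors.contains(v)) {
                conflicts.add(node.name + " reports " + v + ", already set by an ancestor");
            }
        }
        Set<String> inScope = new HashSet<>(setByAncestors);
        inScope.addAll(node.reportedVariablesSet);
        for (PlanNode child : node.children) {
            conflicts.addAll(findConflicts(child, inScope));
        }
        return conflicts;
    }

    static PlanNode leaf(String name) {
        return new PlanNode(name, Set.of());
    }

    // Shape of the rewritten plan; nodes other than Correlate and Join are assumed to report nothing.
    static PlanNode rewrittenPlan() {
        PlanNode join = new PlanNode("LogicalJoin", new TreeSet<>(Set.of("cor0", "cor2")),
            leaf("LogicalTableScan"),
            new PlanNode("LogicalAggregate", Set.of(),
                new PlanNode("LogicalProject", Set.of(), leaf("LogicalTableFunctionScan"))));
        PlanNode correlate = new PlanNode("LogicalCorrelate", Set.of("cor2"),
            leaf("LogicalTableScan"),
            new PlanNode("LogicalAggregate", Set.of(),
                new PlanNode("LogicalProject", Set.of(), join)));
        return new PlanNode("LogicalProject", Set.of(),
            new PlanNode("LogicalProject", Set.of(),
                new PlanNode("LogicalFilter", Set.of(), correlate)));
    }

    public static void main(String[] args) {
        findConflicts(rewrittenPlan(), Set.of()).forEach(System.out::println);
    }
}
```

On this plan it reports a single conflict, `LogicalJoin reports cor2, already set by an ancestor`; `cor0` is not flagged, because no node above the join sets it.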
--
Regards,
Konstantin Orlov
Re: Inconsistency of javadoc and actual behaviour of RelNode#getVariablesSet
Posted by Konstantin Orlov <ko...@gridgain.com>.
Hi, Julian
Thanks for your reply!
> I believe that variables can only be set in the current RelNode. (Read a row from input, set the variable, then evaluate a Rex expression or restart the right input. It’s like a ‘for’ loop.)
Actually, this is in line with my expectations, so I'll file a ticket for it.
--
Regards,
Konstantin Orlov
> On 17 Sep 2021, at 21:44, Julian Hyde <jh...@gmail.com> wrote:
>
> The sentence "Note: only {@link org.apache.calcite.rel.core.Correlate} should set variables." is no longer true, now that we have added correlated Filter and, I believe, correlated Project. Maybe we should also add correlated Join (in case the ON clause uses correlated variables).
>
> I believe that variables can only be set in the current RelNode. (Read a row from input, set the variable, then evaluate a Rex expression or restart the right input. It’s like a ‘for’ loop.) In which case, what you are seeing is wrong. But I’m not 100% sure.
>
> Note that, unlike *set*, variables can be *used* anywhere within a tree (generally in the right-hand input of a Correlate).
>
> Maybe you could propose better javadoc. That is worth doing independent of any bugs that you are trying to fix.
>
> Julian
Re: Inconsistency of javadoc and actual behaviour of RelNode#getVariablesSet
Posted by Julian Hyde <jh...@gmail.com>.
The sentence "Note: only {@link org.apache.calcite.rel.core.Correlate} should set variables." is no longer true, now that we have added correlated Filter and, I believe, correlated Project. Maybe we should also add correlated Join (in case the ON clause uses correlated variables).
I believe that variables can only be set in the current RelNode. (Read a row from input, set the variable, then evaluate a Rex expression or restart the right input. It’s like a ‘for’ loop.) In which case, what you are seeing is wrong. But I’m not 100% sure.
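The "for loop" reading can be sketched in Java against the query from the original message. This is not Calcite code: T1 is reduced to hypothetical (a, b) rows, and SYSTEM_RANGE(lo, hi) is assumed to return the integers lo through hi inclusive. Each loop sets one correlation variable in a map before evaluating what sits beneath it, so `$cor2` is set once per outer row, `$cor0` once per inner row, and the innermost range expression reads both.

```java
import java.util.*;

// Hypothetical sketch (not Calcite code): correlation variables as nested loops.
class CorrelationAsLoops {

    // Assumption: SYSTEM_RANGE(lo, hi) yields lo, lo + 1, ..., hi (empty if lo > hi).
    static List<Long> systemRange(long lo, long hi) {
        List<Long> out = new ArrayList<>();
        for (long x = lo; x <= hi; x++) {
            out.add(x);
        }
        return out;
    }

    // Rows of T1 are reduced to {a, b}. Each correlated level sets its variable
    // for the current row, then evaluates everything beneath it.
    static List<long[]> evaluate(List<long[]> t1) {
        Map<String, long[]> env = new HashMap<>();
        List<long[]> result = new ArrayList<>();
        for (long[] outer : t1) {
            env.put("cor2", outer);                                // set by the outer level
            long count = 0;
            for (long[] inner : t1) {
                env.put("cor0", inner);                            // set by the inner level
                long lo = env.get("cor0")[0];                      // $cor0.A
                long hi = env.get("cor0")[1] + env.get("cor2")[1]; // $cor0.B + $cor2.B
                if (systemRange(lo, hi).contains(inner[0])) {      // "inner".a IN (...)
                    count++;                                       // COUNT(*)
                }
            }
            env.remove("cor0");                                    // $cor0 goes out of scope
            if (outer[0] > count) {                                // a > (scalar sub-query)
                result.add(outer);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<long[]> t1 = List.of(new long[] {1, 2}, new long[] {5, -10}, new long[] {3, 0});
        for (long[] row : evaluate(t1)) {
            System.out.println("a=" + row[0] + ", b=" + row[1]);
        }
    }
}
```

With the three sample rows in `main`, the rows `a=5, b=-10` and `a=3, b=0` qualify. Neither variable is set by a node below the one that owns its loop, which is the property the Javadoc describes.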
Note that, unlike *set*, variables can be *used* anywhere within a tree (generally in the right-hand input of a Correlate).
Maybe you could propose better javadoc. That is worth doing independent of any bugs that you are trying to fix.
Julian