Posted to dev@calcite.apache.org by James Daniel <dj...@gmail.com> on 2021/05/18 16:35:35 UTC

Tracking column's origin

Hi, all.
I am trying to rewrite a query plan by removing some nodes, but I have run
into issues with adjusting column ref indexes.

Consider the following Calcite plan:

  LogicalProject([...])
    LogicalJoin(condition=[=($0, $4)], joinType=[inner])
      LogicalTableScan(table=[[DB, R]])
      LogicalAggregate(group=[{0}], agg#0=[MIN($1)])
        LogicalProject(a=[$3], $f0=[true])
          LogicalTableScan(table=[[DB, S]])


In the join condition =($0, $4), the RHS column ref $4 actually comes
from $3 of table S.
To find this out, we have to trace down a few nodes in the tree, starting
from the RHS child of the join node.
This becomes tricky in more complex plans.
1) Is there a utility class or method that supports this?

Furthermore, when we remove LogicalProject(a=[$3], $f0=[true]), we have to
adjust every related column ref index, from the parent of that project node
up to the root node. Tracking and shifting these indexes manually is really
difficult because of the complexity.
2) Does the current Calcite implementation have a utility class or methods
that help with this?

3) Also, could you give me some general guidelines for implementing this
kind of rewrite in Calcite?

Thanks,
James

Re: Tracking column's origin

Posted by JiaTao Tao <ta...@gmail.com>.

Hi

org.apache.calcite.rel.metadata.RelMetadataQuery#getColumnOrigins may help
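
To make the idea concrete, here is a minimal sketch of origin tracing over the plan from the question, written in plain Java. None of the classes below are Calcite classes: Node, Scan, Project, Aggregate and Join are hypothetical stand-ins, and the column counts (4 for R, 5 for S) are assumptions chosen so that $4 is the first RHS column. Computed columns (the literal and the MIN call) are simply reported as null here, which is a simplification of whatever the real metadata provider returns.

```java
/** Toy plan model (NOT Calcite classes) that resolves an output column to a base-table column. */
final class ColumnOriginSketch {

  /** A plan node that can say where its output column comes from. */
  interface Node {
    int fieldCount();

    /** Returns "TABLE.$n", or null if the column is computed (literal, aggregate call). */
    String origin(int column);
  }

  static final class Scan implements Node {
    final String table;
    final int fieldCount;

    Scan(String table, int fieldCount) {
      this.table = table;
      this.fieldCount = fieldCount;
    }

    public int fieldCount() { return fieldCount; }

    public String origin(int column) { return table + ".$" + column; }
  }

  /** Project whose i-th expression is an input ref (value >= 0) or a computed value (-1). */
  static final class Project implements Node {
    final Node input;
    final int[] refs;

    Project(Node input, int[] refs) {
      this.input = input;
      this.refs = refs;
    }

    public int fieldCount() { return refs.length; }

    public String origin(int column) {
      int ref = refs[column];
      return ref < 0 ? null : input.origin(ref);
    }
  }

  /** Aggregate: group keys come first, followed by the aggregate calls. */
  static final class Aggregate implements Node {
    final Node input;
    final int[] groupKeys;
    final int aggCallCount;

    Aggregate(Node input, int[] groupKeys, int aggCallCount) {
      this.input = input;
      this.groupKeys = groupKeys;
      this.aggCallCount = aggCallCount;
    }

    public int fieldCount() { return groupKeys.length + aggCallCount; }

    public String origin(int column) {
      return column < groupKeys.length ? input.origin(groupKeys[column]) : null;
    }
  }

  /** Inner join: left fields followed by right fields. */
  static final class Join implements Node {
    final Node left;
    final Node right;

    Join(Node left, Node right) {
      this.left = left;
      this.right = right;
    }

    public int fieldCount() { return left.fieldCount() + right.fieldCount(); }

    public String origin(int column) {
      int leftCount = left.fieldCount();
      return column < leftCount ? left.origin(column) : right.origin(column - leftCount);
    }
  }

  /** Plan from the question, assuming R has 4 columns and S has 5. */
  static Node examplePlan() {
    Node project = new Project(new Scan("S", 5), new int[] {3, -1}); // a=[$3], $f0=[true]
    Node aggregate = new Aggregate(project, new int[] {0}, 1);       // group=[{0}], MIN($1)
    return new Join(new Scan("R", 4), aggregate);
  }

  public static void main(String[] args) {
    Node join = examplePlan();
    System.out.println(join.origin(0)); // prints R.$0
    System.out.println(join.origin(4)); // prints S.$3
    System.out.println(join.origin(5)); // prints null (the MIN call is computed)
  }
}
```

Each node only needs to know how its own output columns relate to its inputs; the recursion does the rest, which is roughly why a per-operator metadata handler scales to more complex plans.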

Regards!

Aron Tao



Re: Tracking column's origin

Posted by 953396112 <13...@qq.com>.

Hi James:
1) I guess you want to trace a column back to its origin in the original table. In Calcite, you can use `RelMetadataQuery.getColumnOrigin()` for this. See the unit test `org.apache.calcite.test.RelMetadataTest#testCalcColumnOriginsTable` for reference.
2) After a specific operator is removed, the column references in its parent operators are affected. As far as I know, there is no utility class that handles this. Generally, I traverse the plan to match a specific operator pattern, modify the related column references, and generate a new RelNode. You could use a `RelOptRule` or a `RelShuttle` to do this.
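
The index rewriting in 2) can be pictured as applying an old-to-new index mapping to every expression that sits above the changed node. Below is a rough sketch in plain Java that works on the printed form of an expression; it is not Calcite code, and InputRefRemapSketch and its remap method are hypothetical names. In Calcite itself, classes such as `RexPermuteInputsShuttle` and `Mappings` may be worth a look for the same purpose, but please check their Javadoc rather than rely on this sketch.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Toy input-ref rewriter (NOT Calcite code) working on the printed form of an expression. */
final class InputRefRemapSketch {

  private static final Pattern INPUT_REF = Pattern.compile("\\$(\\d+)");

  /**
   * Rewrites every $n in expr to $mapping[n].
   * A mapping entry of -1 means the old column has no counterpart after the rewrite.
   */
  static String remap(String expr, int[] mapping) {
    Matcher matcher = INPUT_REF.matcher(expr);
    StringBuffer out = new StringBuffer();
    while (matcher.find()) {
      int oldIndex = Integer.parseInt(matcher.group(1));
      int newIndex = mapping[oldIndex];
      if (newIndex < 0) {
        throw new IllegalArgumentException(
            "column $" + oldIndex + " has no counterpart after the rewrite");
      }
      matcher.appendReplacement(out, Matcher.quoteReplacement("$" + newIndex));
    }
    matcher.appendTail(out);
    return out.toString();
  }

  public static void main(String[] args) {
    // Removing Project(a=[$3], $f0=[true]): project output 0 maps to input $3,
    // output 1 is a literal and maps to nothing.
    int[] removedProject = {3, -1};
    System.out.println(remap("$0", removedProject)); // prints $3

    try {
      remap("MIN($1)", removedProject);
    } catch (IllegalArgumentException e) {
      System.out.println(e.getMessage()); // prints column $1 has no counterpart after the rewrite
    }

    // Hypothetical case: column $1 of the left join input is dropped,
    // so every later column shifts down by one.
    int[] droppedLeftColumn = {0, -1, 1, 2, 3, 4};
    System.out.println(remap("=($0, $4)", droppedLeftColumn)); // prints =($0, $3)
  }
}
```

Two things stand out in this toy version. First, MIN($1) reads the literal produced by the project, so that project cannot be dropped without also rewriting the aggregate call. Second, an aggregate's output is its group keys followed by its aggregate calls regardless of its input width, so in the plan from the question the join condition =($0, $4) should not need to change if only the project below the aggregate is rewritten; the shifting stops at the first ancestor whose output columns keep their positions.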
I hope this helps.
Xu



