You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@calcite.apache.org by "Ian Bertolacci (Jira)" <ji...@apache.org> on 2023/04/04 20:23:00 UTC
[jira] [Updated] (CALCITE-5631) Optimization to merge redundant joins
[ https://issues.apache.org/jira/browse/CALCITE-5631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ian Bertolacci updated CALCITE-5631:
------------------------------------
Description:
There are situations where multiple joins can be merged into one.
For example:
{code:sql}
select
(select numarrayagg(C5633_203) from T893 where C5633_586 = T895.id),
(select numarrayagg(C5633_170) from T893 where C5633_586 = T895.id)
from T895
{code}
having this RelNode tree (after decorrelation):
{code:java}
01:LogicalProject(EXPR$0=[$70], EXPR$1=[$72])
└─ 02:LogicalJoin(condition=[=($0, $71)], joinType=[left])
├─ 03:LogicalProject(...)
| └─ 04:LogicalJoin(condition=[=($0, $70)], joinType=[left])
| ├─ 05:QueryTableScan(table=[[QUERY, T895]])
| └─ 06:LogicalAggregate(group=[{0}], EXPR$0=[ARRAY_AGG($1)])
| └─ 07:LogicalProject(C5633_586=[$85], C5633_203=[$45])
| └─ 08:LogicalFilter(condition=[=($85, $85)])
| └─ 09:QueryTableScan(table=[[QUERY, T893]])
└─ 10:LogicalAggregate(group=[{0}], EXPR$0=[ARRAY_AGG($1)])
└─ 12:LogicalProject(C5633_586=[$85], C5633_170=[$12])
└─ 12:LogicalFilter(condition=[=($85, $85)])
└─ 13:QueryTableScan(table=[[QUERY, T893]])
{code}
can be simplified as this SQL:
{code:sql}
select
agg.agg1,
agg.agg2
from
T895 left join
(select
C5633_586,
numarrayagg(C5633_203) as agg1,
numarrayagg(C5633_170) as agg2
from T893 group by C5633_586
where C5633_586 is not null
) as agg
on agg.C5633_586 = T895.id
{code}
with this RelNode tree:
{code:java}
20:LogicalProject(agg1=[$71], agg2=[$72])
├─ 21:LogicalJoin(condition=[=($0, $70)], joinType=[left])
└─ 23:QueryTableScan(table=[[QUERY, T895]])
└─ 24:LogicalAggregate(group=[{0}], agg1=[ARRAY_AGG($1)], agg2=[ARRAY_AGG($2)])
└─ 25:LogicalProject(C5633_586=[$85], C5633_203=[$45], C5633_170=[$12])
└─ 26:LogicalFilter(condition=[IS_NOT_NULL($85)])
└─ 27:QueryTableScan(table=[[QUERY, T893]])
{code}
This can be done by identifying joins on the same underlying columns with "mergeable" strings on of nodes on the same sides of the target joins
(I wish I had a precise definition of "mergable", but I don't. but it is easy to see that the two RHSs in the first example can be combined (merged) into the RHS in the second example without changing the result.)
was:
There are situations where multiple joins can be merged into one.
For example:
{code:sql}
select
(select numarrayagg(C5633_203) from T893 where C5633_586 = T895.id),
(select numarrayagg(C5633_170) from T893 where C5633_586 = T895.id)
from T895
{code}
having this RelNode tree (after decorrelation):
{code:java}
01:LogicalProject(EXPR$0=[$70], EXPR$1=[$72])
└─ 02:LogicalJoin(condition=[=($0, $71)], joinType=[left])
├─ 03:LogicalProject(...)
| └─ 04:LogicalJoin(condition=[=($0, $70)], joinType=[left])
| ├─ 05:QueryTableScan(table=[[QUERY, T895]])
| └─ 06:LogicalAggregate(group=[{0}], EXPR$0=[ARRAY_AGG($1)])
| └─ 07:LogicalProject(C5633_586=[$85], C5633_203=[$45])
| └─ 08:LogicalFilter(condition=[=($85, $85)])
| └─ 09:QueryTableScan(table=[[QUERY, T893]])
└─ 10:LogicalAggregate(group=[{0}], EXPR$0=[ARRAY_AGG($1)])
└─ 12:LogicalProject(C5633_586=[$85], C5633_170=[$12])
└─ 12:LogicalFilter(condition=[=($85, $85)])
└─ 13:QueryTableScan(table=[[QUERY, T893]])
{code}
can be simplified as this SQL:
{code:sql}
select agg.agg1, agg.agg2 from
T895 left join
(select
C5633_586,
numarrayagg(C5633_203) as agg1,
numarrayagg(C5633_170) as agg2
from T893 group by C5633_586
where C5633_586 is not null
) as agg
on T895.id = agg.C5633_586
{code}
with this RelNode tree:
{code:java}
20:LogicalProject(agg1=[$71], agg2=[$72])
├─ 21:LogicalJoin(condition=[=($0, $70)], joinType=[left])
└─ 23:QueryTableScan(table=[[QUERY, T895]])
└─ 24:LogicalAggregate(group=[{0}], agg1=[ARRAY_AGG($1)], agg2=[ARRAY_AGG($2)])
└─ 25:LogicalProject(C5633_586=[$85], C5633_203=[$45], C5633_170=[$12])
└─ 26:LogicalFilter(condition=[IS_NOT_NULL($85)])
└─ 27:QueryTableScan(table=[[QUERY, T893]])
{code}
This can be done by identifying joins on the same underlying columns with "mergeable" strings on of nodes on the same sides of the target joins
(I wish I had a precise definition of "mergable", but I don't. but it is easy to see that the two RHSs in the first example can be combined (merged) into the RHS in the second example without changing the result.)
> Optimization to merge redundant joins
> -------------------------------------
>
> Key: CALCITE-5631
> URL: https://issues.apache.org/jira/browse/CALCITE-5631
> Project: Calcite
> Issue Type: Wish
> Reporter: Ian Bertolacci
> Priority: Major
>
> There are situations where multiple joins can be merged into one.
> For example:
> {code:sql}
> select
> (select numarrayagg(C5633_203) from T893 where C5633_586 = T895.id),
> (select numarrayagg(C5633_170) from T893 where C5633_586 = T895.id)
> from T895
> {code}
> having this RelNode tree (after decorrelation):
> {code:java}
> 01:LogicalProject(EXPR$0=[$70], EXPR$1=[$72])
> └─ 02:LogicalJoin(condition=[=($0, $71)], joinType=[left])
> ├─ 03:LogicalProject(...)
> | └─ 04:LogicalJoin(condition=[=($0, $70)], joinType=[left])
> | ├─ 05:QueryTableScan(table=[[QUERY, T895]])
> | └─ 06:LogicalAggregate(group=[{0}], EXPR$0=[ARRAY_AGG($1)])
> | └─ 07:LogicalProject(C5633_586=[$85], C5633_203=[$45])
> | └─ 08:LogicalFilter(condition=[=($85, $85)])
> | └─ 09:QueryTableScan(table=[[QUERY, T893]])
> └─ 10:LogicalAggregate(group=[{0}], EXPR$0=[ARRAY_AGG($1)])
> └─ 12:LogicalProject(C5633_586=[$85], C5633_170=[$12])
> └─ 12:LogicalFilter(condition=[=($85, $85)])
> └─ 13:QueryTableScan(table=[[QUERY, T893]])
> {code}
> can be simplified as this SQL:
> {code:sql}
> select
> agg.agg1,
> agg.agg2
> from
> T895 left join
> (select
> C5633_586,
> numarrayagg(C5633_203) as agg1,
> numarrayagg(C5633_170) as agg2
> from T893 group by C5633_586
> where C5633_586 is not null
> ) as agg
> on agg.C5633_586 = T895.id
> {code}
> with this RelNode tree:
> {code:java}
> 20:LogicalProject(agg1=[$71], agg2=[$72])
> ├─ 21:LogicalJoin(condition=[=($0, $70)], joinType=[left])
> └─ 23:QueryTableScan(table=[[QUERY, T895]])
> └─ 24:LogicalAggregate(group=[{0}], agg1=[ARRAY_AGG($1)], agg2=[ARRAY_AGG($2)])
> └─ 25:LogicalProject(C5633_586=[$85], C5633_203=[$45], C5633_170=[$12])
> └─ 26:LogicalFilter(condition=[IS_NOT_NULL($85)])
> └─ 27:QueryTableScan(table=[[QUERY, T893]])
> {code}
> This can be done by identifying joins on the same underlying columns with "mergeable" strings on of nodes on the same sides of the target joins
> (I wish I had a precise definition of "mergable", but I don't. but it is easy to see that the two RHSs in the first example can be combined (merged) into the RHS in the second example without changing the result.)
--
This message was sent by Atlassian Jira
(v8.20.10#820010)