You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@calcite.apache.org by Danny Chan <yu...@gmail.com> on 2020/08/03 09:21:34 UTC

Re: Pushing down aggregates through rhs of a left join

I didn’t see the derivation formula yet but I believe there is indeed some promotion space for the Agg Join transpose cases, Alex, can you log an issue there ?

Best,
Danny Chan
在 2020年7月31日 +0800 PM1:43,Alex Baden <al...@omnisci.com>,写道:
> Hi all,
>
> I have a query of the form:
>
> SELECT a.x, SUM(b.y), SUM(b.z) FROM t1 a LEFT JOIN t2 b ON a.join_key
> = b.join_key GROUP BY a.x ORDER BY a.x;
>
> If table b has a large number of duplicate keys for `join_key`, the
> left join can be very expensive to compute. Instead, we would like to
> run:
>
> WITH t2g AS (SELECT b.join_key, SUM(b.y), SUM(b.z) FROM t2 b GROUP BY
> b.join_key)
> SELECT a.x, SUM(bg.y), SUM(bg.z) FROM t1 a LEFT JOIN t2g bg ON
> a.join_key = t2g.join_key GROUP BY a.x ORDER BY a.x;
>
> Essentially, since we are only projecting aggregates from the rhs of
> the join, and the aggregate functions are associative, we can group by
> the join key to compute the aggregates up front, then join on the
> grouped results, and finally aggregate among join matches.
>
> Looking at the comments of the AGGREGATE_JOIN_TRANSPOSE_RULE, I noted
> the following:
> // OUTER joins are supported for group by without aggregate functions
>
> But based on the above, if we have a left join I believe we can
> transpose the aggregate and the join if the following conditions hold:
> 1) only expressions from the rhs of the join are aggregated
> 2) all aggregate functions from (1) are associative (can be split)
> 3) at least one expression from the lhs of the join is grouped
>
> I am interested in implementing this rule (assuming the conditions
> above are strong enough to guarantee correctness). Is there interest
> in PRing something like this to calcite, either as a new rule or part
> of AggregateJoinTranspose?
>
> Thanks,
> Alex