You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@flink.apache.org by Fanbin Bu <fa...@coinbase.com> on 2020/06/23 04:14:16 UTC

two phase aggregation

Hi,

Does over window aggregation support two-phase mode?
https://ci.apache.org/projects/flink/flink-docs-stable/dev/table/config.html#table-optimizer-agg-phase-strategy

SELECT
  user_id
  , event_time
  , listagg(event_type, '*') over w as names
FROM table
WINDOW w AS
( PARTITION BY user_id
  ORDER BY event_time
  ROWS BETWEEN 256 PRECEDING AND CURRENT ROW
)

Re: two phase aggregation

Posted by Jark Wu <im...@gmail.com>.
AFAIK, this is not on the roadmap.

The problem is that it doesn't get much improvement for over window
aggregates.
If we support two-phase for over window aggregate, the local over operator
doesn't reduce any data,
it has to emit the same number of records it received, and can't reduce
pressure of the global operator.

Best,
Jark

On Tue, 23 Jun 2020 at 13:09, Fanbin Bu <fa...@coinbase.com> wrote:

> Jark,
> thanks for the reply. Do you know whether it's on the roadmap or what's
> the plan?
>
> On Mon, Jun 22, 2020 at 9:36 PM Jark Wu <im...@gmail.com> wrote:
>
>> Hi Fanbin,
>>
>> Currently, over window aggregation doesn't support two-phase optimization.
>>
>> Best,
>> Jark
>>
>> On Tue, 23 Jun 2020 at 12:14, Fanbin Bu <fa...@coinbase.com> wrote:
>>
>>> Hi,
>>>
>>> Does over window aggregation support two-phase mode?
>>>
>>> https://ci.apache.org/projects/flink/flink-docs-stable/dev/table/config.html#table-optimizer-agg-phase-strategy
>>>
>>> SELECT
>>>   user_id
>>>   , event_time
>>>   , listagg(event_type, '*') over w as names
>>> FROM table
>>> WINDOW w AS
>>> ( PARTITION BY user_id
>>>   ORDER BY event_time
>>>   ROWS BETWEEN 256 PRECEDING AND CURRENT ROW
>>> )
>>>
>>>

Re: two phase aggregation

Posted by Fanbin Bu <fa...@coinbase.com>.
Jark,
thanks for the reply. Do you know whether it's on the roadmap or what's the
plan?

On Mon, Jun 22, 2020 at 9:36 PM Jark Wu <im...@gmail.com> wrote:

> Hi Fanbin,
>
> Currently, over window aggregation doesn't support two-phase optimization.
>
> Best,
> Jark
>
> On Tue, 23 Jun 2020 at 12:14, Fanbin Bu <fa...@coinbase.com> wrote:
>
>> Hi,
>>
>> Does over window aggregation support two-phase mode?
>>
>> https://ci.apache.org/projects/flink/flink-docs-stable/dev/table/config.html#table-optimizer-agg-phase-strategy
>>
>> SELECT
>>   user_id
>>   , event_time
>>   , listagg(event_type, '*') over w as names
>> FROM table
>> WINDOW w AS
>> ( PARTITION BY user_id
>>   ORDER BY event_time
>>   ROWS BETWEEN 256 PRECEDING AND CURRENT ROW
>> )
>>
>>

Re: two phase aggregation

Posted by Jark Wu <im...@gmail.com>.
Hi Fanbin,

Currently, over window aggregation doesn't support two-phase optimization.

Best,
Jark

On Tue, 23 Jun 2020 at 12:14, Fanbin Bu <fa...@coinbase.com> wrote:

> Hi,
>
> Does over window aggregation support two-phase mode?
>
> https://ci.apache.org/projects/flink/flink-docs-stable/dev/table/config.html#table-optimizer-agg-phase-strategy
>
> SELECT
>   user_id
>   , event_time
>   , listagg(event_type, '*') over w as names
> FROM table
> WINDOW w AS
> ( PARTITION BY user_id
>   ORDER BY event_time
>   ROWS BETWEEN 256 PRECEDING AND CURRENT ROW
> )
>
>