Posted to user@beam.apache.org by Akshay Balwally <ab...@lyft.com> on 2018/10/01 16:52:38 UTC

CoGroupByKey only on window end.

Hi everyone,

I would like to use a CoGroupByKey statement on unevenly windowed streams
(one of size 15 minutes, one of size 1 minute). As I understand it,
CoGroupByKey groups first by key, then by window. But of course since the
windows are not the same, my CoGroupByKey does not successfully join the
streams.

One idea I had is to extend CoGroupByKey into some kind of
"CoGroupByKeyWindowEnd" that groups first by key, then by window.end. I
just wanted to check first: is there a better way to do this, or something
natively supported by Beam?

Thanks,
Akshay

Re: CoGroupByKey only on window end.

Posted by Akshay Balwally <ab...@lyft.com>.
Thanks Ankur! Your solution is way better, I'll do that.

On Mon, Oct 1, 2018 at 2:18 PM Ankur Goenka <go...@google.com> wrote:

> CoGBK and GBK need consistent windowing across their input PCollections.
> In your case, a custom solution is needed.
> Here is another way that only needs pipeline orchestration and might be
> simpler.
>
> Let's say you have PCollection A with 15-minute windows and PCollection B
> with 1-minute windows.
> Step 1: GBK PCollection A over its 15-minute windows.
> Step 2: Read the GBK output of A and re-emit the same value into each of
> the 15 x 1-minute windows it covers. Let's call this PCollection A'.
> Step 3: Now A' and B have the same windowing. Do a CoGBK on A' and B.
> ...
>
> Thanks,
> Ankur

Re: CoGroupByKey only on window end.

Posted by Ankur Goenka <go...@google.com>.
CoGBK and GBK need consistent windowing across their input PCollections.
In your case, a custom solution is needed.
Here is another way that only needs pipeline orchestration and might be
simpler.

Let's say you have PCollection A with 15-minute windows and PCollection B
with 1-minute windows.
Step 1: GBK PCollection A over its 15-minute windows.
Step 2: Read the GBK output of A and re-emit the same value into each of
the 15 x 1-minute windows it covers. Let's call this PCollection A'.
Step 3: Now A' and B have the same windowing. Do a CoGBK on A' and B.
...
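
The three steps above can be sketched in plain Python, without the Beam
SDK, to show the window arithmetic. This is only an illustration under
stated assumptions: events are (key, timestamp_in_seconds, value) tuples,
windows are fixed and aligned to the epoch, and the helper names
(window_start, group_by_key_and_window, fan_out, co_group) are
hypothetical, not Beam APIs.

```python
from collections import defaultdict

BIG = 15 * 60   # 15-minute window size in seconds (PCollection A)
SMALL = 60      # 1-minute window size in seconds (PCollection B)


def window_start(ts, size):
    """Start of the fixed window of the given size that contains ts."""
    return ts - ts % size


def group_by_key_and_window(events, size):
    """Step 1 analogue: group values by (key, window start)."""
    grouped = defaultdict(list)
    for key, ts, value in events:
        grouped[(key, window_start(ts, size))].append(value)
    return dict(grouped)


def fan_out(grouped, big, small):
    """Step 2 analogue: copy each big-window group into every small
    window that the big window covers."""
    out = {}
    for (key, start), values in grouped.items():
        for sub_start in range(start, start + big, small):
            out[(key, sub_start)] = list(values)
    return out


def co_group(left, right):
    """Step 3 analogue: join two groupings on (key, window start)."""
    return {k: (left.get(k, []), right.get(k, []))
            for k in set(left) | set(right)}


# Hypothetical sample data, not taken from the thread.
a_events = [("user1", 30, "a1"), ("user1", 600, "a2")]
b_events = [("user1", 65, "b1"), ("user1", 70, "b2"), ("user1", 899, "b3")]

grouped_a = group_by_key_and_window(a_events, BIG)
a_prime = fan_out(grouped_a, BIG, SMALL)
grouped_b = group_by_key_and_window(b_events, SMALL)
joined = co_group(a_prime, grouped_b)
```

With this sample data, joined has one entry per 1-minute window of the
15-minute window, and each entry pairs the full group from A with
whatever B produced in that minute (possibly nothing). In a real
pipeline, A' would presumably also need element timestamps that fall
inside each 1-minute window, plus a re-window into 1-minute windows
before the CoGBK; that part is not shown here.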

Thanks,
Ankur
