You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@beam.apache.org by Jaya Johnson <ja...@nytimes.com> on 2020/06/19 20:06:48 UTC

Re: Question about beam streaming pipelines.

On Fri, Jun 19, 2020 at 9:51 AM Jaya Johnson <ja...@nytimes.com>
wrote:

> I am trying to set up a pipeline that does the following.
> For a window of n seconds I want to pack messages received in that window
> (messages are strings)  to a single string with "\n" as delimiter and push.
> I am trying to use the following for this:
> apply fixed window to the input pcollections.
> to the windowed pcollections - apply Combine.globally and write a custom
> function to concat the interable strings to a single string.
>
> Does this seem like the right approach? Are there built in custom
> transformations I could use for this?
>
> Also if I want to do custom sizing of the packed messages to say a zipped
> format or limit by # of bytes can I add this to the window itself or would
> I need a custom transformation for this.
>
> Thank you!
>
>

Re: Question about beam streaming pipelines.

Posted by Luke Cwik <lc...@google.com>.
Based on what you describe that makes sense to me. Window merging could be
used with session like windows limiting merging based upon # bytes but this
will not be obvious to users reading your code afterwards so I wouldn't
recommend it.

If you don't have to be so strict in the number of bytes or require that
there is a maximal amount of packing of messages you could take a look into
GroupIntoBatches[1].

1:
https://beam.apache.org/documentation/transforms/java/aggregation/groupintobatches/



On Fri, Jun 19, 2020 at 1:07 PM Jaya Johnson <ja...@nytimes.com>
wrote:

>
>
> On Fri, Jun 19, 2020 at 9:51 AM Jaya Johnson <ja...@nytimes.com>
> wrote:
>
>> I am trying to set up a pipeline that does the following.
>> For a window of n seconds I want to pack messages received in that window
>> (messages are strings)  to a single string with "\n" as delimiter and push.
>> I am trying to use the following for this:
>> apply fixed window to the input pcollections.
>> to the windowed pcollections - apply Combine.globally and write a custom
>> function to concat the interable strings to a single string.
>>
>> Does this seem like the right approach? Are there built in custom
>> transformations I could use for this?
>>
>> Also if I want to do custom sizing of the packed messages to say a zipped
>> format or limit by # of bytes can I add this to the window itself or would
>> I need a custom transformation for this.
>>
>> Thank you!
>>
>>