You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@flink.apache.org by Flavio Pompermaier <po...@okkam.it> on 2019/07/18 10:20:17 UTC

AsyncDataStream on key of KeyedStream

Hi to all,
I'm trying to exploit async IO in my Flink job.
In my use case I use keyed tumbling windows and I'd like to execute the
async action only once per key and window (while
the AsyncDataStream.unorderedWait execute the async call for every element
of my stream) ..is there an easy way to do that apart from using a process
function (that basically will lose the asynchronicity)?

Best,
Flavio

Re: AsyncDataStream on key of KeyedStream

Posted by Fabian Hueske <fh...@gmail.com>.
Sure:

                                                   /--> AsyncIO --\
STREAM --> ProcessFunc  --                          -- Union -- WindowFunc
                                                  \------------------/

ProcessFunc keeps track of the unique keys per window duration and emits
each distinct key just once to the AsyncIO function via a side output.
Through the main output it sends all values it receives.
AsyncIO queries the external store for each key it receives.
Union just unions both streams (possibly using an Either type).
WindowFunction compute the window and includes the information that was
fetched by the AsyncIO function.

Cheers,
Fabian

Am Di., 23. Juli 2019 um 17:25 Uhr schrieb Flavio Pompermaier <
pompermaier@okkam.it>:

> For each key I need to call an external REST service to get the current
> status and this is why I'd like to use Async IO. At the moment I do this in
> a process function but I'd like a cleaner solution (if possible).
> Do you think your proposal of forking could be a better option?
> Could you provide a simple snippet/peudo-code of it? I'm not sure I've
> fully undestand your suggestion..
>

Re: AsyncDataStream on key of KeyedStream

Posted by Flavio Pompermaier <po...@okkam.it>.
For each key I need to call an external REST service to get the current
status and this is why I'd like to use Async IO. At the moment I do this in
a process function but I'd like a cleaner solution (if possible).
Do you think your proposal of forking could be a better option?
Could you provide a simple snippet/peudo-code of it? I'm not sure I've
fully undestand your suggestion..

Re: AsyncDataStream on key of KeyedStream

Posted by Fabian Hueske <fh...@gmail.com>.
OK, I see. What information will be send out via the async request?
Maybe you can fork of a separate stream with the info that needs to be send
to the external service and later union the result with the main stream
before the window operator?



Am Di., 23. Juli 2019 um 14:12 Uhr schrieb Flavio Pompermaier <
pompermaier@okkam.it>:

> The problem of bundling all records together within a window is that this
> solution doesn't scale (in the case of large time windows and number of
> events)..my requirement could be fulfilled by a keyed ProcessFunction but I
> think AsyncDataStream should provide a first-class support to keyed streams
> (and thus perform a single call per key and window..). What do you think?
>
> On Tue, Jul 23, 2019 at 12:56 PM Fabian Hueske <fh...@gmail.com> wrote:
>
>> Hi Flavio,
>>
>> Not sure I understood the requirements correctly.
>> Couldn't you just collect and bundle all records with a regular window
>> operator and forward one record for each key-window to an AsyncIO operator?
>>
>> Best, Fabian
>>
>> Am Do., 18. Juli 2019 um 12:20 Uhr schrieb Flavio Pompermaier <
>> pompermaier@okkam.it>:
>>
>>> Hi to all,
>>> I'm trying to exploit async IO in my Flink job.
>>> In my use case I use keyed tumbling windows and I'd like to execute the
>>> async action only once per key and window (while
>>> the AsyncDataStream.unorderedWait execute the async call for every element
>>> of my stream) ..is there an easy way to do that apart from using a process
>>> function (that basically will lose the asynchronicity)?
>>>
>>> Best,
>>> Flavio
>>>
>>
>
>

Re: AsyncDataStream on key of KeyedStream

Posted by Flavio Pompermaier <po...@okkam.it>.
The problem of bundling all records together within a window is that this
solution doesn't scale (in the case of large time windows and number of
events)..my requirement could be fulfilled by a keyed ProcessFunction but I
think AsyncDataStream should provide a first-class support to keyed streams
(and thus perform a single call per key and window..). What do you think?

On Tue, Jul 23, 2019 at 12:56 PM Fabian Hueske <fh...@gmail.com> wrote:

> Hi Flavio,
>
> Not sure I understood the requirements correctly.
> Couldn't you just collect and bundle all records with a regular window
> operator and forward one record for each key-window to an AsyncIO operator?
>
> Best, Fabian
>
> Am Do., 18. Juli 2019 um 12:20 Uhr schrieb Flavio Pompermaier <
> pompermaier@okkam.it>:
>
>> Hi to all,
>> I'm trying to exploit async IO in my Flink job.
>> In my use case I use keyed tumbling windows and I'd like to execute the
>> async action only once per key and window (while
>> the AsyncDataStream.unorderedWait execute the async call for every element
>> of my stream) ..is there an easy way to do that apart from using a process
>> function (that basically will lose the asynchronicity)?
>>
>> Best,
>> Flavio
>>
>

Re: AsyncDataStream on key of KeyedStream

Posted by Fabian Hueske <fh...@gmail.com>.
Hi Flavio,

Not sure I understood the requirements correctly.
Couldn't you just collect and bundle all records with a regular window
operator and forward one record for each key-window to an AsyncIO operator?

Best, Fabian

Am Do., 18. Juli 2019 um 12:20 Uhr schrieb Flavio Pompermaier <
pompermaier@okkam.it>:

> Hi to all,
> I'm trying to exploit async IO in my Flink job.
> In my use case I use keyed tumbling windows and I'd like to execute the
> async action only once per key and window (while
> the AsyncDataStream.unorderedWait execute the async call for every element
> of my stream) ..is there an easy way to do that apart from using a process
> function (that basically will lose the asynchronicity)?
>
> Best,
> Flavio
>