You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@flink.apache.org by Frank Xue <fr...@bettercloud.com> on 2017/05/22 19:36:22 UTC

Async IO Question

Hi,

I have a question related to async io for Flink. I found that when running
unordered (AsyncDataStream.unorderedWait) failures within each individual
asyncInvoke is added back to be retried, but when I run it ordered
(AsyncDataStream.orderedWait) and an exception is thrown within
asyncInvoke, it just stops the whole process. Is this expected behavior?

Thanks,
Frank

-- 
*Frank Xue* | Software Engineer | w.
3405 Piedmont Rd. NE, Suite 325, Atlanta, GA 30305
<https://www.bettercloud.com?utm_source=bettercloud_email&utm_medium=email_signature&utm_campaign=unified_saas_management/>
*The First Multi-SaaS Management Platform
<https://www.bettercloud.com/whybettercloud?utm_source=bettercloud_email&utm_medium=email_signature&utm_campaign=video>*

Re: Async IO Question

Posted by Till Rohrmann <tr...@apache.org>.
Hi Frank,

yes it could be related to the bug we have fixed in 1.3. Could you try it
out with Flink 1.3 to see if it fixes your problem? If not, then I would
like to take a look at your code to exactly see what is happening there.

Cheers,
Till

On Tue, May 23, 2017 at 4:28 PM, Frank Xue <fr...@bettercloud.com>
wrote:

> Thanks for the reply Till! I am using Flink 1.2.1 so it could be an issue
> with the bug you mentioned that looks to be fixed in 1.3. The restart
> strategy is a fixed delay restart and I have tried various checkpoint and
> restart intervals and the behavior remains the same. Pretty much inside the
> asyncInvoke it does a query call on ElasticSearch and puts it in a future.
> On complete, if the future is a failure it throws the exception. I
> speculate it does not retry properly on only ordered because when one
> instance fails, other events are waiting on that one since the output has
> to remain in order which eventually clogs up the capacity so when the job
> for the failed event is restarted there is no room for it to be run again.
> I should mention that I am running load testing when this happens (tens or
> hundreds of thousands of events coming through at about the same time).
> Does this help shed more light on the behavior I am seeing?
>
> Thanks,
> Frank
>
> On Tue, May 23, 2017 at 10:07 AM, Till Rohrmann <tr...@apache.org>
> wrote:
>
>> Hi Frank,
>>
>> which version of Flink are you using? There was a problem with correctly
>> recognizing failed asynchronous operations, see FLINK-6435 [1].
>>
>> In general, if an exception occurs within AsyncFunction#asyncInvoke,
>> then the job should fail. Depending on which restart strategy you have
>> chosen, the job is retried or not. What Flink should not do is to retry for
>> the unordered case and not retry for the ordered case.
>>
>> Maybe you could share the exact code you’re running in order to see
>> whether you meant exception occurring within AsyncFunction#asyncInvoke
>> or within a future which completes AsyncCollector.
>>
>> [1] https://issues.apache.org/jira/browse/FLINK-6435
>>
>> Cheers,
>> Till
>> ​
>>
>> On Mon, May 22, 2017 at 9:36 PM, Frank Xue <fr...@bettercloud.com>
>> wrote:
>>
>>> Hi,
>>>
>>> I have a question related to async io for Flink. I found that when
>>> running unordered (AsyncDataStream.unorderedWait) failures within each
>>> individual asyncInvoke is added back to be retried, but when I run it
>>> ordered (AsyncDataStream.orderedWait) and an exception is thrown within
>>> asyncInvoke, it just stops the whole process. Is this expected behavior?
>>>
>>> Thanks,
>>> Frank
>>>
>>> --
>>> *Frank Xue* | Software Engineer | w.
>>> 3405 Piedmont Rd. NE, Suite 325, Atlanta, GA 30305
>>>
>>> <https://www.bettercloud.com?utm_source=bettercloud_email&utm_medium=email_signature&utm_campaign=unified_saas_management/>
>>> *The First Multi-SaaS Management Platform
>>> <https://www.bettercloud.com/whybettercloud?utm_source=bettercloud_email&utm_medium=email_signature&utm_campaign=video>*
>>>
>>
>>
>
>
> --
> *Frank Xue* | Software Engineer | w.
> 3405 Piedmont Rd. NE, Suite 325, Atlanta, GA 30305
>
> <https://www.bettercloud.com?utm_source=bettercloud_email&utm_medium=email_signature&utm_campaign=unified_saas_management/>
> *The First Multi-SaaS Management Platform
> <https://www.bettercloud.com/whybettercloud?utm_source=bettercloud_email&utm_medium=email_signature&utm_campaign=video>*
>

Re: Async IO Question

Posted by Frank Xue <fr...@bettercloud.com>.
Thanks for the reply Till! I am using Flink 1.2.1 so it could be an issue
with the bug you mentioned that looks to be fixed in 1.3. The restart
strategy is a fixed delay restart and I have tried various checkpoint and
restart intervals and the behavior remains the same. Pretty much inside the
asyncInvoke it does a query call on ElasticSearch and puts it in a future.
On complete, if the future is a failure it throws the exception. I
speculate it does not retry properly on only ordered because when one
instance fails, other events are waiting on that one since the output has
to remain in order which eventually clogs up the capacity so when the job
for the failed event is restarted there is no room for it to be run again.
I should mention that I am running load testing when this happens (tens or
hundreds of thousands of events coming through at about the same time).
Does this help shed more light on the behavior I am seeing?

Thanks,
Frank

On Tue, May 23, 2017 at 10:07 AM, Till Rohrmann <tr...@apache.org>
wrote:

> Hi Frank,
>
> which version of Flink are you using? There was a problem with correctly
> recognizing failed asynchronous operations, see FLINK-6435 [1].
>
> In general, if an exception occurs within AsyncFunction#asyncInvoke, then
> the job should fail. Depending on which restart strategy you have chosen,
> the job is retried or not. What Flink should not do is to retry for the
> unordered case and not retry for the ordered case.
>
> Maybe you could share the exact code you’re running in order to see
> whether you meant exception occurring within AsyncFunction#asyncInvoke or
> within a future which completes AsyncCollector.
>
> [1] https://issues.apache.org/jira/browse/FLINK-6435
>
> Cheers,
> Till
> ​
>
> On Mon, May 22, 2017 at 9:36 PM, Frank Xue <fr...@bettercloud.com>
> wrote:
>
>> Hi,
>>
>> I have a question related to async io for Flink. I found that when
>> running unordered (AsyncDataStream.unorderedWait) failures within each
>> individual asyncInvoke is added back to be retried, but when I run it
>> ordered (AsyncDataStream.orderedWait) and an exception is thrown within
>> asyncInvoke, it just stops the whole process. Is this expected behavior?
>>
>> Thanks,
>> Frank
>>
>> --
>> *Frank Xue* | Software Engineer | w.
>> 3405 Piedmont Rd. NE, Suite 325, Atlanta, GA 30305
>>
>> <https://www.bettercloud.com?utm_source=bettercloud_email&utm_medium=email_signature&utm_campaign=unified_saas_management/>
>> *The First Multi-SaaS Management Platform
>> <https://www.bettercloud.com/whybettercloud?utm_source=bettercloud_email&utm_medium=email_signature&utm_campaign=video>*
>>
>
>


-- 
*Frank Xue* | Software Engineer | w.
3405 Piedmont Rd. NE, Suite 325, Atlanta, GA 30305
<https://www.bettercloud.com?utm_source=bettercloud_email&utm_medium=email_signature&utm_campaign=unified_saas_management/>
*The First Multi-SaaS Management Platform
<https://www.bettercloud.com/whybettercloud?utm_source=bettercloud_email&utm_medium=email_signature&utm_campaign=video>*

Re: Async IO Question

Posted by Till Rohrmann <tr...@apache.org>.
Hi Frank,

which version of Flink are you using? There was a problem with correctly
recognizing failed asynchronous operations, see FLINK-6435 [1].

In general, if an exception occurs within AsyncFunction#asyncInvoke, then
the job should fail. Depending on which restart strategy you have chosen,
the job is retried or not. What Flink should not do is to retry for the
unordered case and not retry for the ordered case.

Maybe you could share the exact code you’re running in order to see whether
you meant exception occurring within AsyncFunction#asyncInvoke or within a
future which completes AsyncCollector.

[1] https://issues.apache.org/jira/browse/FLINK-6435

Cheers,
Till
​

On Mon, May 22, 2017 at 9:36 PM, Frank Xue <fr...@bettercloud.com>
wrote:

> Hi,
>
> I have a question related to async io for Flink. I found that when running
> unordered (AsyncDataStream.unorderedWait) failures within each individual
> asyncInvoke is added back to be retried, but when I run it ordered
> (AsyncDataStream.orderedWait) and an exception is thrown within
> asyncInvoke, it just stops the whole process. Is this expected behavior?
>
> Thanks,
> Frank
>
> --
> *Frank Xue* | Software Engineer | w.
> 3405 Piedmont Rd. NE, Suite 325, Atlanta, GA 30305
>
> <https://www.bettercloud.com?utm_source=bettercloud_email&utm_medium=email_signature&utm_campaign=unified_saas_management/>
> *The First Multi-SaaS Management Platform
> <https://www.bettercloud.com/whybettercloud?utm_source=bettercloud_email&utm_medium=email_signature&utm_campaign=video>*
>