You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@flink.apache.org by Ken Krugler <kk...@transpac.com> on 2018/01/25 04:36:28 UTC

Avoiding deadlock with iterations

Hi all,

We’ve run into deadlocks with two different streaming workflows that have iterations.

In both cases, the issue is with fan-out; if any operation in the loop can emit more records than consumed, eventually a network buffer fills up, and then everyone in the iteration loop is blocked.

One pattern we can use, when the operator that’s causing the fan-out has the ability to decide how much to emit, is to have it behave as an async function, emitting from a queue with multiple threads. If threads start blocking because of back pressure, then the queue begins to fill up, and the function can throttle back how much data it queues up. So this gives us a small (carefully managed) data reservoir we can use to avoid the deadlock.

Is there a better approach? I didn’t see any way to determine how “full” the various network buffers are, and use that for throttling. Plus there’s the issue of partitioning, where it would be impossible in many cases to know the impact of a record being emitted. So even if we could monitor buffers, I don’t think it’s a viable solution.

Thanks,

— Ken

--------------------------------------------
http://about.me/kkrugler <http://about.me/kkrugler>
+1 530-210-6378


Re: Avoiding deadlock with iterations

Posted by Piotr Nowojski <pi...@data-artisans.com>.
Hi,

This is a known problem and I don’t think there is an easy solution to this. Please refer to the:
http://mail-archives.apache.org/mod_mbox/flink-user/201704.mbox/%3C5486a7fd-41c3-4131-5100-272825088c34@gaborhermann.com%3E <http://mail-archives.apache.org/mod_mbox/flink-user/201704.mbox/%3C5486a7fd-41c3-4131-5100-272825088c34@gaborhermann.com%3E>
https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=66853132 <https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=66853132>

Thanks,
Piotrek

> On 25 Jan 2018, at 05:36, Ken Krugler <kk...@transpac.com> wrote:
> 
> Hi all,
> 
> We’ve run into deadlocks with two different streaming workflows that have iterations.
> 
> In both cases, the issue is with fan-out; if any operation in the loop can emit more records than consumed, eventually a network buffer fills up, and then everyone in the iteration loop is blocked.
> 
> One pattern we can use, when the operator that’s causing the fan-out has the ability to decide how much to emit, is to have it behave as an async function, emitting from a queue with multiple threads. If threads start blocking because of back pressure, then the queue begins to fill up, and the function can throttle back how much data it queues up. So this gives us a small (carefully managed) data reservoir we can use to avoid the deadlock.
> 
> Is there a better approach? I didn’t see any way to determine how “full” the various network buffers are, and use that for throttling. Plus there’s the issue of partitioning, where it would be impossible in many cases to know the impact of a record being emitted. So even if we could monitor buffers, I don’t think it’s a viable solution.
> 
> Thanks,
> 
> — Ken
> 
> --------------------------------------------
> http://about.me/kkrugler <http://about.me/kkrugler>
> +1 530-210-6378
>