Posted to issues@flink.apache.org by "Romain Revol (JIRA)" <ji...@apache.org> on 2018/02/20 15:16:00 UTC
[jira] [Updated] (FLINK-8717) Flink seems to deadlock due to buffer starvation when iterating
[ https://issues.apache.org/jira/browse/FLINK-8717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Romain Revol updated FLINK-8717:
--------------------------------
Description:
We are encountering what looks like a deadlock of Flink in one of our jobs with an "iterate" in it.
I've reduced the job's use case to the example in this gist: [https://gist.github.com/rrevol/06ddfecd5f5ac7cbc67785b5d3a84dd4]
Note that:
* varying the parallelism affects how quickly the deadlock occurs, but it always occurs
* varying MAX_LOOP_NB does affect the deadlock: the higher it is, the sooner we hit it. With MAX_LOOP_NB == 1 there is no deadlock. This suggests that it happens once the number of iterations reaches some threshold.
From the [^threadDump.txt], it looks like starvation in buffer allocation; maybe backpressure is flawed for iterations, but I may be mistaken since I don't know Flink internals well.
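The hypothesis above (buffer starvation combined with backpressure inside the iteration cycle) can be illustrated with a toy model. This is a hypothetical sketch only: it does not reproduce the gist or Flink's runtime, and `IterationStarvationModel`, `simulate`, `capacity` and the scheduling rule (the external source always grabs a free input slot before the feedback edge does) are all assumptions. It models a source and a feedback edge competing for the loop body's bounded input buffer, with the body blocking whenever the feedback buffer is full.

```scala
import scala.collection.mutable

// Hypothetical toy model of buffer starvation in an iteration cycle.
// It is NOT Flink's runtime and NOT the code from the gist; every name is made up.
object IterationStarvationModel {

  // A record and the number of passes it has made through the loop body.
  final case class Rec(id: Int, loops: Int)

  /** Runs the model round by round and returns Some(round) for the first round
    * in which no stage can make progress (a stall), or None if the model is
    * still making progress after maxRounds rounds.
    */
  def simulate(maxLoopNb: Int, capacity: Int, maxRounds: Int): Option[Int] = {
    val bodyIn   = mutable.Queue.empty[Rec] // bounded input buffer of the loop body
    val feedback = mutable.Queue.empty[Rec] // bounded buffer on the feedback edge
    var pending: Option[Rec] = None         // processed record the body could not emit yet
    var nextId = 0
    var round = 0
    var stalledAt: Option[Int] = None

    while (stalledAt.isEmpty && round < maxRounds) {
      round += 1
      var progress = false

      // Source: unbounded input that takes a free input slot first
      // (assumption: external input always wins the race for buffers).
      if (bodyIn.size < capacity) {
        bodyIn.enqueue(Rec(nextId, 0))
        nextId += 1
        progress = true
      }

      // Head: forwards one fed-back record, but only if a slot is still free.
      if (feedback.nonEmpty && bodyIn.size < capacity) {
        bodyIn.enqueue(feedback.dequeue())
        progress = true
      }

      // Body: picks up a new record only when it is not blocked on an emit.
      if (pending.isEmpty && bodyIn.nonEmpty) {
        val r = bodyIn.dequeue()
        pending = Some(r.copy(loops = r.loops + 1))
        progress = true
      }

      // Body output: leave the loop after maxLoopNb passes, otherwise feed back.
      pending match {
        case Some(r) if r.loops >= maxLoopNb =>
          pending = None // record leaves the iteration (the sink never blocks)
          progress = true
        case Some(r) if feedback.size < capacity =>
          feedback.enqueue(r)
          pending = None
          progress = true
        case _ => // nothing to emit, or blocked because the feedback buffer is full
      }

      if (!progress) stalledAt = Some(round)
    }
    stalledAt
  }

  def main(args: Array[String]): Unit =
    for (n <- 1 to 3)
      println(s"maxLoopNb=$n -> stall: ${simulate(n, capacity = 2, maxRounds = 1000)}")
}
```

In this model `maxLoopNb = 1` never stalls because nothing is ever sent on the feedback edge, which mirrors the MAX_LOOP_NB == 1 observation; any larger value eventually leaves the source, the head and the body all waiting on full buffers. The exact round at which the stall occurs depends on the arbitrary scheduling rule and says nothing about when a real Flink job would stall.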
> Flink seems to deadlock due to buffer starvation when iterating
> ---------------------------------------------------------------
>
> Key: FLINK-8717
> URL: https://issues.apache.org/jira/browse/FLINK-8717
> Project: Flink
> Issue Type: Bug
> Components: Streaming
> Affects Versions: 1.4.0
> Environment: Windows 10 Pro 64-bit
> Core i7-6820HQ @ 2.7 GHz
> 16GB RAM
> Flink 1.4
> Scala client
> Scala 2.11.7
>
> Reporter: Romain Revol
> Priority: Major
> Attachments: threadDump.txt
>
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)