Posted to dev@flink.apache.org by "Ufuk Celebi (JIRA)" <ji...@apache.org> on 2016/12/02 11:07:59 UTC

[jira] [Created] (FLINK-5228) LocalInputChannel re-trigger request and release deadlock

Ufuk Celebi created FLINK-5228:
----------------------------------

             Summary: LocalInputChannel re-trigger request and release deadlock
                 Key: FLINK-5228
                 URL: https://issues.apache.org/jira/browse/FLINK-5228
             Project: Flink
          Issue Type: Bug
          Components: Network
            Reporter: Ufuk Celebi
            Assignee: Ufuk Celebi
            Priority: Critical
             Fix For: 1.2.0, 1.1.4


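Concurrently releasing a LocalInputChannel and re-triggering its partition request can deadlock, because the two code paths acquire the same two locks in opposite order. In the thread dump below, the task canceler holds the lock taken in SingleInputGate.releaseAllResources and waits for the lock in LocalInputChannel.releaseAllResources, while the timer thread holds the lock taken in LocalInputChannel.requestSubpartition and waits for the lock in SingleInputGate.retriggerPartitionRequest.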

{code}
Found one Java-level deadlock:
=============================
"Canceler for Map -> Sink: Unnamed (1/4)":
waiting to lock monitor 0x0000000001e27bd8 (object 0x00000000ffa1f688, a java.lang.Object),
which is held by "Timer-3"
"Timer-3":
waiting to lock monitor 0x00007fdbd029ec48 (object 0x00000000ffa1f3a0, a java.lang.Object),
which is held by "Canceler for Map -> Sink: Unnamed (1/4)"

Java stack information for the threads listed above:
===================================================
"Canceler for Map -> Sink: Unnamed (1/4)":
   at org.apache.flink.runtime.io.network.partition.consumer.LocalInputChannel.releaseAllResources(LocalInputChannel.java:240)
   - waiting to lock <0x00000000ffa1f688> (a java.lang.Object)
   at org.apache.flink.runtime.io.network.partition.consumer.SingleInputGate.releaseAllResources(SingleInputGate.java:348)
   - locked <0x00000000ffa1f3a0> (a java.lang.Object)
   at org.apache.flink.runtime.taskmanager.Task$TaskCanceler.run(Task.java:1280)
   at java.lang.Thread.run(Thread.java:745)
"Timer-3":
   at org.apache.flink.runtime.io.network.partition.consumer.SingleInputGate.retriggerPartitionRequest(SingleInputGate.java:307)
   - waiting to lock <0x00000000ffa1f3a0> (a java.lang.Object)
   at org.apache.flink.runtime.io.network.partition.consumer.LocalInputChannel.requestSubpartition(LocalInputChannel.java:128)
   - locked <0x00000000ffa1f688> (a java.lang.Object)
   at org.apache.flink.runtime.io.network.partition.consumer.LocalInputChannel$1.run(LocalInputChannel.java:148)
   at java.util.TimerThread.mainLoop(Timer.java:555)
   at java.util.TimerThread.run(Timer.java:505)
{code}
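
The cycle in the dump is a lock-order inversion: one path locks gate then channel, the other channel then gate. As a rough illustration only, the sketch below shows one common way to break such a cycle, which is to decide under the channel lock whether a retrigger is needed and call back into the gate only after that lock has been released. The classes Gate, Channel and LockOrderSketch are hypothetical stand-ins, not Flink code, and this is not necessarily how the issue is fixed in Flink.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class LockOrderSketch {

    // Hypothetical stand-in for the input gate; not the Flink class.
    static final class Gate {
        private final Object requestLock = new Object();
        private Channel channel;
        private int retriggers;

        void setChannel(Channel channel) { this.channel = channel; }

        // Canceler path: takes the gate lock, then the channel lock.
        void releaseAllResources() {
            synchronized (requestLock) {
                channel.releaseAllResources();
            }
        }

        void retriggerPartitionRequest() {
            synchronized (requestLock) {
                retriggers++;
            }
        }
    }

    // Hypothetical stand-in for the local input channel; not the Flink class.
    static final class Channel {
        private final Object requestLock = new Object();
        private final Gate gate;
        private boolean released;

        Channel(Gate gate) { this.gate = gate; }

        // Timer path: decide under the channel lock, but call the gate only
        // after leaving it, so this thread never holds the channel lock while
        // waiting for the gate lock.
        void requestSubpartition() {
            boolean retrigger;
            synchronized (requestLock) {
                retrigger = !released;
            }
            if (retrigger) {
                gate.retriggerPartitionRequest();
            }
        }

        void releaseAllResources() {
            synchronized (requestLock) {
                released = true;
            }
        }
    }

    // Runs both paths concurrently; returns true if both threads finish in time.
    static boolean runConcurrently(int iterations) {
        Gate gate = new Gate();
        Channel channel = new Channel(gate);
        gate.setChannel(channel);
        CountDownLatch start = new CountDownLatch(1);

        Thread timer = new Thread(() -> {
            awaitQuietly(start);
            for (int i = 0; i < iterations; i++) channel.requestSubpartition();
        }, "timer");
        Thread canceler = new Thread(() -> {
            awaitQuietly(start);
            for (int i = 0; i < iterations; i++) gate.releaseAllResources();
        }, "canceler");
        timer.setDaemon(true);
        canceler.setDaemon(true);
        timer.start();
        canceler.start();
        start.countDown();

        try {
            timer.join(TimeUnit.SECONDS.toMillis(10));
            canceler.join(TimeUnit.SECONDS.toMillis(10));
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return !timer.isAlive() && !canceler.isAlive();
    }

    private static void awaitQuietly(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        System.out.println("completed without deadlock: " + runConcurrently(100_000));
    }
}
```

If the call to gate.retriggerPartitionRequest() were moved back inside the synchronized block in requestSubpartition, the two threads could block each other in the same way as in the dump above. The trade-off of the sketch is that the channel may be released between the check and the call, so the gate-side method has to tolerate a request for an already released channel.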


