Posted to dev@flink.apache.org by "Ufuk Celebi (JIRA)" <ji...@apache.org> on 2015/05/26 09:45:17 UTC
[jira] [Created] (FLINK-2091) Lock contention during release of network buffer pools
Ufuk Celebi created FLINK-2091:
----------------------------------
Summary: Lock contention during release of network buffer pools
Key: FLINK-2091
URL: https://issues.apache.org/jira/browse/FLINK-2091
Project: Flink
Issue Type: Improvement
Components: Distributed Runtime
Affects Versions: master
Reporter: Ufuk Celebi
Assignee: Ufuk Celebi
[~rmetzger] reported the following stack traces while cancelling high-parallelism jobs:
{code}
13:43:46,803 WARN org.apache.flink.runtime.taskmanager.Task - Task 'DataSource (at main(Job.java:59) (org.apache.flink.api.java.io.TextInputFormat)) (4/16)' did not react to cancelling signal, but is stuck in method:
org.apache.flink.runtime.io.network.buffer.LocalBufferPool.setNumBuffers(LocalBufferPool.java:238)
org.apache.flink.runtime.io.network.buffer.NetworkBufferPool.redistributeBuffers(NetworkBufferPool.java:268)
org.apache.flink.runtime.io.network.buffer.NetworkBufferPool.destroyBufferPool(NetworkBufferPool.java:218)
org.apache.flink.runtime.io.network.buffer.LocalBufferPool.lazyDestroy(LocalBufferPool.java:221)
org.apache.flink.runtime.io.network.partition.ResultPartition.destroyBufferPool(ResultPartition.java:302)
org.apache.flink.runtime.io.network.NetworkEnvironment.unregisterTask(NetworkEnvironment.java:366)
org.apache.flink.runtime.taskmanager.Task.run(Task.java:647)
java.lang.Thread.run(Thread.java:745)
{code}
{code}
13:42:57,595 WARN org.apache.flink.runtime.taskmanager.Task - Task 'DataSource (at main(Job.java:59) (org.apache.flink.api.java.io.TextInputFormat)) (16/16)' did not react to cancelling signal, but is stuck in method:
org.apache.flink.runtime.io.network.buffer.NetworkBufferPool.destroyBufferPool(NetworkBufferPool.java:212)
org.apache.flink.runtime.io.network.buffer.LocalBufferPool.lazyDestroy(LocalBufferPool.java:221)
org.apache.flink.runtime.io.network.partition.ResultPartition.destroyBufferPool(ResultPartition.java:302)
org.apache.flink.runtime.io.network.NetworkEnvironment.unregisterTask(NetworkEnvironment.java:366)
org.apache.flink.runtime.taskmanager.Task.run(Task.java:647)
java.lang.Thread.run(Thread.java:745)
{code}
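The issue is that, while high-parallelism jobs are being cancelled, the locks for buffer pool management are highly contended. Every task that unregisters destroys its result partition's buffer pool via NetworkBufferPool.destroyBufferPool (which, in the first trace, goes on to redistributeBuffers and LocalBufferPool.setNumBuffers), so the cancelling tasks end up waiting on these locks instead of reacting to the cancel signal.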
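To make the contention concrete, here is a minimal Java sketch of the pattern the first stack trace suggests: destroying one local pool triggers a redistribution over all remaining pools while a single global lock is held. The classes GlobalPool and LocalPool, the method simulateCancellation, and the even-split redistribution policy are all hypothetical simplifications for illustration; they are not Flink's implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical, simplified model of the release path in the stack traces;
// these are NOT Flink's classes, fields, or locking scheme.
public class BufferPoolContentionSketch {

    // Stand-in for a per-task local buffer pool.
    static final class LocalPool {
        private int numBuffers;

        // Resizing takes the local pool's own monitor (assumed, for illustration).
        synchronized void setNumBuffers(int n) { numBuffers = n; }
    }

    // Stand-in for the shared network buffer pool.
    static final class GlobalPool {
        private final Object lock = new Object();
        private final List<LocalPool> pools = new ArrayList<>();
        private final int totalBuffers;
        final AtomicLong resizeCalls = new AtomicLong();

        GlobalPool(int totalBuffers) { this.totalBuffers = totalBuffers; }

        LocalPool createPool() {
            synchronized (lock) {
                LocalPool p = new LocalPool();
                pools.add(p);
                redistribute();
                return p;
            }
        }

        void destroyPool(LocalPool p) {
            synchronized (lock) {   // every cancelling task serializes on this lock
                pools.remove(p);
                redistribute();     // touches every remaining pool while holding it
            }
        }

        // Caller must hold `lock`: split all buffers evenly over the live pools.
        private void redistribute() {
            if (pools.isEmpty()) return;
            int share = totalBuffers / pools.size();
            for (LocalPool p : pools) {
                p.setNumBuffers(share);
                resizeCalls.incrementAndGet();
            }
        }
    }

    // Creates `parallelism` pools, then destroys them all concurrently
    // (as tasks do when a job is cancelled) and returns the number of
    // setNumBuffers calls made during the teardown.
    static long simulateCancellation(int parallelism, int totalBuffers) throws InterruptedException {
        GlobalPool global = new GlobalPool(totalBuffers);
        List<LocalPool> locals = new ArrayList<>();
        for (int i = 0; i < parallelism; i++) locals.add(global.createPool());
        global.resizeCalls.set(0);   // count only the teardown work

        CountDownLatch start = new CountDownLatch(1);
        List<Thread> threads = new ArrayList<>();
        for (LocalPool p : locals) {
            Thread t = new Thread(() -> {
                try {
                    start.await();   // release all "tasks" at once
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                global.destroyPool(p);
            });
            threads.add(t);
            t.start();
        }
        start.countDown();
        for (Thread t : threads) t.join();
        return global.resizeCalls.get();
    }

    public static void main(String[] args) throws InterruptedException {
        // (n - 1) + (n - 2) + ... + 0 = n(n - 1)/2 resizes for n pools.
        System.out.println("resize calls: " + simulateCancellation(16, 2048)); // prints "resize calls: 120"
    }
}
```

Under these assumptions, tearing down n pools costs n(n - 1)/2 resize calls, all serialized on one lock, so at high parallelism most cancelling threads spend their time waiting. That is consistent with tasks being reported as stuck in destroyBufferPool and setNumBuffers, though the real Flink code paths may differ in detail.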