You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@flink.apache.org by "Chesnay Schepler (Jira)" <ji...@apache.org> on 2020/05/28 16:30:00 UTC

[jira] [Closed] (FLINK-17558) Partitions are released in TaskExecutor Main Thread

     [ https://issues.apache.org/jira/browse/FLINK-17558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chesnay Schepler closed FLINK-17558.
------------------------------------
    Resolution: Fixed

> Partitions are released in TaskExecutor Main Thread
> ---------------------------------------------------
>
>                 Key: FLINK-17558
>                 URL: https://issues.apache.org/jira/browse/FLINK-17558
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Coordination
>    Affects Versions: 1.10.0, 1.10.1, 1.11.0
>            Reporter: Gary Yao
>            Assignee: Chesnay Schepler
>            Priority: Critical
>              Labels: pull-request-available
>             Fix For: 1.11.0, 1.10.2
>
>
> Partitions are released in the main thread of the TaskExecutor (see the stacktrace below). This can lead to missed heartbeats, timeouts of RPCs, etc. because deleting files is blocking I/O. The partitions should be released in a devoted I/O thread pool ({{TaskExecutor#ioExecutor}} is a candidate but requires a higher default thread count). 
> {noformat}
> 2020-05-06T19:13:12.4383402Z "flink-akka.actor.default-dispatcher-35" #3555 prio=5 os_prio=0 tid=0x00007f7fcc071000 nid=0x1f3f9 runnable [0x00007f7fd302c000]
> 2020-05-06T19:13:12.4383983Z    java.lang.Thread.State: RUNNABLE
> 2020-05-06T19:13:12.4384519Z    at sun.nio.fs.UnixNativeDispatcher.unlink0(Native Method)
> 2020-05-06T19:13:12.4384971Z    at sun.nio.fs.UnixNativeDispatcher.unlink(UnixNativeDispatcher.java:146)
> 2020-05-06T19:13:12.4385465Z    at sun.nio.fs.UnixFileSystemProvider.implDelete(UnixFileSystemProvider.java:231)
> 2020-05-06T19:13:12.4386000Z    at sun.nio.fs.AbstractFileSystemProvider.delete(AbstractFileSystemProvider.java:103)
> 2020-05-06T19:13:12.4386458Z    at java.nio.file.Files.delete(Files.java:1126)
> 2020-05-06T19:13:12.4386968Z    at org.apache.flink.runtime.io.network.partition.FileChannelBoundedData.close(FileChannelBoundedData.java:93)
> 2020-05-06T19:13:12.4388088Z    at org.apache.flink.runtime.io.network.partition.BoundedBlockingSubpartition.checkReaderReferencesAndDispose(BoundedBlockingSubpartition.java:247)
> 2020-05-06T19:13:12.4388765Z    at org.apache.flink.runtime.io.network.partition.BoundedBlockingSubpartition.release(BoundedBlockingSubpartition.java:208)
> 2020-05-06T19:13:12.4389444Z    - locked <0x00000000ff836d78> (a java.lang.Object)
> 2020-05-06T19:13:12.4389905Z    at org.apache.flink.runtime.io.network.partition.ResultPartition.release(ResultPartition.java:290)
> 2020-05-06T19:13:12.4390481Z    at org.apache.flink.runtime.io.network.partition.ResultPartitionManager.releasePartition(ResultPartitionManager.java:80)
> 2020-05-06T19:13:12.4391118Z    - locked <0x000000009d452b90> (a java.util.HashMap)
> 2020-05-06T19:13:12.4391597Z    at org.apache.flink.runtime.io.network.NettyShuffleEnvironment.releasePartitionsLocally(NettyShuffleEnvironment.java:153)
> 2020-05-06T19:13:12.4392267Z    at org.apache.flink.runtime.io.network.partition.TaskExecutorPartitionTrackerImpl.stopTrackingAndReleaseJobPartitions(TaskExecutorPartitionTrackerImpl.java:62)
> 2020-05-06T19:13:12.4392914Z    at org.apache.flink.runtime.taskexecutor.TaskExecutor.releaseOrPromotePartitions(TaskExecutor.java:776)
> 2020-05-06T19:13:12.4393366Z    at sun.reflect.GeneratedMethodAccessor28.invoke(Unknown Source)
> 2020-05-06T19:13:12.4393813Z    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 2020-05-06T19:13:12.4394257Z    at java.lang.reflect.Method.invoke(Method.java:498)
> 2020-05-06T19:13:12.4394693Z    at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcInvocation(AkkaRpcActor.java:279)
> 2020-05-06T19:13:12.4395202Z    at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:199)
> 2020-05-06T19:13:12.4395686Z    at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleMessage(AkkaRpcActor.java:152)
> 2020-05-06T19:13:12.4396165Z    at org.apache.flink.runtime.rpc.akka.AkkaRpcActor$$Lambda$72/775020844.apply(Unknown Source)
> 2020-05-06T19:13:12.4396606Z    at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:26)
> 2020-05-06T19:13:12.4397015Z    at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:21)
> 2020-05-06T19:13:12.4397447Z    at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123)
> 2020-05-06T19:13:12.4397874Z    at akka.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:21)
> 2020-05-06T19:13:12.4398414Z    at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:170)
> 2020-05-06T19:13:12.4398879Z    at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171)
> 2020-05-06T19:13:12.4399321Z    at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171)
> 2020-05-06T19:13:12.4399737Z    at akka.actor.Actor$class.aroundReceive(Actor.scala:517)
> 2020-05-06T19:13:12.4400138Z    at akka.actor.AbstractActor.aroundReceive(AbstractActor.scala:225)
> 2020-05-06T19:13:12.4400552Z    at akka.actor.ActorCell.receiveMessage(ActorCell.scala:592)
> 2020-05-06T19:13:12.4400930Z    at akka.actor.ActorCell.invoke(ActorCell.scala:561)
> 2020-05-06T19:13:12.4401390Z    at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:258)
> 2020-05-06T19:13:12.4401763Z    at akka.dispatch.Mailbox.run(Mailbox.scala:225)
> 2020-05-06T19:13:12.4402135Z    at akka.dispatch.Mailbox.exec(Mailbox.scala:235)
> 2020-05-06T19:13:12.4402540Z    at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
> 2020-05-06T19:13:12.4402984Z    at akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
> 2020-05-06T19:13:12.4403448Z    at akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
> 2020-05-06T19:13:12.4404096Z    at akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)