Posted to issues@spark.apache.org by "Hyukjin Kwon (JIRA)" <ji...@apache.org> on 2019/05/21 04:00:05 UTC

[jira] [Updated] (SPARK-17300) ClosedChannelException caused by missing block manager when speculative tasks are killed

     [ https://issues.apache.org/jira/browse/SPARK-17300?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon updated SPARK-17300:
---------------------------------
    Labels: bulk-closed  (was: )

> ClosedChannelException caused by missing block manager when speculative tasks are killed
> ----------------------------------------------------------------------------------------
>
>                 Key: SPARK-17300
>                 URL: https://issues.apache.org/jira/browse/SPARK-17300
>             Project: Spark
>          Issue Type: Bug
>            Reporter: Ryan Blue
>            Priority: Major
>              Labels: bulk-closed
>
> We recently backported SPARK-10530 to our Spark build, which kills unnecessary duplicate/speculative tasks when one completes (either a speculative task or the original). In large jobs with 500+ executors, this caused some executors to die and resulted in the same error that was fixed by SPARK-15262: ClosedChannelException when trying to connect to the block manager on affected hosts.
> {code}
> java.nio.channels.ClosedChannelException
> 	at org.apache.spark.network.client.TransportClient$3.operationComplete(TransportClient.java:239)
> 	at org.apache.spark.network.client.TransportClient$3.operationComplete(TransportClient.java:226)
> 	at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:680)
> 	at io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:567)
> 	at io.netty.util.concurrent.DefaultPromise.tryFailure(DefaultPromise.java:424)
> 	at io.netty.channel.AbstractChannel$AbstractUnsafe.safeSetFailure(AbstractChannel.java:801)
> 	at io.netty.channel.AbstractChannel$AbstractUnsafe.write(AbstractChannel.java:699)
> 	at io.netty.channel.DefaultChannelPipeline$HeadContext.write(DefaultChannelPipeline.java:1122)
> 	at io.netty.channel.AbstractChannelHandlerContext.invokeWrite(AbstractChannelHandlerContext.java:633)
> 	at io.netty.channel.AbstractChannelHandlerContext.access$1900(AbstractChannelHandlerContext.java:32)
> 	at io.netty.channel.AbstractChannelHandlerContext$AbstractWriteTask.write(AbstractChannelHandlerContext.java:908)
> 	at io.netty.channel.AbstractChannelHandlerContext$WriteAndFlushTask.write(AbstractChannelHandlerContext.java:960)
> 	at io.netty.channel.AbstractChannelHandlerContext$AbstractWriteTask.run(AbstractChannelHandlerContext.java:893)
> 	at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:357)
> 	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:357)
> 	at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
> 	at java.lang.Thread.run(Thread.java:745)
> Caused by: java.nio.channels.ClosedChannelException
> {code}
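> For context, the speculative execution that triggers this path is controlled by Spark's standard speculation settings. As an illustrative sketch (the values shown are Spark's documented defaults, not the reporter's actual configuration), speculation is typically enabled via entries like these in spark-defaults.conf:
> {code}
> # Enable speculative execution of slow tasks
> spark.speculation              true
> # How often to check for tasks to speculate (default)
> spark.speculation.interval     100ms
> # A task is eligible when it runs this many times slower than the median (default)
> spark.speculation.multiplier   1.5
> # Fraction of tasks that must finish before speculation starts for a stage (default)
> spark.speculation.quantile     0.75
> {code}
> With SPARK-10530 backported, whichever copy of a task finishes first causes the other copy to be killed, which is what exposes the block manager failure described above.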



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org