You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@spark.apache.org by "Ajith S (JIRA)" <ji...@apache.org> on 2019/08/14 09:54:00 UTC

[jira] [Commented] (SPARK-28726) Spark with DynamicAllocation always got connect rest by peers

    [ https://issues.apache.org/jira/browse/SPARK-28726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16907112#comment-16907112 ] 

Ajith S commented on SPARK-28726:
---------------------------------

As i see, this is driver trying to clean up RDDs, broadcasts etc from the expiring executor and meanwhile the executor has gone down, which is why such exceptions are under warning. Does the issue occur with higher timeouts too.? 

> Spark with DynamicAllocation always got connect rest by peers
> -------------------------------------------------------------
>
>                 Key: SPARK-28726
>                 URL: https://issues.apache.org/jira/browse/SPARK-28726
>             Project: Spark
>          Issue Type: Wish
>          Components: Spark Core
>    Affects Versions: 2.4.0
>            Reporter: angerszhu
>            Priority: Major
>
> When use Spark with dynamic allocation, we set idle time to 5s
> We always got exception about neety 'Connect reset by peers'
>  
> I suspect that it's because we set idle time 5s is too small, it will cause when Blockmanager call netty io, the executor has been remove because of timeout.
> But not timely notify driver's BlocakManager
> {code:java}
> 19/08/14 00:00:46 WARN org.apache.spark.network.server.TransportChannelHandler: "Exception in connection from /host:port"
> java.io.IOException: Connection reset by peer
>  at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
>  at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
>  at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
>  at sun.nio.ch.IOUtil.read(IOUtil.java:192)
>  at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380)
>  at io.netty.buffer.PooledUnsafeDirectByteBuf.setBytes(PooledUnsafeDirectByteBuf.java:288)
>  at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:1106)
>  at io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(NioSocketChannel.java:343)
>  at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:123)
>  at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:645)
>  at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:580)
>  at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:497)
>  at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:459)
>  at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858)
>  at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:138)
> --
> 19/08/14 00:00:46 WARN org.apache.spark.storage.BlockManagerMasterEndpoint: "Error trying to remove broadcast 67 from block manager BlockManagerId(967, host, port, None)"
> java.io.IOException: Connection reset by peer
>  at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
>  at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
>  at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
>  at sun.nio.ch.IOUtil.read(IOUtil.java:192)
>  at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380)
>  at io.netty.buffer.PooledUnsafeDirectByteBuf.setBytes(PooledUnsafeDirectByteBuf.java:288)
>  at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:1106)
>  at io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(NioSocketChannel.java:343)
>  at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:123)
>  at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:645)
>  at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:580)
>  at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:497)
>  at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:459)
>  at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858)
>  at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:138)
> --
> 19/08/14 00:00:46 INFO org.apache.spark.ContextCleaner: "Cleaned accumulator 162174"
> 19/08/14 00:00:46 WARN org.apache.spark.storage.BlockManagerMaster: "Failed to remove shuffle 22 - Connection reset by peer"
> java.io.IOException: Connection reset by peer
>  at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
>  at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39){code}



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org