You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@spark.apache.org by "angerszhu (JIRA)" <ji...@apache.org> on 2019/08/16 05:00:00 UTC

[jira] [Comment Edited] (SPARK-28726) Spark with DynamicAllocation always got connect rest by peers

    [ https://issues.apache.org/jira/browse/SPARK-28726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16908711#comment-16908711 ] 

angerszhu edited comment on SPARK-28726 at 8/16/19 4:59 AM:
------------------------------------------------------------

[~hyukjin.kwon]

spark thrift server, sql

dynamic allocation enable and idle timeout time 5s。

 


was (Author: angerszhuuu):
[~hyukjin.kwon]

spark thrift server, sql

dynamic allocation enable and idle timeout time 5s。

> Spark with DynamicAllocation always got connect rest by peers
> -------------------------------------------------------------
>
>                 Key: SPARK-28726
>                 URL: https://issues.apache.org/jira/browse/SPARK-28726
>             Project: Spark
>          Issue Type: Wish
>          Components: Spark Core
>    Affects Versions: 2.4.0
>            Reporter: angerszhu
>            Priority: Major
>
> When use Spark with dynamic allocation, we set idle time to 5s
> We always got exception about neety 'Connect reset by peers'
>  
> I suspect that it's because we set idle time 5s is too small, it will cause when Blockmanager call netty io, the executor has been remove because of timeout.
> But not timely notify driver's BlocakManager
> {code:java}
> 19/08/14 00:00:46 WARN org.apache.spark.network.server.TransportChannelHandler: "Exception in connection from /host:port"
> java.io.IOException: Connection reset by peer
>  at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
>  at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
>  at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
>  at sun.nio.ch.IOUtil.read(IOUtil.java:192)
>  at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380)
>  at io.netty.buffer.PooledUnsafeDirectByteBuf.setBytes(PooledUnsafeDirectByteBuf.java:288)
>  at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:1106)
>  at io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(NioSocketChannel.java:343)
>  at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:123)
>  at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:645)
>  at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:580)
>  at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:497)
>  at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:459)
>  at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858)
>  at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:138)
> --
> 19/08/14 00:00:46 WARN org.apache.spark.storage.BlockManagerMasterEndpoint: "Error trying to remove broadcast 67 from block manager BlockManagerId(967, host, port, None)"
> java.io.IOException: Connection reset by peer
>  at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
>  at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
>  at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
>  at sun.nio.ch.IOUtil.read(IOUtil.java:192)
>  at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380)
>  at io.netty.buffer.PooledUnsafeDirectByteBuf.setBytes(PooledUnsafeDirectByteBuf.java:288)
>  at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:1106)
>  at io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(NioSocketChannel.java:343)
>  at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:123)
>  at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:645)
>  at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:580)
>  at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:497)
>  at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:459)
>  at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858)
>  at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:138)
> --
> 19/08/14 00:00:46 INFO org.apache.spark.ContextCleaner: "Cleaned accumulator 162174"
> 19/08/14 00:00:46 WARN org.apache.spark.storage.BlockManagerMaster: "Failed to remove shuffle 22 - Connection reset by peer"
> java.io.IOException: Connection reset by peer
>  at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
>  at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39){code}



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org