Posted to user@spark.apache.org by Harsh Sharma <ha...@gmail.com> on 2021/08/30 13:26:05 UTC

Connection reset by peer : failed to remove cache rdd

We are facing an issue in production where we frequently get the following errors:

Still have 1 requests outstanding when connection from <hostname> is closed

Connection reset by peer

We also see warnings such as "failed to remove cached RDD" or "failed to remove broadcast variable".

Please help us mitigate this. Our settings are:

Executor memory : 12g

Network timeout :   600000

Heartbeat interval : 250000
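
Spark's configuration documentation notes that spark.executor.heartbeatInterval should be significantly less than spark.network.timeout. The post does not name the properties or units behind the two values above, so the following is only a sketch, assuming they are spark.network.timeout and spark.executor.heartbeatInterval in milliseconds. The helper names and the max_ratio threshold are hypothetical choices for illustration, not values defined by Spark.

```python
def heartbeat_ratio(network_timeout_ms: int, heartbeat_interval_ms: int) -> float:
    """Return the heartbeat interval as a fraction of the network timeout."""
    if network_timeout_ms <= 0 or heartbeat_interval_ms <= 0:
        raise ValueError("both values must be positive")
    return heartbeat_interval_ms / network_timeout_ms


def check_heartbeat(network_timeout_ms: int, heartbeat_interval_ms: int,
                    max_ratio: float = 0.25) -> str:
    """Classify a timeout/heartbeat pair as 'ok', a warning, or invalid."""
    # max_ratio is an illustrative threshold, not something Spark defines.
    ratio = heartbeat_ratio(network_timeout_ms, heartbeat_interval_ms)
    if ratio >= 1.0:
        return "invalid: heartbeat interval is not below the network timeout"
    if ratio > max_ratio:
        return f"warning: heartbeat is {ratio:.0%} of the timeout"
    return "ok"


if __name__ == "__main__":
    # Values quoted in the post, assumed to be milliseconds.
    print(check_heartbeat(600_000, 250_000))
```

With the values from the post the check reports a warning: only about 2.4 heartbeat intervals fit into one timeout window, so a couple of delayed heartbeats may be enough for a peer to be considered lost.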

 

[Stage 284:============>(199 + 1) / 200][Stage 292:>              (1 + 3) / 200]
[Stage 284:============>(199 + 1) / 200][Stage 292:>              (2 + 3) / 200]
[Stage 292:>                                                      (2 + 4) / 200]
[14/06/21 10:46:17,006 WARN  shuffle-server-4](TransportChannelHandler) Exception in connection from <hostname>
java.io.IOException: Connection reset by peer
        at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
        at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
        at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
        at sun.nio.ch.IOUtil.read(IOUtil.java:192)
        at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:378)
        at io.netty.buffer.PooledUnsafeDirectByteBuf.setBytes(PooledUnsafeDirectByteBuf.java:313)
        at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:881)
        at io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(NioSocketChannel.java:242)
        at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:119)
        at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
        at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
        at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
        at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
        at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
        at java.lang.Thread.run(Thread.java:748)
[14/06/21 10:46:17,010 ERROR shuffle-server-4](TransportResponseHandler) Still have 1 requests outstanding when connection from <hostname> is closed
[14/06/21 10:46:17,012 ERROR Spark Context Cleaner](ContextCleaner) Error cleaning broadcast 159
java.io.IOException: Connection reset by peer
        at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
        at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
        at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
        at sun.nio.ch.IOUtil.read(IOUtil.java:192)
        at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:378)
        at io.netty.buffer.PooledUnsafeDirectByteBuf.setBytes(PooledUnsafeDirectByteBuf.java:313)
        at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:881)
        at io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(NioSocketChannel.java:242)
        at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:119)
        at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
        at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
        at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
        at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
        at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
        at java.lang.Thread.run(Thread.java:748)
[14/06/21 10:46:17,012 WARN  block-manager-ask-thread-pool-69](BlockManagerMaster) Failed to remove broadcast 159 with removeFromMaster = true - Connection reset by peer
java.io.IOException: Connection reset by peer
        at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
        at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
        at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
        at sun.nio.ch.IOUtil.read(IOUtil.java:192)
        at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:378)
        at io.netty.buffer.PooledUnsafeDirectByteBuf.setBytes(PooledUnsafeDirectByteBuf.java:313)
        at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:881)
        at io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(NioSocketChannel.java:242)
        at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:119)
        at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
        at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
        at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
        at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
        at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
        at java.lang.Thread.run(Thread.java:748)
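
To see how often each component reports these failures, a log like the one above can be tallied by level and logger. This is a minimal sketch, assuming every event line follows the "[timestamp LEVEL thread](Logger) message" layout seen in the excerpt; the regular expression and the summarize helper are illustrative and not part of Spark.

```python
import re
from collections import Counter

# Matches event lines such as:
# [14/06/21 10:46:17,012 ERROR Spark Context Cleaner](ContextCleaner) Error ...
LOG_LINE = re.compile(
    r"\[(?P<ts>\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2},\d{3}) "
    r"(?P<level>[A-Z]+)\s+(?P<thread>[^\]]+)\]"
    r"\((?P<logger>[^)]+)\) (?P<msg>.*)"
)


def summarize(lines):
    """Count log events per (level, logger); stack-trace lines are ignored."""
    counts = Counter()
    for line in lines:
        # search() rather than match() tolerates a leading console progress bar.
        m = LOG_LINE.search(line)
        if m:
            counts[(m["level"], m["logger"])] += 1
    return counts


SAMPLE = """\
[Stage 292:>   (2 + 4) / 200][14/06/21 10:46:17,006 WARN  shuffle-server-4](TransportChannelHandler) Exception in connection from <hostname>
java.io.IOException: Connection reset by peer
        at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
[14/06/21 10:46:17,010 ERROR shuffle-server-4](TransportResponseHandler) Still have 1 requests outstanding when connection from <hostname> is closed
[14/06/21 10:46:17,012 ERROR Spark Context Cleaner](ContextCleaner) Error cleaning broadcast 159
[14/06/21 10:46:17,012 WARN  block-manager-ask-thread-pool-69](BlockManagerMaster) Failed to remove broadcast 159 with removeFromMaster = true - Connection reset by peer
"""

if __name__ == "__main__":
    for (level, logger), n in sorted(summarize(SAMPLE.splitlines()).items()):
        print(level, logger, n)
```

Running the same tally over a full driver log, grouped additionally by hour if needed, may show whether the resets cluster around particular stages or hosts.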

---------------------------------------------------------------------
To unsubscribe e-mail: user-unsubscribe@spark.apache.org


Re: Connection reset by peer : failed to remove cache rdd

Posted by Harsh Sharma <ha...@gmail.com>.

On 2021/08/30 13:32:19, Jacek Laskowski <ja...@japila.pl> wrote: 
> Hi,
> 
> No idea what might be going on here, but I wouldn't worry much about it;
> simply monitor disk usage, as some broadcast blocks might have been left over.
> 
> Do you know when in your application lifecycle it happens? Spark SQL or
> Structured Streaming? Do you use broadcast variables or are the errors
> coming from broadcast joins perhaps?
> 
> Regards,
> Jacek Laskowski
> ----
> https://about.me/JacekLaskowski
> "The Internals Of" Online Books <https://books.japila.pl/>
> Follow me on https://twitter.com/jaceklaskowski


Re: Connection reset by peer : failed to remove cache rdd

Posted by Harsh Sharma <ha...@gmail.com>.

On 2021/09/02 06:00:26, Harsh Sharma <ha...@gmail.com> wrote: 
> Please find my replies below:
>
> Do you know when in your application lifecycle it happens? Spark SQL or
> Structured Streaming?
>
> Ans: It's Spark SQL.
>
> Do you use broadcast variables?
>
> Ans: Yes, we are using broadcast variables.
>
> Or are the errors coming from broadcast joins perhaps?

Not sure about this.


Re: Connection reset by peer : failed to remove cache rdd

Posted by Harsh Sharma <ha...@gmail.com>.
Please find my replies below:

Do you know when in your application lifecycle it happens? Spark SQL or
Structured Streaming?

Ans: It's Spark SQL.

Do you use broadcast variables?

Ans: Yes, we are using broadcast variables.

Or are the errors coming from broadcast joins perhaps?

Ans: We are not using broadcast joins.



Re: Connection reset by peer : failed to remove cache rdd

Posted by Jacek Laskowski <ja...@japila.pl>.
Hi,

No idea what might be going on here, but I wouldn't worry much about it;
simply monitor disk usage, as some broadcast blocks might have been left over.

Do you know when in your application lifecycle it happens? Spark SQL or
Structured Streaming? Do you use broadcast variables or are the errors
coming from broadcast joins perhaps?
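
One way to monitor disk usage for leftover broadcast blocks is to sum the size of broadcast files under an executor's local directory. This is only a sketch: it assumes that block files live somewhere below the directory configured as spark.local.dir and that broadcast block files have names starting with "broadcast_", which should be verified against the actual deployment. The broadcast_bytes helper is hypothetical.

```python
import os


def broadcast_bytes(local_dir: str, prefix: str = "broadcast_") -> int:
    """Sum the sizes of files under local_dir whose names start with prefix."""
    total = 0
    for root, _dirs, files in os.walk(local_dir):
        for name in files:
            if name.startswith(prefix):
                try:
                    total += os.path.getsize(os.path.join(root, name))
                except OSError:
                    # The file may be removed between listing and stat.
                    pass
    return total
```

Calling broadcast_bytes on each local directory at intervals and watching whether the total keeps growing would show if failed cleanups are actually leaving blocks behind.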

Regards,
Jacek Laskowski
----
https://about.me/JacekLaskowski
"The Internals Of" Online Books <https://books.japila.pl/>
Follow me on https://twitter.com/jaceklaskowski
