Posted to user@spark.apache.org by Harsh Sharma <ha...@gmail.com> on 2021/08/30 13:26:05 UTC
Connection reset by peer : failed to remove cache rdd
We are facing an issue in production where we frequently get the errors
"Still have 1 requests outstanding when connection from <hostname> is closed"
and "Connection reset by peer", as well as warnings that Spark failed to remove a cached RDD or a broadcast variable.
Please help us understand how to mitigate this. Our settings are:
Executor memory : 12g
Network timeout : 600000
Heartbeat interval : 250000
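A quick sanity check on the two timing values above may help. The Spark configuration documentation says the executor heartbeat interval should be significantly less than the network timeout. The sketch below assumes that "Network timeout" and "Heartbeat interval" are the properties spark.network.timeout and spark.executor.heartbeatInterval and that both numbers are in the same unit; the post states neither. The function name and the 10% threshold are hypothetical, chosen only for illustration.

```python
def check_heartbeat(network_timeout, heartbeat_interval, max_ratio=0.1):
    """Return a list of warnings; an empty list means the values look consistent.

    Assumption: both arguments are expressed in the same time unit.
    max_ratio is an illustrative threshold, not a value taken from Spark.
    """
    if network_timeout <= 0 or heartbeat_interval <= 0:
        raise ValueError("both values must be positive")

    warnings = []
    ratio = heartbeat_interval / network_timeout
    if heartbeat_interval >= network_timeout:
        warnings.append("heartbeat interval is not below the network timeout")
    elif ratio > max_ratio:
        warnings.append(
            f"heartbeat interval is {ratio:.0%} of the network timeout; "
            "consider lowering it"
        )
    return warnings


# Values quoted in the post above.
for message in check_heartbeat(600000, 250000):
    print(message)
```

With the values from the post, the interval is a large fraction of the timeout, so the check reports one warning. Whether this is related to the connection resets is not established here.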
[Stage 284:============>(199 + 1) / 200][Stage 292:> (1 + 3) / 200]
[Stage 284:============>(199 + 1) / 200][Stage 292:> (2 + 3) / 200]
[Stage 292:> (2 + 4) / 200]
[14/06/21 10:46:17,006 WARN shuffle-server-4](TransportChannelHandler) Exception in connection from <hostname>
java.io.IOException: Connection reset by peer
at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
at sun.nio.ch.IOUtil.read(IOUtil.java:192)
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:378)
at io.netty.buffer.PooledUnsafeDirectByteBuf.setBytes(PooledUnsafeDirectByteBuf.java:313)
at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:881)
at io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(NioSocketChannel.java:242)
at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:119)
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
at java.lang.Thread.run(Thread.java:748)
[14/06/21 10:46:17,010 ERROR shuffle-server-4](TransportResponseHandler) Still have 1 requests outstanding when connection from <hostname> is closed
[14/06/21 10:46:17,012 ERROR Spark Context Cleaner](ContextCleaner) Error cleaning broadcast 159
java.io.IOException: Connection reset by peer
at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
at sun.nio.ch.IOUtil.read(IOUtil.java:192)
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:378)
at io.netty.buffer.PooledUnsafeDirectByteBuf.setBytes(PooledUnsafeDirectByteBuf.java:313)
at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:881)
at io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(NioSocketChannel.java:242)
at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:119)
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
at java.lang.Thread.run(Thread.java:748)
[14/06/21 10:46:17,012 WARN block-manager-ask-thread-pool-69](BlockManagerMaster) Failed to remove broadcast 159 with removeFromMaster = true - Connection reset by peer
java.io.IOException: Connection reset by peer
at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
at sun.nio.ch.IOUtil.read(IOUtil.java:192)
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:378)
at io.netty.buffer.PooledUnsafeDirectByteBuf.setBytes(PooledUnsafeDirectByteBuf.java:313)
at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:881)
at io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(NioSocketChannel.java:242)
at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:119)
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
at java.lang.Thread.run(Thread.java:748)
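To see which components report these messages and how often, the log above can be summarized with a short script. This is a minimal sketch that assumes every log line follows the "[date time LEVEL thread](Component) message" shape seen in the excerpt; the names LOG_LINE and summarize are illustrative and not part of Spark.

```python
import re
from collections import Counter

# Matches lines shaped like the ones in the excerpt, for example:
# [14/06/21 10:46:17,006 WARN shuffle-server-4](TransportChannelHandler) Exception ...
LOG_LINE = re.compile(
    r"\[(?P<ts>\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2},\d{3}) "
    r"(?P<level>[A-Z]+) (?P<thread>[^\]]+)\]"
    r"\((?P<component>[^)]+)\) (?P<message>.*)"
)


def summarize(lines):
    """Count log entries per (level, component); other lines are ignored."""
    counts = Counter()
    for line in lines:
        # search() rather than match(), so a console progress bar printed
        # in front of the log entry does not prevent a match.
        m = LOG_LINE.search(line)
        if m:
            counts[(m["level"], m["component"])] += 1
    return counts


SAMPLE = [
    "[Stage 292:> (2 + 4) / 200][14/06/21 10:46:17,006 WARN shuffle-server-4]"
    "(TransportChannelHandler) Exception in connection from <hostname>",
    "java.io.IOException: Connection reset by peer",
    "[14/06/21 10:46:17,010 ERROR shuffle-server-4](TransportResponseHandler) "
    "Still have 1 requests outstanding when connection from <hostname> is closed",
    "[14/06/21 10:46:17,012 ERROR Spark Context Cleaner](ContextCleaner) "
    "Error cleaning broadcast 159",
    "[14/06/21 10:46:17,012 WARN block-manager-ask-thread-pool-69](BlockManagerMaster) "
    "Failed to remove broadcast 159 with removeFromMaster = true - Connection reset by peer",
]

for (level, component), n in sorted(summarize(SAMPLE).items()):
    print(level, component, n)
```

Run over a full driver log, the same counts grouped by timestamp could show when in the application lifecycle the resets cluster.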
---------------------------------------------------------------------
To unsubscribe e-mail: user-unsubscribe@spark.apache.org
Re: Connection reset by peer : failed to remove cache rdd
Posted by Harsh Sharma <ha...@gmail.com>.
On 2021/08/30 13:32:19, Jacek Laskowski <ja...@japila.pl> wrote:
> Hi,
>
> No idea what might be going on here, but I'd not worry much about it and
> simply monitor disk usage, as some broadcast blocks might be left over.
>
> Do you know when in your application lifecycle it happens? Spark SQL or
> Structured Streaming? Do you use broadcast variables or are the errors
> coming from broadcast joins perhaps?
>
> Best regards,
> Jacek Laskowski
> ----
> https://about.me/JacekLaskowski
> "The Internals Of" Online Books <https://books.japila.pl/>
> Follow me on https://twitter.com/jaceklaskowski
Re: Connection reset by peer : failed to remove cache rdd
Posted by Harsh Sharma <ha...@gmail.com>.
On 2021/09/02 06:00:26, Harsh Sharma <ha...@gmail.com> wrote:
> Please find my replies below:
>
> > Do you know when in your application lifecycle it happens? Spark SQL or
> > Structured Streaming?
>
> Ans: It's Spark SQL.
>
> > Do you use broadcast variables?
>
> Ans: Yes, we are using broadcast variables.
>
> > Or are the errors coming from broadcast joins perhaps?

Not sure about this.
Re: Connection reset by peer : failed to remove cache rdd
Posted by Harsh Sharma <ha...@gmail.com>.
Please find my replies below:

> Do you know when in your application lifecycle it happens? Spark SQL or
> Structured Streaming?

Ans: It's Spark SQL.

> Do you use broadcast variables?

Ans: Yes, we are using broadcast variables.

> Or are the errors coming from broadcast joins perhaps?

Ans: We are not using broadcast joins.
Re: Connection reset by peer : failed to remove cache rdd
Posted by Jacek Laskowski <ja...@japila.pl>.
Hi,
No idea what might be going on here, but I'd not worry much about it and
simply monitor disk usage, as some broadcast blocks might be left over.
Do you know when in your application lifecycle it happens? Spark SQL or
Structured Streaming? Do you use broadcast variables or are the errors
coming from broadcast joins perhaps?
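The disk-usage monitoring suggested above could be sketched roughly as follows. This is only an illustration: the directory to watch is an assumption (the system temporary directory stands in for wherever the Spark local directories live on your nodes), and the function names and the 80% threshold are hypothetical.

```python
import shutil
import tempfile


def disk_usage_fraction(path):
    """Return the used fraction (0.0 to 1.0) of the filesystem holding path."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        return 0.0
    return usage.used / usage.total


def needs_attention(path, threshold=0.8):
    """True when the filesystem holding path is at or above the threshold."""
    return disk_usage_fraction(path) >= threshold


# Assumption: replace this with the directory Spark actually writes blocks to.
local_dir = tempfile.gettempdir()
print(f"{local_dir}: {disk_usage_fraction(local_dir):.1%} used")
if needs_attention(local_dir):
    print("disk usage is high; check for leftover blocks")
```

Running such a check periodically on each worker would show whether leftover blocks actually accumulate over time.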
Best regards,
Jacek Laskowski
----
https://about.me/JacekLaskowski
"The Internals Of" Online Books <https://books.japila.pl/>
Follow me on https://twitter.com/jaceklaskowski