Posted to user@spark.apache.org by Xi Shen <da...@gmail.com> on 2015/03/26 06:28:26 UTC

How to troubleshoot server.TransportChannelHandler Exception

Hi,

My environment is 64-bit Windows, with Spark running on YARN. I have a job
that takes a long time. It starts well, but it ends with the exception below:

15/03/25 12:39:09 WARN server.TransportChannelHandler: Exception in connection from headnode0.xshe3539-hadoop-sydney.q10.internal.cloudapp.net/100.72.68.34:58507
java.io.IOException: An existing connection was forcibly closed by the remote host
    at sun.nio.ch.SocketDispatcher.read0(Native Method)
    at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:43)
    at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
    at sun.nio.ch.IOUtil.read(IOUtil.java:192)
    at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:379)
    at io.netty.buffer.PooledUnsafeDirectByteBuf.setBytes(PooledUnsafeDirectByteBuf.java:311)
    at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:881)
    at io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(NioSocketChannel.java:225)
    at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:119)
    at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
    at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:116)
    at java.lang.Thread.run(Thread.java:745)
15/03/25 12:39:09 ERROR executor.CoarseGrainedExecutorBackend: Driver Disassociated [akka.tcp://sparkExecutor@workernode0.xshe3539-hadoop-sydney.q10.internal.cloudapp.net:65469] -> [akka.tcp://sparkDriver@headnode0.xshe3539-hadoop-sydney.q10.internal.cloudapp.net:58467] disassociated! Shutting down.
15/03/25 12:39:09 WARN remote.ReliableDeliverySupervisor: Association with remote system [akka.tcp://sparkDriver@headnode0.xshe3539-hadoop-sydney.q10.internal.cloudapp.net:58467] has failed, address is now gated for [5000] ms. Reason is: [Disassociated].

Interestingly, the job is shown as Succeeded in the ResourceManager (RM). I
checked the application log; it is miles long, and this is the only exception
I found. It is not very useful for pinpointing the problem.
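With a log that long, one way to surface entries like the one above is to filter it down to WARN and ERROR entries, keeping each entry's wrapped lines and stack-trace frames attached to it. Below is a minimal Python sketch, assuming the log follows the "yy/MM/dd HH:mm:ss LEVEL logger: message" layout seen in the excerpt. The scan_log function and HEADER pattern are hypothetical helpers written for this example, not part of Spark or YARN. If YARN log aggregation is enabled, the container logs can first be saved to a file with "yarn logs -applicationId <appId>".

```python
import re
from typing import Iterable, List, Tuple

# Assumed log4j-style header layout, as in the excerpt above:
#   15/03/25 12:39:09 WARN server.TransportChannelHandler: Exception in ...
HEADER = re.compile(
    r"^(?P<ts>\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) "
    r"(?P<level>[A-Z]+) (?P<logger>[\w.$]+): ?(?P<msg>.*)$"
)

Entry = Tuple[str, str, str, List[str]]  # (timestamp, level, logger, body lines)


def scan_log(lines: Iterable[str], levels=("WARN", "ERROR")) -> List[Entry]:
    """Hypothetical helper: collect entries whose level is in `levels`.

    Lines that do not start a new entry (wrapped message text, exception
    messages, stack-trace frames) are attached to the entry above them.
    """
    entries: List[Entry] = []
    current = None
    for raw in lines:
        line = raw.rstrip("\n")
        m = HEADER.match(line)
        if m:
            # A new entry starts here; keep it only if its level is of interest.
            if m.group("level") in levels:
                current = (m.group("ts"), m.group("level"),
                           m.group("logger"), [m.group("msg")])
                entries.append(current)
            else:
                current = None
        elif current is not None:
            # Continuation line belonging to the current WARN/ERROR entry.
            current[3].append(line.strip())
    return entries


if __name__ == "__main__":
    import sys

    # Usage (assumption): python scan_log.py app.log
    if len(sys.argv) > 1:
        with open(sys.argv[1], encoding="utf-8", errors="replace") as f:
            for ts, level, logger, body in scan_log(f):
                print(f"{ts} {level} {logger}: {body[0]}")
```

Grouping continuation lines with their header keeps the exception text and frames next to the WARN line that introduced them, so the output stays short even when the raw log is not.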

Any idea what could be the cause?


Thanks,


Xi Shen
http://about.me/davidshen

Re: How to troubleshoot server.TransportChannelHandler Exception

Posted by Xi Shen <da...@gmail.com>.
Ah, hell. I am using Spark 1.2.0, and my job was submitted to use 8
cores... the magic number in the bug.

On Thu, Mar 26, 2015 at 5:48 PM, Akhil Das <ak...@sigmoidanalytics.com>
wrote:

> What's your Spark version? Not quite sure, but you could be hitting this
> issue: https://issues.apache.org/jira/plugins/servlet/mobile#issue/SPARK-4516

Re: How to troubleshoot server.TransportChannelHandler Exception

Posted by Akhil Das <ak...@sigmoidanalytics.com>.
What's your Spark version? Not quite sure, but you could be hitting this
issue: https://issues.apache.org/jira/plugins/servlet/mobile#issue/SPARK-4516