Posted to issues@spark.apache.org by "Abdelrahman Elsaidy (JIRA)" <ji...@apache.org> on 2017/06/15 11:30:01 UTC

[jira] [Commented] (SPARK-14437) Spark using Netty RPC gets wrong address in some setups

    [ https://issues.apache.org/jira/browse/SPARK-14437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16050359#comment-16050359 ] 

Abdelrahman Elsaidy commented on SPARK-14437:
---------------------------------------------

[~hogeland] Was an issue created for the Spark 2.0.0 error? I am getting the same error when I run a Spark job with multiple executors on a single worker node in a Spark standalone cluster. However, when I run the job with a single executor on a single worker, it completes successfully.

When using two executors, one executor completes while the other gets this error:
{code:java}
17/06/15 07:06:33 ERROR Executor: Exception in task 1.0 in stage 1.0 (TID 2)
java.lang.RuntimeException: Stream '/files/script.py' was not found.
	at org.apache.spark.network.client.TransportResponseHandler.handle(TransportResponseHandler.java:222)
	at org.apache.spark.network.server.TransportChannelHandler.channelRead(TransportChannelHandler.java:120)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:357)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:343)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:336)
	at io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:287)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:357)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:343)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:336)
	at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:102)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:357)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:343)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:336)
	at org.apache.spark.network.util.TransportFrameDecoder.channelRead(TransportFrameDecoder.java:85)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:357)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:343)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:336)
	at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1294)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:357)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:343)
	at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:911)
	at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:131)
	at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:643)
	at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:566)
	at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:480)
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:442)
	at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:131)
	at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:144)
	at java.lang.Thread.run(Thread.java:745)
{code}
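
For context, the failing setup is a standalone cluster where one worker hosts two executors of the same application. A minimal configuration that produces that layout looks roughly like the sketch below; the master URL, core counts, and memory values are placeholders, not the actual settings of the job that produced this trace:
{code:scala}
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical standalone-mode configuration: with 4 cores per executor on an
// 8-core worker, the master can place two executors of this application on the
// same worker, which is the situation in which the second executor fails to
// fetch /files/script.py from the driver's file server.
val conf = new SparkConf()
  .setAppName("multi-executor-repro")
  .setMaster("spark://master-host:7077")   // placeholder master URL
  .set("spark.executor.cores", "4")        // fewer cores than the worker offers
  .set("spark.executor.memory", "2g")
  .set("spark.cores.max", "8")             // allows two 4-core executors in total

val sc = new SparkContext(conf)
{code}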


> Spark using Netty RPC gets wrong address in some setups
> -------------------------------------------------------
>
>                 Key: SPARK-14437
>                 URL: https://issues.apache.org/jira/browse/SPARK-14437
>             Project: Spark
>          Issue Type: Bug
>          Components: Block Manager, Spark Core
>    Affects Versions: 1.6.0, 1.6.1
>         Environment: AWS, Docker, Flannel
>            Reporter: Kevin Hogeland
>            Assignee: Shixiong Zhu
>             Fix For: 2.0.0
>
>
> Netty can't get the correct origin address in certain network setups. Spark should handle this, as relying on Netty to correctly report all addresses leads to incompatible and unpredictable network states. We're currently using Docker with Flannel on AWS. Container communication looks something like: {{Container 1 (1.2.3.4) -> Docker host A (1.2.3.0) -> Docker host B (4.5.6.0) -> Container 2 (4.5.6.1)}}
> If the client in that setup is Container 1 (1.2.3.4), Netty channels from there to Container 2 will have a client address of 1.2.3.0.
> The {{RequestMessage}} object that is sent over the wire already contains a {{senderAddress}} field that the sender can use to specify their address. In {{NettyRpcEnv#internalReceive}}, this is replaced with the Netty client socket address when null. {{senderAddress}} in the messages sent from the executors is currently always null, meaning all messages will have these incorrect addresses (we've switched back to Akka as a temporary workaround for this). The executor should send its address explicitly so that the driver doesn't attempt to infer addresses based on possibly incorrect information from Netty.
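
To illustrate the fallback described above, here is a minimal, self-contained sketch of the null-sender substitution. The types are simplified stand-ins, not the actual Spark classes; the real logic lives in {{NettyRpcEnv}} and its handler:
{code:scala}
import java.net.InetSocketAddress

// Simplified stand-ins for Spark's RpcAddress and RequestMessage (illustrative only).
case class RpcAddress(host: String, port: Int)
case class RequestMessage(senderAddress: RpcAddress, receiver: String, content: Any)

// When the sender did not include its own address, the receiver falls back to the
// socket address Netty reports for the connection. Behind NAT-style container
// networking that socket address is the Docker host (e.g. 1.2.3.0), not the
// container (1.2.3.4), so replies and file fetches go to the wrong endpoint.
def resolveSender(socketAddr: InetSocketAddress, msg: RequestMessage): RequestMessage = {
  if (msg.senderAddress == null) {
    val inferred = RpcAddress(socketAddr.getHostString, socketAddr.getPort)
    msg.copy(senderAddress = inferred)   // may be wrong behind NAT / overlay networks
  } else {
    msg                                  // explicit sender address is used as-is
  }
}
{code}
The fix shipped in 2.0.0 is along the lines of the {{else}} branch: the executor fills in {{senderAddress}} explicitly, so the driver never has to infer it from the Netty connection.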



