Posted to issues@spark.apache.org by "DUC LIEM NGUYEN (JIRA)" <ji...@apache.org> on 2017/10/28 20:12:00 UTC

[jira] [Updated] (SPARK-22382) Spark on mesos: doesn't support public IP setup for agent and master.

     [ https://issues.apache.org/jira/browse/SPARK-22382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

DUC LIEM NGUYEN updated SPARK-22382:
------------------------------------
    Description: 
I've installed a system as follows:

--Mesos master: private IP 10.x.x.2, public IP 35.x.x.6

--Mesos slave: private IP 192.x.x.10, public IP 111.x.x.2

Now the master assigned the task to the slave successfully; however, the task failed. The error message is as follows:

Exception in thread "main" 17/10/11 22:38:01 ERROR RpcOutboxMessage: Ask timeout before connecting successfully

Caused by: org.apache.spark.rpc.RpcTimeoutException: Cannot receive any reply in 120 seconds. This timeout is controlled by spark.rpc.askTimeout
When I look at the environment, spark.driver.host points to the private IP address of the master (10.x.x.2) instead of its public IP address (35.x.x.6). A Wireshark capture confirms this: there were failed TCP packets to the master's private IP address.
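
For reference, the address the driver actually advertises can be read back from a running session. A minimal Scala sketch, assuming access to a spark-shell or other driver-side SparkContext (the helper name is my own):

    // Read back the driver's advertised address from a live SparkContext.
    // spark.driver.host is the address executors connect back to; when
    // left unset, Spark resolves a local (here: private) IP for it.
    import org.apache.spark.SparkContext

    def printDriverAddress(sc: SparkContext): Unit = {
      println("spark.driver.host = " + sc.getConf.get("spark.driver.host"))
      println("spark.driver.port = " + sc.getConf.get("spark.driver.port"))
    }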

Now if I set spark.driver.bindAddress on the master to its local IP address and spark.driver.host on the master to its public IP address, I get the following message:

ERROR TaskSchedulerImpl: Lost executor 1 on myhostname.singnet.com.sg: Unable to create executor due to Cannot assign requested address.
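
The configuration being attempted here is the usual bind-address/advertise-address split. A sketch in Scala (e.g. in spark-shell), keeping the redacted addresses from the report as placeholders rather than literal IPs:

    // Bind-address vs. advertise-address split (placeholder addresses):
    //   spark.driver.bindAddress -- the interface the driver binds on locally
    //   spark.driver.host        -- the address executors connect back to
    import org.apache.spark.SparkConf

    val conf = new SparkConf()
      .set("spark.driver.bindAddress", "10.x.x.2") // master's private IP
      .set("spark.driver.host", "35.x.x.6")        // master's public IP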

From my understanding, spark.driver.bindAddress is applied on both the master and the slave, hence the slave gets that error. Now I'm really wondering: how do I properly set up Spark to work in this cluster over public IPs?
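
For completeness, one variant I am still considering, assuming the properties were picked up from a shared spark-defaults.conf that the agents also read: set them programmatically in the driver application, so they stay scoped to the driver JVM only. A hedged sketch, not a confirmed fix:

    import org.apache.spark.{SparkConf, SparkContext}

    object PublicIpDriver {
      def main(args: Array[String]): Unit = {
        // Hedged sketch: keep the bind/advertise settings in the driver
        // JVM only, rather than in a shared spark-defaults.conf. The
        // addresses are the report's redacted placeholders.
        val conf = new SparkConf()
          .setAppName("public-ip-driver")
          .set("spark.driver.bindAddress", "10.x.x.2") // bind on the private interface
          .set("spark.driver.host", "35.x.x.6")        // advertise the public address
        val sc = new SparkContext(conf)
        sc.stop()
      }
    }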

> Spark on mesos: doesn't support public IP setup for agent and master. 
> ----------------------------------------------------------------------
>
>                 Key: SPARK-22382
>                 URL: https://issues.apache.org/jira/browse/SPARK-22382
>             Project: Spark
>          Issue Type: Question
>          Components: Mesos
>    Affects Versions: 2.1.1
>            Reporter: DUC LIEM NGUYEN
>


