Posted to user@spark.apache.org by Jim Carroll <ji...@gmail.com> on 2014/09/11 20:22:20 UTC

Network requirements between Driver, Master, and Slave

Hello all,

I'm trying to run a Driver on my local network with a deployment on EC2 and
it's not working. I was wondering if either the master or slave instances
(in standalone) connect back to the driver program.

I outlined the details of my observations in a previous post but here is
what I'm seeing:

I have v1.1.0 installed (the new tag) on ec2 using the spark-ec2 script.
I have the same version of the code built locally.
I edited the master security group to allow inbound access from anywhere to
7077 and 8080.
I see a connection take place.
I see the workers fail with a timeout when any job is run.
The master eventually removes the driver's job.
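One way to double-check port observations like the ones above is a plain TCP probe. This is only a minimal sketch using the Python standard library; the helper name `can_connect` is made up for illustration and is not part of Spark. It tells you whether a TCP connection can be opened, nothing more.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        # create_connection resolves the name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused connections, timeouts, and name-resolution failures all land here.
        return False
```

Run from the local machine, `can_connect(master_host, 7077)` and `can_connect(master_host, 8080)` check the path to the master, where `master_host` stands in for the master's public address. Running the same probe from an EC2 instance against the driver's address checks the reverse direction.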

I suppose this makes sense if there's a requirement for either the worker
or the master to be on the same network as the driver. Is that the case?

Thanks
Jim




--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Network-requirements-between-Driver-Master-and-Slave-tp13997.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org


Re: Network requirements between Driver, Master, and Slave

Posted by Mayur Rustagi <ma...@gmail.com>.
The driver needs a consistent connection to the master in standalone mode, as a whole bunch of client-side work happens on the driver. Calls like parallelize send data from the driver to the master, and collect sends data from the master back to the driver.

If you are looking to avoid that connection, you can look into the embedded driver model in YARN, where the driver also runs inside the cluster, so reliability and connectivity are a given.
-- 
Regards,
Mayur Rustagi
Ph: +1 (760) 203 3257
http://www.sigmoidanalytics.com
@mayur_rustagi


Re: Network requirements between Driver, Master, and Slave

Posted by Jim Carroll <ji...@gmail.com>.
Hi Akhil,

Thanks! I guess in short that means the master (or slaves?) connect back to
the driver. This seems like a really odd way to work, given that the driver
already needs to connect to the master on port 7077. I would have thought that
if the driver could initiate a connection to the master, that would be all
that's required.

Can you describe what it is about the architecture that requires the master
to connect back to the driver even when the driver initiates a connection to
the master? Just curious.

Thanks anyway.
Jim
 





Re: Network requirements between Driver, Master, and Slave

Posted by Akhil Das <ak...@sigmoidanalytics.com>.
Hi Jim,

This approach will not work right out of the box. You need to understand a
few things. The driver program and the master communicate with each other,
and for that you need to open up certain ports on your public IP (read
about port forwarding <http://portforward.com/>). Also, on the cluster you
need to set *spark.driver.host* and *spark.driver.port* (random by default)
to point to your public IP and the port that you opened up.
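As a rough illustration of the two properties named above, here is a small sketch that only formats them as `key value` lines, the layout used by conf/spark-defaults.conf. The helper names, the address 203.0.113.10 (a documentation-only IP), and the port 51000 are all placeholders I made up; substitute your own public IP and whichever port you forwarded. Where the properties end up being set depends on your setup.

```python
def driver_network_conf(public_ip, driver_port):
    """Build the two driver properties: spark.driver.host and spark.driver.port."""
    if not 1 <= driver_port <= 65535:
        raise ValueError("driver_port must be in 1..65535")
    return {
        "spark.driver.host": public_ip,
        "spark.driver.port": str(driver_port),
    }


def to_spark_defaults(conf):
    """Render properties as 'key value' lines, one property per line."""
    return "\n".join(f"{key} {value}" for key, value in sorted(conf.items()))


# Placeholder values for illustration only.
print(to_spark_defaults(driver_network_conf("203.0.113.10", 51000)))
# prints:
# spark.driver.host 203.0.113.10
# spark.driver.port 51000
```

Pinning the port to a fixed value is the point here: a random port cannot be forwarded or opened in a security group ahead of time.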


Thanks
Best Regards
