Posted to user@spark.apache.org by Wisc Forum <wi...@gmail.com> on 2013/11/17 20:30:54 UTC

How to add more worker nodes to a Spark cluster on EC2

Hi, I have a job that runs on Spark on EC2. The cluster currently contains 1 master node and 2 worker nodes.

I am planning to add several other worker nodes to the cluster. How should I do that so that the master node knows about the new worker nodes?

I couldn't find documentation for this on Spark's site. Can anybody help a bit?

Thanks,
Xiaobing

Re: How to add more worker nodes to a Spark cluster on EC2

Posted by Wisc Forum <wi...@gmail.com>.
Thank you, I will try that :)

On Nov 17, 2013, at 7:06 PM, Aaron Davidson <il...@gmail.com> wrote:

> Hi Xiaobing,
> 
> At its heart, this is a very easy thing to do. Instead of the master reaching out to the workers, the worker just needs to find the master. In standalone mode, this can be accomplished simply by setting the SPARK_MASTER_IP/_PORT variables in spark-env.sh. 
> 
> In order to make the other scripts work nicely, such as start-all.sh, stop-all.sh, etc., you may also want to add the new workers' addresses to ~/spark/conf/slaves and ~/spark-ec2/slaves. If you do this, then you can just copy all the Spark configuration from the master by using a command like this (assuming you installed from the ec2 scripts):
> ~/spark-ec2/copy-file ~/spark/conf/
> and then everyone should be happy.


Re: How to add more worker nodes to a Spark cluster on EC2

Posted by Aaron Davidson <il...@gmail.com>.
Hi Xiaobing,

At its heart, this is a very easy thing to do. Instead of the master
reaching out to the workers, the worker just needs to find the master. In
standalone mode, this can be accomplished simply by setting the
SPARK_MASTER_IP/_PORT variables in spark-env.sh.
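
For illustration, the spark-env.sh entries on a new worker might look
something like this (a sketch only: the hostname is a placeholder, 7077 is
the default standalone master port, and the name, location, and arguments
of the worker-start script vary between Spark versions):

    # ~/spark/conf/spark-env.sh on the new worker node
    export SPARK_MASTER_IP=ec2-xx-xx-xx-xx.compute-1.amazonaws.com   # master's address (placeholder)
    export SPARK_MASTER_PORT=7077                                    # default standalone master port

    # then bring the worker up and point it at the master, e.g.
    # (check the script's usage for your Spark version):
    ~/spark/bin/start-slave.sh 1 spark://ec2-xx-xx-xx-xx.compute-1.amazonaws.com:7077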

In order to make the other scripts work nicely, such as start-all.sh,
stop-all.sh, etc., you may also want to add the new workers' addresses to
~/spark/conf/slaves and ~/spark-ec2/slaves. If you do this, then you can
just copy all the Spark configuration from the master by using a command
like this (assuming you installed from the ec2 scripts):
~/spark-ec2/copy-file ~/spark/conf/
and then everyone should be happy.
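
As a concrete sketch of that sequence, run on the master (the hostname below
is a placeholder for a new worker's address, and the daemon scripts may live
under bin/ or sbin/ depending on your Spark version):

    # register the new worker so the start/stop scripts know about it
    echo ec2-yy-yy-yy-yy.compute-1.amazonaws.com >> ~/spark/conf/slaves
    echo ec2-yy-yy-yy-yy.compute-1.amazonaws.com >> ~/spark-ec2/slaves

    # push the master's Spark configuration out to the listed slaves
    ~/spark-ec2/copy-file ~/spark/conf/

    # restart the standalone daemons so the new workers register with the master
    ~/spark/bin/stop-all.sh
    ~/spark/bin/start-all.sh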


On Sun, Nov 17, 2013 at 11:30 AM, Wisc Forum <wi...@gmail.com> wrote:

> Hi, I have a job that runs on Spark on EC2. The cluster currently contains
> 1 master node and 2 worker nodes.
>
> I am planning to add several other worker nodes to the cluster. How should
> I do that so that the master node knows about the new worker nodes?
>
> I couldn't find documentation for this on Spark's site. Can anybody help
> a bit?
>
> Thanks,
> Xiaobing