Posted to user@spark.apache.org by Robert James <sr...@gmail.com> on 2014/07/09 03:24:05 UTC

Requirements for Spark cluster

I have a Spark app which runs well on local master.  I'm now ready to
put it on a cluster.  What needs to be installed on the master? What
needs to be installed on the workers?

If the cluster already has Hadoop or YARN or Cloudera, does it still
need an install of Spark?

Re: Requirements for Spark cluster

Posted by Krishna Sankar <ks...@gmail.com>.
I rsync the spark-1.0.1 directory to all the nodes. Yep, one needs Spark on
all the nodes irrespective of Hadoop/YARN.
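
For example, a rough sketch of that sync step (worker hostnames and the
install path are placeholders; it assumes passwordless ssh to every node):

    for host in worker1 worker2 worker3; do
      rsync -az --delete ~/spark-1.0.1/ $host:~/spark-1.0.1/
    done
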
Cheers
<k/>


On Tue, Jul 8, 2014 at 6:24 PM, Robert James <sr...@gmail.com> wrote:

> I have a Spark app which runs well on local master.  I'm now ready to
> put it on a cluster.  What needs to be installed on the master? What
> needs to be installed on the workers?
>
> If the cluster already has Hadoop or YARN or Cloudera, does it still
> need an install of Spark?
>

Re: Requirements for Spark cluster

Posted by Sandy Ryza <sa...@cloudera.com>.
Hi Robert,

If you're running Spark against YARN, you don't need to install anything
Spark-specific on all the nodes.  For each application, the client will
copy the Spark jar to HDFS where the Spark processes can fetch it.  For
faster app startup, you can copy the Spark jar to a public location on HDFS
and point to it there.  The YARN NodeManagers will cache it on each node
after it's fetched the first time.
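
A rough sketch of that setup on a Hadoop 2.x cluster (the jar name and HDFS
path are placeholders, and the exact knob depends on your Spark release;
older YARN builds used the SPARK_JAR environment variable instead of the
spark.yarn.jar property):

    # one-time: upload the Spark assembly jar to a shared HDFS location
    hdfs dfs -mkdir -p /user/spark/share/lib
    hdfs dfs -put lib/spark-assembly-1.0.1-hadoop2.4.0.jar /user/spark/share/lib/

    # point applications at it, e.g. via conf/spark-defaults.conf
    echo "spark.yarn.jar hdfs:///user/spark/share/lib/spark-assembly-1.0.1-hadoop2.4.0.jar" \
      >> conf/spark-defaults.conf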

-Sandy


On Tue, Jul 8, 2014 at 6:24 PM, Robert James <sr...@gmail.com> wrote:

> I have a Spark app which runs well on local master.  I'm now ready to
> put it on a cluster.  What needs to be installed on the master? What
> needs to be installed on the workers?
>
> If the cluster already has Hadoop or YARN or Cloudera, does it still
> need an install of Spark?
>

Re: Requirements for Spark cluster

Posted by Akhil Das <ak...@sigmoidanalytics.com>.
You can use the spark-ec2/bdutil scripts to set it up quickly on AWS or GCE.
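
For example, the spark-ec2 script shipped in the ec2/ directory of the
release can bring up a cluster roughly like this (key pair, identity file,
slave count and cluster name are all placeholders):

    cd spark-1.0.1/ec2
    ./spark-ec2 -k my-keypair -i ~/my-keypair.pem -s 3 launch my-spark-cluster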

If you want to set it up yourself, these are the things you will need to do
(a short sketch follows the list):

1. Make sure Java 7 is installed on all machines.
2. Install and configure Spark (add all slave nodes to the conf/slaves file).
3. Rsync the Spark directory to all slaves and start the cluster.
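
A minimal sketch of steps 2 and 3 for a standalone cluster (hostnames and
the install path are placeholders; it assumes passwordless ssh from the
master to each worker):

    # on the master node, inside the Spark directory
    printf "worker1\nworker2\nworker3\n" > conf/slaves   # one worker per line
    for h in worker1 worker2 worker3; do rsync -az ./ $h:~/spark-1.0.1/; done
    sbin/start-all.sh   # starts the master plus all workers in conf/slaves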

If you already have Hadoop, you might need the Spark build that is compiled
against that Hadoop version (or you can compile Spark yourself with the
SPARK_HADOOP_VERSION option); a build sketch follows below.
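
For example (a sketch only; the exact flags vary between releases, so check
the building docs for your version):

    # build an assembly against Hadoop 2.4 with YARN support (sbt style)
    SPARK_HADOOP_VERSION=2.4.0 SPARK_YARN=true sbt/sbt assembly

    # or the equivalent Maven build
    mvn -Pyarn -Dhadoop.version=2.4.0 -DskipTests clean package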

Thanks
Best Regards


On Wed, Jul 9, 2014 at 6:54 AM, Robert James <sr...@gmail.com> wrote:

> I have a Spark app which runs well on local master.  I'm now ready to
> put it on a cluster.  What needs to be installed on the master? What
> needs to be installed on the workers?
>
> If the cluster already has Hadoop or YARN or Cloudera, does it still
> need an install of Spark?
>