Posted to user@spark.apache.org by mharwida <ma...@yahoo.com> on 2014/01/20 19:14:25 UTC

Spark Master on Hadoop Job Tracker?

Hi,

Should the Spark Master run on the Hadoop Job Tracker node (and the Spark
workers on the Task Trackers), or can the Spark Master reside on any Hadoop
node?

Thanks
Majd



--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Master-on-Hadoop-Job-Tracker-tp680.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

Re: Spark Master on Hadoop Job Tracker?

Posted by mharwida <ma...@yahoo.com>.
Many thanks for the replies.

The way I currently have my setup is as follows:
6 nodes running Hadoop, with each node holding approximately 5GB of data.
I launched a Spark Master (and Shark via ./shark) on one of the Hadoop nodes
and launched 5 Spark worker nodes on the remaining 5 Hadoop nodes.
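
For reference, a minimal sketch of how a standalone cluster like this is typically brought up. Host names are placeholders, and the exact script locations vary across Spark versions (the daemon scripts moved from bin/ to sbin/ around 0.9):

    # On the master node:
    ./sbin/start-master.sh
    # The master logs a URL of the form spark://<master-host>:7077

    # On each of the 5 worker (DataNode) machines, register a worker with that URL:
    ./bin/spark-class org.apache.spark.deploy.worker.Worker spark://<master-host>:7077

    # Or list the 5 worker hostnames in conf/slaves on the master and run:
    ./sbin/start-all.sh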

So I'm assuming the setup above constitutes a standalone deployment? From
reading the documentation, it's best to have Spark as close as possible to
the HDFS data, hence my choice of the setup above.

Is that the best way to set up Spark to ensure data locality? And would
there be any benefit in running Mesos in that setup as well?
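
As an illustration of why that co-location matters, here is a minimal Scala sketch (the master host, NameNode host, port, and path are all placeholders): when the workers run on the same machines as the HDFS DataNodes, Spark can schedule each task on a node that already holds the corresponding block, so reads stay node-local instead of going over the network.

    import org.apache.spark.SparkContext

    object LocalityExample {
      def main(args: Array[String]) {
        // Connect to the standalone master (placeholder host and port).
        val sc = new SparkContext("spark://master-host:7077", "locality-example")

        // Spark asks HDFS where each block lives and prefers to run the
        // task for a split on a worker that already holds that block.
        val lines = sc.textFile("hdfs://namenode-host:8020/data/events.log")
        val errors = lines.filter(_.contains("ERROR")).count()
        println("error lines: " + errors)

        sc.stop()
      }
    }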

Thanks
Majd




--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Master-on-Hadoop-Job-Tracker-tp680p714.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

Re: Spark Master on Hadoop Job Tracker?

Posted by Nick Pentreath <ni...@gmail.com>.
If you intend to run Hadoop MapReduce and Spark on the same cluster concurrently, and you have enough memory on the JobTracker master, then you can run the Spark master (for standalone mode, as Raymond mentions) on the same node. This is not necessary, but it is convenient, since you only have to ssh into one master (usually I'd put the Hive/Shark server, Spark master, etc. on the same node).
Sent from Mailbox for iPhone
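
On that "one node to ssh into" point, a minimal sketch of how the shells are usually pointed at that single standalone master (the host name is a placeholder, and exact file and script paths vary by Spark/Shark version):

    # Spark shell against the standalone master:
    MASTER=spark://<master-host>:7077 ./bin/spark-shell

    # Shark: export the same master URL in conf/shark-env.sh before running ./shark
    export MASTER=spark://<master-host>:7077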

On Mon, Jan 20, 2014 at 8:14 PM, mharwida <ma...@yahoo.com> wrote:

> Hi,
> Should the Spark Master run on the Hadoop Job Tracker node (and the Spark
> workers on the Task Trackers), or can the Spark Master reside on any Hadoop
> node?
> Thanks
> Majd
> --
> View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Master-on-Hadoop-Job-Tracker-tp680.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.

RE: Spark Master on Hadoop Job Tracker?

Posted by "Liu, Raymond" <ra...@intel.com>.
I'm not sure what you aim to solve. When you mention the Spark Master, I guess you probably mean Spark standalone mode? In that case the Spark cluster does not necessarily need to be coupled with the Hadoop cluster. But if you aim to achieve better data locality, then yes, running the Spark workers on the HDFS data nodes might help. As for the Spark Master, I think its placement doesn't matter much.

Best Regards,
Raymond Liu

-----Original Message-----
From: mharwida [mailto:majdharwida@yahoo.com] 
Sent: Tuesday, January 21, 2014 2:14 AM
To: user@spark.incubator.apache.org
Subject: Spark Master on Hadoop Job Tracker?

Hi,

Should the Spark Master run on the Hadoop Job Tracker node (and the Spark workers on the Task Trackers), or can the Spark Master reside on any Hadoop node?

Thanks
Majd



--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Master-on-Hadoop-Job-Tracker-tp680.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.