
Posted to user@spark.apache.org by algermissen1971 <al...@icloud.com> on 2015/07/12 21:34:44 UTC

Master vs. Slave Nodes Clarification

Hi,

I have a question that I am having trouble figuring out for myself:

Does the master node in a Spark cluster need to be similar to the slave nodes, or should I view it as a coordinating node that does not need much computing or storage power?

For example, when using Spark Streaming and checkpointing, would the master node need access to the shared file system (e.g. HDFS)? Or do I only need to mount that on the slaves?
(Likewise, if I use the Cassandra connector, does that (and C*) need to be installed on the master node, too?)

Or, in other words: is the master just one of several similar cluster nodes, or is it merely a 'small control node' for which just about any small VM would do?

Jan


---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org

Re: Master vs. Slave Nodes Clarification

Posted by Tathagata Das <td...@databricks.com>.

Yep :)

On Tue, Jul 14, 2015 at 2:44 PM, algermissen1971 <algermissen1971@icloud.com> wrote:

>
> On 14 Jul 2015, at 23:26, Tathagata Das <td...@databricks.com> wrote:
>
> > Just to be clear, you mean the Spark Standalone cluster manager's
> > "master" and not the application's "driver", right?
>
> Sorry, by now I have understood that I would not necessarily put the
> driver app on the master node and that not making that distinction made my
> question kind of hard to answer :-)
>
> So far I have understood that for a Spark Streaming app that uses the
> Cassandra connector (and also needs checkpointing):
>
> slaves: need Spark, C*, the connector and access to a distributed file
>   system for the checkpointing
> master: needs Spark (configured as master) but none of the rest
> the node where the driver runs: needs Spark, C*, the connector and access
>   to a distributed file system for the checkpointing
>
> Correct?
>
> (And thanks to everyone for the replies)
>
>
> Jan
>
>
>

Re: Master vs. Slave Nodes Clarification

Posted by algermissen1971 <al...@icloud.com>.

On 14 Jul 2015, at 23:26, Tathagata Das <td...@databricks.com> wrote:

> Just to be clear, you mean the Spark Standalone cluster manager's "master" and not the application's "driver", right?

Sorry, by now I have understood that I would not necessarily put the driver app on the master node and that not making that distinction made my question kind of hard to answer :-)

So far I have understood that for a Spark Streaming app that uses the Cassandra connector (and also needs checkpointing):

slaves: need Spark, C*, the connector and access to a distributed file system for the checkpointing
master: needs Spark (configured as master) but none of the rest
the node where the driver runs: needs Spark, C*, the connector and access to a distributed file system for the checkpointing

Correct?

(And thanks to everyone for the replies)


Jan
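
The summary in this message can be written down as a small lookup table. This is only a sketch of the list as stated in the thread, not an authoritative requirements list; the role names, component labels and the helper functions `missing_components` and `can_host` are hypothetical and are not part of Spark or the Cassandra connector.

```python
# Requirements per node role, as summarized in the message above.
# Role names and component labels are illustrative, not Spark identifiers.
REQUIREMENTS = {
    "slave": {"spark", "cassandra", "connector", "shared_fs"},
    "master": {"spark"},
    "driver": {"spark", "cassandra", "connector", "shared_fs"},
}


def missing_components(role, installed):
    """Return, sorted, the components a node of this role still lacks."""
    return sorted(REQUIREMENTS[role] - set(installed))


def can_host(role, installed):
    """True if a node with the given components can take on the role."""
    return not missing_components(role, installed)


if __name__ == "__main__":
    small_vm = {"spark"}
    print(can_host("master", small_vm))           # small VM as master
    print(missing_components("slave", small_vm))  # what a slave would still need
```

Under these assumptions, a node with only Spark installed qualifies as the master, but not as a slave or as the node that runs the driver.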




Re: Master vs. Slave Nodes Clarification

Posted by Tathagata Das <td...@databricks.com>.

Just to be clear, you mean the Spark Standalone cluster manager's "master"
and not the application's "driver", right?
In that case, the earlier responses are correct.

TD


RE: Master vs. Slave Nodes Clarification

Posted by Mohammed Guller <mo...@glassbeam.com>.

The master node does not have to be similar to the worker nodes. It can be a smaller machine.

In the case of C*, again, you don't need to have C* on the master node. You need C* and the Spark workers co-located. The master can be on one of the C* nodes or on a non-C* node.

Mohammed
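
The co-location advice above can be checked against a host inventory. The following is a minimal sketch, assuming you keep plain lists of hostnames for each service; the function name `colocation_report` and all hostnames are made up for illustration.

```python
def colocation_report(spark_workers, cassandra_nodes, master):
    """Compare Spark worker hosts with C* hosts.

    The master is reported separately because, per the advice above,
    it may sit on a C* node or on a non-C* node.
    """
    workers = set(spark_workers)
    cassandra = set(cassandra_nodes)
    return {
        # Workers that cannot read from a local C* instance.
        "workers_without_cassandra": sorted(workers - cassandra),
        # C* nodes whose data has no Spark worker next to it.
        "cassandra_without_worker": sorted(cassandra - workers),
        # Informational only; either value is acceptable.
        "master_on_cassandra_node": master in cassandra,
    }


if __name__ == "__main__":
    report = colocation_report(
        spark_workers=["node1", "node2", "node3"],
        cassandra_nodes=["node1", "node2", "node3"],
        master="small-vm",
    )
    print(report)
```

Both lists in the report being empty means every worker has a C* node on the same host and vice versa.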



Re: Master vs. Slave Nodes Clarification

Posted by Akhil Das <ak...@sigmoidanalytics.com>.

You are a bit confused about the master node, the slave nodes and the driver
machine.

1. The master node can be a smaller machine in your dev environment; in
production you will mostly be using Mesos or YARN as the cluster manager.

2. If you run your driver program (the streaming job) on the master
machine, then it needs access to HDFS or wherever the writes are
happening.

3. The master node is more like a control node, so yes, a smaller machine
would do. But when you run the driver program on the master machine, it is
good to have enough memory and cores for your job to have low latency.

Thanks
Best Regards
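
Where the driver ends up running is decided at submit time, independently of the master. The sketch below assembles a `spark-submit` command line for a standalone master using the `--master`, `--deploy-mode` and `--class` options. With `client` the driver runs on the machine where `spark-submit` is invoked; with `cluster` the standalone master launches it on one of the workers. The helper `spark_submit_args`, the host name, the jar, the class name and the checkpoint path passed as an application argument are all hypothetical.

```python
def spark_submit_args(master_host, app_jar, main_class,
                      deploy_mode="client", app_args=(), master_port=7077):
    """Build a spark-submit argument list for a standalone master.

    deploy_mode="client":  driver runs where spark-submit is invoked.
    deploy_mode="cluster": driver is launched on one of the workers.
    """
    if deploy_mode not in ("client", "cluster"):
        raise ValueError("deploy_mode must be 'client' or 'cluster'")
    return [
        "spark-submit",
        "--master", f"spark://{master_host}:{master_port}",
        "--deploy-mode", deploy_mode,
        "--class", main_class,
        app_jar,
        *app_args,  # passed through to the application unchanged
    ]


if __name__ == "__main__":
    # Hypothetical streaming job that takes its checkpoint directory
    # as its first argument.
    args = spark_submit_args(
        "master-host", "streaming-job.jar", "com.example.StreamingJob",
        deploy_mode="cluster",
        app_args=["hdfs:///checkpoints/streaming-job"],
    )
    print(" ".join(args))
```

In both modes the master machine itself only schedules; the machine that ends up hosting the driver is the one that needs the memory, the cores and the access to the checkpoint directory.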
