Posted to user@spark.apache.org by Ascot Moss <as...@gmail.com> on 2016/07/28 01:48:13 UTC

A question about Spark Cluster vs Local Mode

Hi,

If I submit the same job to Spark in cluster mode, does that mean it will
run in the cluster's memory pool, and that it will fail if it runs out of
the cluster's memory?

--driver-memory 64g \

--executor-memory 16g \

Regards

Re: A question about Spark Cluster vs Local Mode

Posted by Mich Talebzadeh <mi...@gmail.com>.
Hi

These are my notes on this topic.


   - *YARN Cluster Mode:* the Spark driver runs inside an application
   master process which is managed by YARN on the cluster, and the client
   can go away after initiating the application. This is invoked with
   --master yarn and --deploy-mode cluster. (A sketch of both invocations
   follows this list.)

   - *YARN Client Mode:* the driver runs in the client process, and the
   application master is only used for requesting resources from YARN.
   Unlike Spark standalone mode, where the master's address is specified
   in the --master parameter, in YARN mode the ResourceManager's address
   is picked up from the Hadoop configuration, so the --master parameter
   is simply yarn. This is invoked with --deploy-mode client.
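
To make the distinction concrete, here is a minimal sketch of both
invocations; the application class and JAR names are hypothetical
placeholders, not anything from this thread:

# YARN cluster mode: the driver runs inside the YARN application master,
# so the submitting shell can exit after launch.
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --class com.example.MyApp \
  myapp.jar

# YARN client mode: the driver runs in this shell's process; the
# ResourceManager address is picked up from the Hadoop configuration.
spark-submit \
  --master yarn \
  --deploy-mode client \
  --class com.example.MyApp \
  myapp.jar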



YARN Cluster and Client Considerations



   - Client mode requires that the process that launched the application
   remain alive, which means the host it runs on has to stay up. It is
   not friendly to ssh sessions dying, for example, unless you use nohup
   (see the sketch after this list).

   - Client mode driver logs are printed to stderr by default. Granted,
   you can change that. In contrast, in cluster mode they are all
   collected by YARN without any user intervention.

   - If your edge node (from where the app is launched) is not part of
   the cluster (e.g., it lives in an outside network with firewalls or
   higher latency), you may run into issues.

   - In cluster mode, your driver's CPU and memory usage is accounted for
   in YARN. This matters if your edge node is part of the cluster (and
   could be running YARN containers), since in client mode your driver
   will potentially use a lot of CPU/memory.

   - In cluster mode, YARN can restart your application without user
   intervention. This is useful for things that need to stay up (think of
   a long-running streaming job, for example).

   - If your client is not close to the cluster (e.g. your PC), then you
   definitely want to go cluster mode to improve performance.

   - If your client is close to the cluster (e.g. an edge node), then you
   could go either client or cluster. Note that by going client, more
   resources are going to be used on the edge node.
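
As a sketch of the nohup point above (the class and JAR names are again
hypothetical), a client-mode launch that survives the ssh session ending
might look like this:

# Keeps the client-mode driver alive if the ssh session dies, and
# captures stdout/stderr (where the driver logs go by default).
nohup spark-submit \
  --master yarn \
  --deploy-mode client \
  --class com.example.MyApp \
  myapp.jar > driver.log 2>&1 &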


HTH


Dr Mich Talebzadeh



LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com


*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.




Re: A question about Spark Cluster vs Local Mode

Posted by Yu Wei <yu...@hotmail.com>.
If the cluster runs out of memory, it seems that the executor will be
restarted by the cluster manager.
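
For what it's worth, a sketch of the knobs that govern this behaviour on
YARN; both configuration keys are from Spark's running-on-YARN options,
but their defaults vary by version, and the values and class/JAR names
here are illustrative only:

# Bounds on YARN's restart behaviour for the application and executors.
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --conf spark.yarn.maxAppAttempts=2 \
  --conf spark.yarn.max.executor.failures=8 \
  --class com.example.MyApp \
  myapp.jar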


Jared, (韦煜)
Software developer
Interested in open source software, big data, Linux


Re: A question about Spark Cluster vs Local Mode

Posted by Andy Davidson <An...@SantaCruzIntegration.com>.
Hi Ascot

When you run in cluster mode, the cluster manager will cause your driver
to execute on one of the workers in your cluster.

The advantage of this is that you can log on to a machine in your
cluster, submit your application, and then log out; the application will
continue to run.

Here is part of a shell script I use to start a streaming app in cluster
mode. This app has been running for several months now.

# must be at least 2, else the streaming app will not get any data;
# overall we are using 3 cores
numCores=2

# --executor-memory=1G # the default is supposed to be 1G; if we do not
# set it we are seeing 6G
executorMem=1G

$SPARK_ROOT/bin/spark-submit \
    --class "com.pws.sparkStreaming.collector.StreamingKafkaCollector" \
    --master $MASTER_URL \
    --deploy-mode cluster \
    --total-executor-cores $numCores \
    --executor-memory $executorMem \
    $jarPath --clusterMode $*
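
One sketch of a follow-up, assuming a standalone master (which the
--total-executor-cores flag suggests): a driver launched in cluster mode
can later be queried or stopped via spark-submit's --status and --kill
options. The submission ID below is a made-up example; the real one is
printed when the application is submitted.

# --status and --kill work with standalone (or Mesos) cluster deploy mode.
$SPARK_ROOT/bin/spark-submit --master $MASTER_URL --status driver-20160728004813-0000
$SPARK_ROOT/bin/spark-submit --master $MASTER_URL --kill driver-20160728004813-0000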


