Posted to user@spark.apache.org by Ashok Kumar <as...@yahoo.com.INVALID> on 2016/06/19 08:39:51 UTC

Running Spark in local mode

Hi,
I have been told Spark in local mode is simplest for testing. The Spark documentation covers little on local mode except the cores used in --master local[k].
Where are the driver program, executor and resources? Do I need to start worker threads, and how many apps can I run safely without exceeding the memory allocated, etc.?
Thanking you


Re: Running Spark in local mode

Posted by Ashok Kumar <as...@yahoo.com.INVALID>.
Thank you all, sirs.
Appreciated, Mich, your clarification.

 

    On Sunday, 19 June 2016, 19:31, Mich Talebzadeh <mi...@gmail.com> wrote:
 

Thanks Jonathan for your points.
I am aware that yarn-client and yarn-cluster are both deprecated (they still work in 1.6.1), hence the new nomenclature.
Bear in mind this is what I stated in my notes:
"YARN Cluster Mode, the Spark driver runs inside an application master process which is managed by YARN on the cluster, and the client can go away after initiating the application. This is invoked with –master yarn and --deploy-mode cluster   
   - YARN Client Mode, the driver runs in the client process, and the application master is only used for requesting resources from YARN. 
   -    

   - Unlike Spark standalone mode, in which the master’s address is specified in the --master parameter, in YARN mode the ResourceManager’s address is picked up from the Hadoop configuration. Thus, the --master parameter is yarn. This is invoked with --deploy-mode client"

These are exactly from the Spark documentation, and I quote:
"There are two deploy modes that can be used to launch Spark applications on YARN. In cluster mode, the Spark driver runs inside an application master process which is managed by YARN on the cluster, and the client can go away after initiating the application. 
In client mode, the driver runs in the client process, and the application master is only used for requesting resources from YARN.
Unlike Spark standalone and Mesos modes, in which the master’s address is specified in the --master parameter, in YARN mode the ResourceManager’s address is picked up from the Hadoop configuration. Thus, the --master parameter is yarn."
Cheers
Dr Mich Talebzadeh
LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
http://talebzadehmich.wordpress.com
On 19 June 2016 at 19:09, Jonathan Kelly <jo...@gmail.com> wrote:

Mich, what Jacek is saying is not that you implied that YARN relies on two masters. He's just clarifying that yarn-client and yarn-cluster modes are really both using the same (type of) master (simply "yarn"). In fact, if you specify "--master yarn-client" or "--master yarn-cluster", spark-submit will translate that into using a master URL of "yarn" and a deploy-mode of "client" or "cluster".

And thanks, Jacek, for the tips on the "less-common master URLs". I had no idea that was an option!

~ Jonathan
On Sun, Jun 19, 2016 at 4:13 AM Mich Talebzadeh <mi...@gmail.com> wrote:

Good points, but I am an experimentalist.

In local mode I have this. With --master local, Spark starts with one thread, equivalent to --master local[1]. You can also start with more than one thread by specifying the number of threads k in --master local[k], or use all available threads with --master local[*], which on my host would be local[12].

The important thing about local mode is that the number of JVMs spawned is controlled by you, and you can start as many spark-submit processes as you wish within the constraints of your resources:

${SPARK_HOME}/bin/spark-submit \
                --packages com.databricks:spark-csv_2.11:1.3.0 \
                --driver-memory 2G \
                --num-executors 1 \
                --executor-memory 2G \
                --master local \
                --executor-cores 2 \
                --conf "spark.scheduler.mode=FIFO" \
                --conf "spark.executor.extraJavaOptions=-XX:+PrintGCDetails -XX:+PrintGCTimeStamps" \
                --jars /home/hduser/jars/spark-streaming-kafka-assembly_2.10-1.6.1.jar \
                --class "${FILE_NAME}" \
                --conf "spark.ui.port=4040" \
                ${JAR_FILE} \
                >> ${LOG_FILE}

Now that does work fine, although some of those parameters are implicit (for example spark.scheduler.mode = FIFO or FAIR), and I can start different Spark jobs in local mode. Great for testing.
With regard to your comments on Standalone 
Spark Standalone – a simple cluster manager included with Spark that makes it easy to set up a cluster.

s/simple/built-in
What is stated as "included" implies that, i.e. it comes as part of running Spark in standalone mode.
Your other points on YARN cluster mode and YARN client mode
I'd say there's only one YARN master, i.e. --master yarn. You could
 however say where the driver runs, be it on your local machine where
 you executed spark-submit or on one node in a YARN cluster.
Yes that is I believe what the text implied. I would be very surprised if YARN as a resource manager relies on two masters :)

HTH







Dr Mich Talebzadeh
LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
http://talebzadehmich.wordpress.com
On 19 June 2016 at 11:46, Jacek Laskowski <ja...@japila.pl> wrote:

On Sun, Jun 19, 2016 at 12:30 PM, Mich Talebzadeh
<mi...@gmail.com> wrote:

> Spark Local - Spark runs on the local host. This is the simplest set up and
> best suited for learners who want to understand different concepts of Spark
> and those performing unit testing.

There are also the less-common master URLs:

* local[n, maxRetries] or local[*, maxRetries] — local mode with n
threads and maxRetries number of failures.
* local-cluster[n, cores, memory] for simulating a Spark local cluster
with n workers, # cores per worker, and # memory per worker.

As of Spark 2.0.0, you could also have your own scheduling system -
see https://issues.apache.org/jira/browse/SPARK-13904 - with the only
known implementation of the ExternalClusterManager contract in Spark
being YarnClusterManager, i.e. whenever you call Spark with --master
yarn.

> Spark Standalone – a simple cluster manager included with Spark that makes
> it easy to set up a cluster.

s/simple/built-in

> YARN Cluster Mode, the Spark driver runs inside an application master
> process which is managed by YARN on the cluster, and the client can go away
> after initiating the application. This is invoked with --master yarn and
> --deploy-mode cluster
>
> YARN Client Mode, the driver runs in the client process, and the application
> master is only used for requesting resources from YARN. Unlike Spark
> standalone mode, in which the master’s address is specified in the --master
> parameter, in YARN mode the ResourceManager’s address is picked up from the
> Hadoop configuration. Thus, the --master parameter is yarn. This is invoked
> with --deploy-mode client

I'd say there's only one YARN master, i.e. --master yarn. You could
however say where the driver runs, be it on your local machine where
you executed spark-submit or on one node in a YARN cluster.

The same applies to Spark Standalone and Mesos and is controlled by
--deploy-mode, i.e. client (default) or cluster.

Please update your notes accordingly ;-)

Regards,
Jacek Laskowski
----
https://medium.com/@jaceklaskowski/
Mastering Apache Spark http://bit.ly/mastering-apache-spark
Follow me at https://twitter.com/jaceklaskowski







  

Re: Running Spark in local mode

Posted by Mich Talebzadeh <mi...@gmail.com>.
Thanks Jonathan for your points

I am aware that yarn-client and yarn-cluster are both deprecated
(they still work in 1.6.1), hence the new nomenclature.

Bear in mind this is what I stated in my notes:

"YARN Cluster Mode, the Spark driver runs inside an application master
process which is managed by YARN on the cluster, and the client can go away
after initiating the application. This is invoked with –master yarn
and --deploy-mode
cluster
-

YARN Client Mode, the driver runs in the client process, and the
application master is only used for requesting resources from YARN.
-


-

Unlike Spark standalone mode, in which the master’s address is specified in
the --master parameter, in YARN mode the ResourceManager’s address is
picked up from the Hadoop configuration. Thus, the --master parameter is
yarn. This is invoked with --deploy-mode client"

These are exactly from the Spark documentation
<http://spark.apache.org/docs/latest/running-on-yarn.html> and I quote:

"There are two deploy modes that can be used to launch Spark applications
on YARN. In cluster mode, the Spark driver runs inside an application
master process which is managed by YARN on the cluster, and the client can
go away after initiating the application.

In client mode, the driver runs in the client process, and the application
master is only used for requesting resources from YARN.

Unlike Spark standalone
<http://spark.apache.org/docs/latest/spark-standalone.html> and Mesos
<http://spark.apache.org/docs/latest/running-on-mesos.html> modes, in which
the master’s address is specified in the --master parameter, in YARN mode
the ResourceManager’s address is picked up from the Hadoop configuration.
Thus, the --master parameter is yarn."
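
To make the two modes concrete, here is a minimal pair of invocations (a
sketch; the example class and the examples jar path are illustrative, not
taken from this thread):

# cluster mode: the driver runs inside the YARN application master
${SPARK_HOME}/bin/spark-submit \
                --master yarn \
                --deploy-mode cluster \
                --class org.apache.spark.examples.SparkPi \
                ${SPARK_HOME}/lib/spark-examples-1.6.1-hadoop2.6.0.jar 100

# client mode: the driver stays in the local client process
${SPARK_HOME}/bin/spark-submit \
                --master yarn \
                --deploy-mode client \
                --class org.apache.spark.examples.SparkPi \
                ${SPARK_HOME}/lib/spark-examples-1.6.1-hadoop2.6.0.jar 100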

Cheers

Dr Mich Talebzadeh



LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com



On 19 June 2016 at 19:09, Jonathan Kelly <jo...@gmail.com> wrote:

> Mich, what Jacek is saying is not that you implied that YARN relies on two
> masters. He's just clarifying that yarn-client and yarn-cluster modes are
> really both using the same (type of) master (simply "yarn"). In fact, if
> you specify "--master yarn-client" or "--master yarn-cluster", spark-submit
> will translate that into using a master URL of "yarn" and a deploy-mode of
> "client" or "cluster".
>
> And thanks, Jacek, for the tips on the "less-common master URLs". I had no
> idea that was an option!
>
> ~ Jonathan
>
> On Sun, Jun 19, 2016 at 4:13 AM Mich Talebzadeh <mi...@gmail.com>
> wrote:
>
>> Good points but I am an experimentalist
>>
>> In Local mode I have this
>>
>> In local mode with:
>>
>> --master local
>>
>>
>>
>> This will start with one thread, equivalent to --master local[1]. You
>> can also start with more than one thread by specifying the number of threads
>> *k* in --master local[k]. You can also start using all available threads
>> with --master local[*], which on my host would be local[12].
>>
>> The important thing about local mode is that the number of JVMs spawned is
>> controlled by you, and you can start as many spark-submit processes as you
>> wish within the constraints of your resources:
>>
>> ${SPARK_HOME}/bin/spark-submit \
>>
>>                 --packages com.databricks:spark-csv_2.11:1.3.0 \
>>
>>                 --driver-memory 2G \
>>
>>                 --num-executors 1 \
>>
>>                 --executor-memory 2G \
>>
>>                 --master local \
>>
>>                 --executor-cores 2 \
>>
>>                 --conf "spark.scheduler.mode=FIFO" \
>>
>>                 --conf
>> "spark.executor.extraJavaOptions=-XX:+PrintGCDetails
>> -XX:+PrintGCTimeStamps" \
>>
>>                 --jars
>> /home/hduser/jars/spark-streaming-kafka-assembly_2.10-1.6.1.jar \
>>
>>                 --class "${FILE_NAME}" \
>>
>>                 --conf "spark.ui.port=4040" \
>>
>>                 ${JAR_FILE} \
>>
>>                 >> ${LOG_FILE}
>>
>> Now that does work fine, although some of those parameters are implicit
>> (for example spark.scheduler.mode = FIFO or FAIR), and I can start different
>> Spark jobs in local mode. Great for testing.
>>
>> With regard to your comments on Standalone
>>
>> Spark Standalone – a simple cluster manager included with Spark that
>> makes it easy to set up a cluster.
>>
>> s/simple/built-in
>> What is stated as "included" implies that, i.e. it comes as part of
>> running Spark in standalone mode.
>>
>> Your other points on YARN cluster mode and YARN client mode
>>
>> I'd say there's only one YARN master, i.e. --master yarn. You could
>> however say where the driver runs, be it on your local machine where
>> you executed spark-submit or on one node in a YARN cluster.
>>
>>
>> Yes that is I believe what the text implied. I would be very surprised if
>> YARN as a resource manager relies on two masters :)
>>
>>
>> HTH
>>
>>
>>
>>
>>
>>
>>
>>
>>
>> Dr Mich Talebzadeh
>>
>>
>>
>> LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>
>>
>>
>> http://talebzadehmich.wordpress.com
>>
>>
>>
>> On 19 June 2016 at 11:46, Jacek Laskowski <ja...@japila.pl> wrote:
>>
>>> On Sun, Jun 19, 2016 at 12:30 PM, Mich Talebzadeh
>>> <mi...@gmail.com> wrote:
>>>
>>> > Spark Local - Spark runs on the local host. This is the simplest set
>>> up and
>>> > best suited for learners who want to understand different concepts of
>>> Spark
>>> > and those performing unit testing.
>>>
>>> There are also the less-common master URLs:
>>>
>>> * local[n, maxRetries] or local[*, maxRetries] — local mode with n
>>> threads and maxRetries number of failures.
>>> * local-cluster[n, cores, memory] for simulating a Spark local cluster
>>> with n workers, # cores per worker, and # memory per worker.
>>>
>>> As of Spark 2.0.0, you could also have your own scheduling system -
>>> see https://issues.apache.org/jira/browse/SPARK-13904 - with the only
>>> known implementation of the ExternalClusterManager contract in Spark
>>> being YarnClusterManager, i.e. whenever you call Spark with --master
>>> yarn.
>>>
>>> > Spark Standalone – a simple cluster manager included with Spark that
>>> makes
>>> > it easy to set up a cluster.
>>>
>>> s/simple/built-in
>>>
>>> > YARN Cluster Mode, the Spark driver runs inside an application master
>>> > process which is managed by YARN on the cluster, and the client can go
>>> away
>>> > after initiating the application. This is invoked with --master yarn and
>>> > --deploy-mode cluster
>>> >
>>> > YARN Client Mode, the driver runs in the client process, and the
>>> application
>>> > master is only used for requesting resources from YARN. Unlike Spark
>>> > standalone mode, in which the master’s address is specified in the
>>> --master
>>> > parameter, in YARN mode the ResourceManager’s address is picked up
>>> from the
>>> > Hadoop configuration. Thus, the --master parameter is yarn. This is
>>> invoked
>>> > with --deploy-mode client
>>>
>>> I'd say there's only one YARN master, i.e. --master yarn. You could
>>> however say where the driver runs, be it on your local machine where
>>> you executed spark-submit or on one node in a YARN cluster.
>>>
>>> The same applies to Spark Standalone and Mesos and is controlled by
>>> --deploy-mode, i.e. client (default) or cluster.
>>>
>>> Please update your notes accordingly ;-)
>>>
>>> Regards,
>>> Jacek Laskowski
>>> ----
>>> https://medium.com/@jaceklaskowski/
>>> Mastering Apache Spark http://bit.ly/mastering-apache-spark
>>> Follow me at https://twitter.com/jaceklaskowski
>>>
>>
>>

Re: Running Spark in local mode

Posted by Jonathan Kelly <jo...@gmail.com>.
Mich, what Jacek is saying is not that you implied that YARN relies on two
masters. He's just clarifying that yarn-client and yarn-cluster modes are
really both using the same (type of) master (simply "yarn"). In fact, if
you specify "--master yarn-client" or "--master yarn-cluster", spark-submit
will translate that into using a master URL of "yarn" and a deploy-mode of
"client" or "cluster".

And thanks, Jacek, for the tips on the "less-common master URLs". I had no
idea that was an option!

~ Jonathan

On Sun, Jun 19, 2016 at 4:13 AM Mich Talebzadeh <mi...@gmail.com>
wrote:

> Good points but I am an experimentalist
>
> In Local mode I have this
>
> In local mode with:
>
> --master local
>
>
>
> This will start with one thread, equivalent to --master local[1]. You can
> also start with more than one thread by specifying the number of threads *k*
> in --master local[k]. You can also start using all available threads with
> --master local[*], which on my host would be local[12].
>
> The important thing about local mode is that the number of JVMs spawned is
> controlled by you, and you can start as many spark-submit processes as you
> wish within the constraints of your resources:
>
> ${SPARK_HOME}/bin/spark-submit \
>
>                 --packages com.databricks:spark-csv_2.11:1.3.0 \
>
>                 --driver-memory 2G \
>
>                 --num-executors 1 \
>
>                 --executor-memory 2G \
>
>                 --master local \
>
>                 --executor-cores 2 \
>
>                 --conf "spark.scheduler.mode=FIFO" \
>
>                 --conf
> "spark.executor.extraJavaOptions=-XX:+PrintGCDetails
> -XX:+PrintGCTimeStamps" \
>
>                 --jars
> /home/hduser/jars/spark-streaming-kafka-assembly_2.10-1.6.1.jar \
>
>                 --class "${FILE_NAME}" \
>
>                 --conf "spark.ui.port=4040" \
>
>                 ${JAR_FILE} \
>
>                 >> ${LOG_FILE}
>
> Now that does work fine, although some of those parameters are implicit
> (for example spark.scheduler.mode = FIFO or FAIR), and I can start different
> Spark jobs in local mode. Great for testing.
>
> With regard to your comments on Standalone
>
> Spark Standalone – a simple cluster manager included with Spark that
> makes it easy to set up a cluster.
>
> s/simple/built-in
> What is stated as "included" implies that, i.e. it comes as part of
> running Spark in standalone mode.
>
> Your other points on YARN cluster mode and YARN client mode
>
> I'd say there's only one YARN master, i.e. --master yarn. You could
> however say where the driver runs, be it on your local machine where
> you executed spark-submit or on one node in a YARN cluster.
>
>
> Yes that is I believe what the text implied. I would be very surprised if
> YARN as a resource manager relies on two masters :)
>
>
> HTH
>
>
>
>
>
>
>
>
>
> Dr Mich Talebzadeh
>
>
>
> LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>
>
>
> http://talebzadehmich.wordpress.com
>
>
>
> On 19 June 2016 at 11:46, Jacek Laskowski <ja...@japila.pl> wrote:
>
>> On Sun, Jun 19, 2016 at 12:30 PM, Mich Talebzadeh
>> <mi...@gmail.com> wrote:
>>
>> > Spark Local - Spark runs on the local host. This is the simplest set up
>> and
>> > best suited for learners who want to understand different concepts of
>> Spark
>> > and those performing unit testing.
>>
>> There are also the less-common master URLs:
>>
>> * local[n, maxRetries] or local[*, maxRetries] — local mode with n
>> threads and maxRetries number of failures.
>> * local-cluster[n, cores, memory] for simulating a Spark local cluster
>> with n workers, # cores per worker, and # memory per worker.
>>
>> As of Spark 2.0.0, you could also have your own scheduling system -
>> see https://issues.apache.org/jira/browse/SPARK-13904 - with the only
>> known implementation of the ExternalClusterManager contract in Spark
>> being YarnClusterManager, i.e. whenever you call Spark with --master
>> yarn.
>>
>> > Spark Standalone – a simple cluster manager included with Spark that
>> makes
>> > it easy to set up a cluster.
>>
>> s/simple/built-in
>>
>> > YARN Cluster Mode, the Spark driver runs inside an application master
>> > process which is managed by YARN on the cluster, and the client can go
>> away
>> > after initiating the application. This is invoked with --master yarn and
>> > --deploy-mode cluster
>> >
>> > YARN Client Mode, the driver runs in the client process, and the
>> application
>> > master is only used for requesting resources from YARN. Unlike Spark
>> > standalone mode, in which the master’s address is specified in the
>> --master
>> > parameter, in YARN mode the ResourceManager’s address is picked up from
>> the
>> > Hadoop configuration. Thus, the --master parameter is yarn. This is
>> invoked
>> > with --deploy-mode client
>>
>> I'd say there's only one YARN master, i.e. --master yarn. You could
>> however say where the driver runs, be it on your local machine where
>> you executed spark-submit or on one node in a YARN cluster.
>>
>> The same applies to Spark Standalone and Mesos and is controlled by
>> --deploy-mode, i.e. client (default) or cluster.
>>
>> Please update your notes accordingly ;-)
>>
>> Regards,
>> Jacek Laskowski
>> ----
>> https://medium.com/@jaceklaskowski/
>> Mastering Apache Spark http://bit.ly/mastering-apache-spark
>> Follow me at https://twitter.com/jaceklaskowski
>>
>
>

Re: Running Spark in local mode

Posted by Mich Talebzadeh <mi...@gmail.com>.
Good points, but I am an experimentalist.

In Local mode I have this

In local mode with:

--master local



This will start with one thread, equivalent to --master local[1]. You can
also start with more than one thread by specifying the number of threads *k*
in --master local[k]. You can also start using all available threads with
--master local[*], which on my host would be local[12].
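
For example (a sketch using spark-shell; the same --master values work with
spark-submit):

# one worker thread; the two forms are equivalent
spark-shell --master local
spark-shell --master "local[1]"

# four worker threads
spark-shell --master "local[4]"

# one worker thread per available core (12 on my host)
spark-shell --master "local[*]"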

The important thing about local mode is that the number of JVMs spawned is
controlled by you, and you can start as many spark-submit processes as you
wish within the constraints of your resources:

${SPARK_HOME}/bin/spark-submit \

                --packages com.databricks:spark-csv_2.11:1.3.0 \

                --driver-memory 2G \

                --num-executors 1 \

                --executor-memory 2G \

                --master local \

                --executor-cores 2 \

                --conf "spark.scheduler.mode=FIFO" \

                --conf "spark.executor.extraJavaOptions=-XX:+PrintGCDetails
-XX:+PrintGCTimeStamps" \

                --jars
/home/hduser/jars/spark-streaming-kafka-assembly_2.10-1.6.1.jar \

                --class "${FILE_NAME}" \

                --conf "spark.ui.port=4040” \

                ${JAR_FILE} \

                >> ${LOG_FILE}

Now that does work fine, although some of those parameters are implicit (for
example spark.scheduler.mode = FIFO or FAIR), and I can start different Spark
jobs in local mode. Great for testing.

With regard to your comments on Standalone

Spark Standalone – a simple cluster manager included with Spark that
makes it easy to set up a cluster.

s/simple/built-in
What is stated as "included" implies that, i.e. it comes as part of running
Spark in standalone mode.

Your other points on YARN cluster mode and YARN client mode

I'd say there's only one YARN master, i.e. --master yarn. You could
however say where the driver runs, be it on your local machine where
you executed spark-submit or on one node in a YARN cluster.


Yes that is I believe what the text implied. I would be very surprised if
YARN as a resource manager relies on two masters :)


HTH









Dr Mich Talebzadeh



LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com



On 19 June 2016 at 11:46, Jacek Laskowski <ja...@japila.pl> wrote:

> On Sun, Jun 19, 2016 at 12:30 PM, Mich Talebzadeh
> <mi...@gmail.com> wrote:
>
> > Spark Local - Spark runs on the local host. This is the simplest set up
> and
> > best suited for learners who want to understand different concepts of
> Spark
> > and those performing unit testing.
>
> There are also the less-common master URLs:
>
> * local[n, maxRetries] or local[*, maxRetries] — local mode with n
> threads and maxRetries number of failures.
> * local-cluster[n, cores, memory] for simulating a Spark local cluster
> with n workers, # cores per worker, and # memory per worker.
>
> As of Spark 2.0.0, you could also have your own scheduling system -
> see https://issues.apache.org/jira/browse/SPARK-13904 - with the only
> known implementation of the ExternalClusterManager contract in Spark
> being YarnClusterManager, i.e. whenever you call Spark with --master
> yarn.
>
> > Spark Standalone – a simple cluster manager included with Spark that
> makes
> > it easy to set up a cluster.
>
> s/simple/built-in
>
> > YARN Cluster Mode, the Spark driver runs inside an application master
> > process which is managed by YARN on the cluster, and the client can go
> away
> > after initiating the application. This is invoked with --master yarn and
> > --deploy-mode cluster
> >
> > YARN Client Mode, the driver runs in the client process, and the
> application
> > master is only used for requesting resources from YARN. Unlike Spark
> > standalone mode, in which the master’s address is specified in the
> --master
> > parameter, in YARN mode the ResourceManager’s address is picked up from
> the
> > Hadoop configuration. Thus, the --master parameter is yarn. This is
> invoked
> > with --deploy-mode client
>
> I'd say there's only one YARN master, i.e. --master yarn. You could
> however say where the driver runs, be it on your local machine where
> you executed spark-submit or on one node in a YARN cluster.
>
> The same applies to Spark Standalone and Mesos and is controlled by
> --deploy-mode, i.e. client (default) or cluster.
>
> Please update your notes accordingly ;-)
>
> Regards,
> Jacek Laskowski
> ----
> https://medium.com/@jaceklaskowski/
> Mastering Apache Spark http://bit.ly/mastering-apache-spark
> Follow me at https://twitter.com/jaceklaskowski
>

Re: Running Spark in local mode

Posted by Jacek Laskowski <ja...@japila.pl>.
On Sun, Jun 19, 2016 at 12:30 PM, Mich Talebzadeh
<mi...@gmail.com> wrote:

> Spark Local - Spark runs on the local host. This is the simplest set up and
> best suited for learners who want to understand different concepts of Spark
> and those performing unit testing.

There are also the less-common master URLs:

* local[n, maxRetries] or local[*, maxRetries] — local mode with n
threads and maxRetries number of failures.
* local-cluster[n, cores, memory] for simulating a Spark local cluster
with n workers, # cores per worker, and # memory per worker.
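
On the command line these look like the following (a quick sketch; the
numbers are purely illustrative):

# local mode with 4 threads, allowing up to 3 task failures
spark-shell --master "local[4,3]"

# simulated cluster: 2 workers, 2 cores per worker, 1024 MB memory per worker
spark-shell --master "local-cluster[2,2,1024]"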

As of Spark 2.0.0, you could also have your own scheduling system -
see https://issues.apache.org/jira/browse/SPARK-13904 - with the only
known implementation of the ExternalClusterManager contract in Spark
being YarnClusterManager, i.e. whenever you call Spark with --master
yarn.

> Spark Standalone – a simple cluster manager included with Spark that makes
> it easy to set up a cluster.

s/simple/built-in

> YARN Cluster Mode, the Spark driver runs inside an application master
> process which is managed by YARN on the cluster, and the client can go away
> after initiating the application. This is invoked with --master yarn and
> --deploy-mode cluster
>
> YARN Client Mode, the driver runs in the client process, and the application
> master is only used for requesting resources from YARN. Unlike Spark
> standalone mode, in which the master’s address is specified in the --master
> parameter, in YARN mode the ResourceManager’s address is picked up from the
> Hadoop configuration. Thus, the --master parameter is yarn. This is invoked
> with --deploy-mode client

I'd say there's only one YARN master, i.e. --master yarn. You could
however say where the driver runs, be it on your local machine where
you executed spark-submit or on one node in a YARN cluster.

The same applies to Spark Standalone and Mesos and is controlled by
--deploy-mode, i.e. client (default) or cluster.

Please update your notes accordingly ;-)

Regards,
Jacek Laskowski
----
https://medium.com/@jaceklaskowski/
Mastering Apache Spark http://bit.ly/mastering-apache-spark
Follow me at https://twitter.com/jaceklaskowski

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org


Re: Running Spark in local mode

Posted by Mich Talebzadeh <mi...@gmail.com>.
Spark works in different modes: local (neither Spark nor anything else
manages resources) and standalone (Spark itself manages resources), plus
others (see below).

These are from my notes, excluding Mesos, which I have not used; one-line
invocation sketches follow the list.


   - Spark Local - Spark runs on the local host. This is the simplest set
   up and best suited for learners who want to understand different concepts
   of Spark and those performing unit testing.

   - Spark Standalone - a simple cluster manager included with Spark that
   makes it easy to set up a cluster.

   - YARN Cluster Mode - the Spark driver runs inside an application master
   process which is managed by YARN on the cluster, and the client can go
   away after initiating the application. This is invoked with --master yarn
   and --deploy-mode cluster.

   - YARN Client Mode - the driver runs in the client process, and the
   application master is only used for requesting resources from YARN.
   Unlike Spark standalone mode, in which the master’s address is specified
   in the --master parameter, in YARN mode the ResourceManager’s address is
   picked up from the Hadoop configuration. Thus, the --master parameter is
   yarn. This is invoked with --deploy-mode client.
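
As one-line invocation sketches (the application jar and arguments are
elided; <master-host> is a placeholder):

# Spark Local
spark-submit --master local[4] ...

# Spark Standalone: the master's address is given explicitly
spark-submit --master spark://<master-host>:7077 ...

# YARN Cluster Mode
spark-submit --master yarn --deploy-mode cluster ...

# YARN Client Mode
spark-submit --master yarn --deploy-mode client ...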

So local mode is the simplest configuration of Spark and does not require a
cluster. The user on the local host can launch and experiment with Spark. In
this mode the driver program (SparkSubmit), the resource manager and the
executor all exist within the same JVM, and the JVM itself acts as the
worker. In local mode, you do not need to start a master and
slaves/workers. It is pretty simple, and you can run as many JVMs
(spark-submit) as your resources allow (resources meaning memory and
cores).
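
For instance, nothing stops you from running two local-mode jobs side by
side (a sketch; the example class and jar path are illustrative):

# each spark-submit below launches its own independent JVM;
# memory use is roughly additive, so size --driver-memory accordingly
${SPARK_HOME}/bin/spark-submit --master local[2] --driver-memory 1G \
                --class org.apache.spark.examples.SparkPi \
                ${SPARK_HOME}/lib/spark-examples-1.6.1-hadoop2.6.0.jar 100 &

${SPARK_HOME}/bin/spark-submit --master local[2] --driver-memory 1G \
                --class org.apache.spark.examples.SparkPi \
                ${SPARK_HOME}/lib/spark-examples-1.6.1-hadoop2.6.0.jar 100 &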

HTH



Dr Mich Talebzadeh



LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com



On 19 June 2016 at 10:39, Takeshi Yamamuro <li...@gmail.com> wrote:

> There are many technical differences internally, though they are used in
> almost the same way.
> Yes, in standalone mode Spark runs as a cluster; see
> http://spark.apache.org/docs/1.6.1/cluster-overview.html
>
> // maropu
>
> On Sun, Jun 19, 2016 at 6:14 PM, Ashok Kumar <as...@yahoo.com> wrote:
>
>> thank you
>>
>> What are the main differences between local mode and standalone mode? I
>> understand local mode does not support a cluster. Is that the only difference?
>>
>>
>>
>> On Sunday, 19 June 2016, 9:52, Takeshi Yamamuro <li...@gmail.com>
>> wrote:
>>
>>
>> Hi,
>>
>> In local mode, Spark runs in a single JVM that has a master and one
>> executor with `k` threads.
>>
>> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/local/LocalSchedulerBackend.scala#L94
>>
>> // maropu
>>
>>
>> On Sun, Jun 19, 2016 at 5:39 PM, Ashok Kumar <
>> ashok34668@yahoo.com.invalid> wrote:
>>
>> Hi,
>>
>> I have been told Spark in local mode is simplest for testing. The Spark
>> documentation covers little on local mode except the cores used in --master
>> local[k].
>>
>> Where are the driver program, executor and resources? Do I need to
>> start worker threads, and how many apps can I run safely without exceeding
>> the memory allocated, etc.?
>>
>> Thanking you
>>
>>
>>
>>
>>
>> --
>> ---
>> Takeshi Yamamuro
>>
>>
>>
>
>
> --
> ---
> Takeshi Yamamuro
>

Re: Running Spark in local mode

Posted by Takeshi Yamamuro <li...@gmail.com>.
There are many technical differences internally, though they are used in
almost the same way.
Yes, in standalone mode Spark runs as a cluster; see
http://spark.apache.org/docs/1.6.1/cluster-overview.html
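
For example, a minimal standalone setup looks like this (a sketch;
<master-host> is a placeholder):

# start the standalone master and one worker
${SPARK_HOME}/sbin/start-master.sh
${SPARK_HOME}/sbin/start-slave.sh spark://<master-host>:7077

# then point applications at the cluster rather than at local[k]
spark-shell --master spark://<master-host>:7077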

// maropu

On Sun, Jun 19, 2016 at 6:14 PM, Ashok Kumar <as...@yahoo.com> wrote:

> thank you
>
> What are the main differences between local mode and standalone mode? I
> understand local mode does not support a cluster. Is that the only difference?
>
>
>
> On Sunday, 19 June 2016, 9:52, Takeshi Yamamuro <li...@gmail.com>
> wrote:
>
>
> Hi,
>
> In local mode, Spark runs in a single JVM that has a master and one
> executor with `k` threads.
>
> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/local/LocalSchedulerBackend.scala#L94
>
> // maropu
>
>
> On Sun, Jun 19, 2016 at 5:39 PM, Ashok Kumar <ashok34668@yahoo.com.invalid
> > wrote:
>
> Hi,
>
> I have been told Spark in local mode is simplest for testing. The Spark
> documentation covers little on local mode except the cores used in --master
> local[k].
>
> Where are the driver program, executor and resources? Do I need to
> start worker threads, and how many apps can I run safely without exceeding
> the memory allocated, etc.?
>
> Thanking you
>
>
>
>
>
> --
> ---
> Takeshi Yamamuro
>
>
>


-- 
---
Takeshi Yamamuro

Re: Running Spark in local mode

Posted by Ashok Kumar <as...@yahoo.com.INVALID>.
Thank you.
What are the main differences between local mode and standalone mode? I understand local mode does not support a cluster. Is that the only difference?
 

    On Sunday, 19 June 2016, 9:52, Takeshi Yamamuro <li...@gmail.com> wrote:
 

Hi,
In local mode, Spark runs in a single JVM that has a master and one executor with `k` threads:
https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/local/LocalSchedulerBackend.scala#L94

// maropu

On Sun, Jun 19, 2016 at 5:39 PM, Ashok Kumar <as...@yahoo.com.invalid> wrote:

Hi,
I have been told Spark in local mode is simplest for testing. The Spark documentation covers little on local mode except the cores used in --master local[k].
Where are the driver program, executor and resources? Do I need to start worker threads, and how many apps can I run safely without exceeding the memory allocated, etc.?
Thanking you





-- 
---
Takeshi Yamamuro


  

Re: Running Spark in local mode

Posted by Takeshi Yamamuro <li...@gmail.com>.
Hi,

In local mode, Spark runs in a single JVM that has a master and one
executor with `k` threads:
https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/local/LocalSchedulerBackend.scala#L94
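
You can verify this from the REPL (a sketch; the expected REPL results are
shown as comments):

spark-shell --master "local[4]"
# scala> sc.master                 // local[4]
# scala> sc.defaultParallelism     // 4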

// maropu


On Sun, Jun 19, 2016 at 5:39 PM, Ashok Kumar <as...@yahoo.com.invalid>
wrote:

> Hi,
>
> I have been told Spark in local mode is simplest for testing. The Spark
> documentation covers little on local mode except the cores used in --master
> local[k].
>
> Where are the driver program, executor and resources? Do I need to
> start worker threads, and how many apps can I run safely without exceeding
> the memory allocated, etc.?
>
> Thanking you
>
>
>


-- 
---
Takeshi Yamamuro