Posted to user@spark.apache.org by Saiph Kappa <sa...@gmail.com> on 2016/06/05 23:54:39 UTC

Specify node where driver should run

Hi,

In yarn-cluster mode, is there any way to specify on which node I want the
driver to run?

Thanks.

Re: Specify node where driver should run

Posted by Mich Talebzadeh <mi...@gmail.com>.
By default the driver will start where you have run sbin/start-master.sh; that
is where you start your app with SparkSubmit.

Each slave has to have an entry in the slaves file.

What is the issue here?
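
For reference, a minimal sketch of what such a slaves file might look like
(the hostnames are purely illustrative, one worker host per line):

# $SPARK_HOME/conf/slaves -- read by sbin/start-slaves.sh
worker1.example.com
worker2.example.com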




Dr Mich Talebzadeh



LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com




Re: Specify node where driver should run

Posted by Bryan Cutler <cu...@gmail.com>.
I'm not an expert on YARN, so anyone please correct me if I'm wrong, but I
believe the Resource Manager will schedule the application master (and with it
the application) on any node that has a Node Manager, depending on available
resources. So you would normally query the RM via the REST API to determine
which node that is. You can restrict which nodes get scheduled using the
property spark.yarn.am.nodeLabelExpression.
See here for details:
http://spark.apache.org/docs/latest/running-on-yarn.html
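
To make that concrete, here is a hedged sketch of both steps; the label name,
class, jar, RM host and application id are all placeholders, and node labels
need YARN 2.6+ with labels configured on the cluster:

# restrict where the AM may be scheduled (in cluster mode the driver runs
# inside the AM, so this would also pin down the driver)
${SPARK_HOME}/bin/spark-submit \
    --master yarn \
    --deploy-mode cluster \
    --conf spark.yarn.am.nodeLabelExpression=driver-nodes \
    --class com.example.MyApp myapp.jar

# then ask the RM REST API where the application actually ran
curl http://resourcemanager:8088/ws/v1/cluster/apps/application_1465300000000_0001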


Re: Specify node where driver should run

Posted by Saiph Kappa <sa...@gmail.com>.
How can I specify, in the YARN conf, the node where the application master
should run? I haven't found any useful information regarding that.

Thanks.


Re: Specify node where driver should run

Posted by Bryan Cutler <cu...@gmail.com>.
In that mode, it will run in the application master, on whichever node that
ends up being, as determined by your YARN conf.

Re: Specify node where driver should run

Posted by Mich Talebzadeh <mi...@gmail.com>.
Thanks. This is getting a bit confusing.

I have these modes for using Spark.


   1. Spark local. All on the same host -->  --master local[n]. No need to
   start master and slaves. Uses local resources as you submit the job.
   2. Spark Standalone. Uses a simple cluster manager included with Spark
   that makes it easy to set up a cluster -->  --master
   spark://<HOSTNAME>:7077. Can run on different hosts. Does not rely on
   YARN; it looks after scheduling itself. Need to start master and slaves.


The doc says: There are two deploy modes that can be used to launch Spark
applications *on YARN*.

*In cluster mode*, the Spark driver runs inside an application master
process which is managed by YARN on the cluster, and the client can go away
after initiating the application.

*In client mode*, the driver runs in the client process, and the
application master is only used for requesting resources from YARN.

Unlike Spark standalone
<http://spark.apache.org/docs/latest/spark-standalone.html> and Mesos
<http://spark.apache.org/docs/latest/running-on-mesos.html> modes, in which
the master’s address is specified in the --master parameter, in YARN mode
the ResourceManager’s address is picked up from the Hadoop configuration.
Thus, the --master parameter is yarn.
So either we have  -->  --master yarn --deploy-mode cluster

OR                 -->  --master yarn-client

So I am not sure running Spark on YARN in either yarn-client or yarn-cluster
mode is going to make much difference. It sounds like yarn-cluster
supersedes yarn-client?
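
To make the comparison concrete, a quick sketch of the two invocations (class
and jar names are illustrative):

# cluster mode: the driver runs inside the YARN application master
${SPARK_HOME}/bin/spark-submit --master yarn --deploy-mode cluster \
    --class com.example.MyApp myapp.jar

# client mode: the driver runs in the local spark-submit process, and the
# application master only negotiates resources from YARN
${SPARK_HOME}/bin/spark-submit --master yarn --deploy-mode client \
    --class com.example.MyApp myapp.jar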


Any comments welcome




Dr Mich Talebzadeh



LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com




Re: Specify node where driver should run

Posted by Sebastian Piu <se...@gmail.com>.
If you run that job then the driver will ALWAYS run on the machine from
where you are issuing the spark-submit command (e.g. some edge node with
the clients installed), no matter where the resource manager is running.

If you change yarn-client for yarn-cluster then your driver will start
somewhere else in the cluster, as will the workers, and the spark-submit
command will return before the program finishes.
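
As an aside (a hedged sketch; the application id below is a placeholder), once
a yarn-cluster submission has returned you can ask YARN where the driver (the
AM container) actually landed:

# list running applications, then inspect one; the -status report
# includes the AM host
yarn application -list
yarn application -status application_1465300000000_0001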


Re: Specify node where driver should run

Posted by Jacek Laskowski <ja...@japila.pl>.
Hi,

--master yarn-client is deprecated and you should use --master yarn
--deploy-mode client instead. There are two deploy-modes: client
(default) and cluster. See
http://spark.apache.org/docs/latest/cluster-overview.html.
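
For instance (the jar and class names are illustrative):

# deprecated form
spark-submit --master yarn-client --class com.example.MyApp myapp.jar
# preferred form
spark-submit --master yarn --deploy-mode client --class com.example.MyApp myapp.jar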

Pozdrawiam,
Jacek Laskowski
----
https://medium.com/@jaceklaskowski/
Mastering Apache Spark http://bit.ly/mastering-apache-spark
Follow me at https://twitter.com/jaceklaskowski





Re: Specify node where driver should run

Posted by Mich Talebzadeh <mi...@gmail.com>.
OK, thanks.

So I start SparkSubmit, or a similar Spark app, on the YARN resource manager
node.

What you are stating is that YARN may decide to start the driver program on
another node, as opposed to the resource manager node:

${SPARK_HOME}/bin/spark-submit \
                --driver-memory=4G \
                --num-executors=5 \
                --executor-memory=4G \
                --master yarn-client \
                --executor-cores=4 \

Due to lack of resources on the resource manager node? What is the
likelihood of that? The resource manager node is the de facto master node,
in all probability much more powerful than the other nodes. Also, the node
running the resource manager is also running one of the node managers.
So it may happen in theory, but perhaps not in practice?

HTH

Dr Mich Talebzadeh



LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com




Re: Specify node where driver should run

Posted by Sebastian Piu <se...@gmail.com>.
What you are explaining is right for yarn-client mode, but the question is
about yarn-cluster, in which case the Spark driver is also submitted and run
on one of the node managers.


Re: Specify node where driver should run

Posted by Mich Talebzadeh <mi...@gmail.com>.
Can you elaborate on the above statement please?

When you start YARN you start the resource manager daemon only on the
resource manager node:

yarn-daemon.sh start resourcemanager

Then you start nodemanager daemons on all nodes:

yarn-daemon.sh start nodemanager

A Spark app has to start somewhere. That is SparkSubmit, and that is
deterministic. I start SparkSubmit, which talks to the YARN Resource Manager,
which initialises and registers an application master. The crucial point is
the YARN Resource Manager, which is basically a resource scheduler: it
optimizes for cluster resource utilization, to keep all resources in use all
the time. However, the resource manager itself is on the resource manager
node.

Now, I always start my Spark app on the same node as the resource manager
node and let YARN take care of the rest.

Thanks

Dr Mich Talebzadeh



LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com




Re: Specify node where driver should run

Posted by Jacek Laskowski <ja...@japila.pl>.
Hi,

It's not possible. YARN uses CPU and memory for resource constraints and
places the AM on any available node. Same for executors (unless data locality
constrains the placement).

Jacek