Posted to user@spark.apache.org by Alvaro Brandon <al...@gmail.com> on 2017/02/07 10:20:52 UTC

Launching a Spark application on a subset of machines

Hello all:

I have the following scenario:
- I have a cluster of 50 machines with Hadoop and Spark installed on them.
- I want to launch one Spark application through spark-submit. However, I
want this application to run on only a subset of these machines (e.g. 10
machines), disregarding data locality.

Is this possible? Is there any option in the standalone scheduler, YARN, or
Mesos that allows such a thing?

Re: Launching a Spark application on a subset of machines

Posted by Alvaro Brandon <al...@gmail.com>.
I want to scale the number of machines used up or down depending on the SLA
of a job. For example, a low-priority job would get 10 machines, while a
high-priority one would get 50. I also want to choose subsets depending on
the hardware, for example "launch this job only on machines with GPUs".

Looking into Mesos attributes, this seems a perfect fit. Is that correct?
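
For the sizing part, a rough sketch of capping a job's footprint (assuming
8 cores per machine; the numbers and the master URL are illustrative):

    # Cap a low-priority job at roughly 10 machines' worth of cores
    # (10 machines x 8 cores = 80; values are illustrative)
    spark-submit \
      --master mesos://zk://zk1:2181/mesos \
      --conf spark.cores.max=80 \
      ...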

2017-02-07 12:27 GMT+01:00 Jörn Franke <jo...@gmail.com>:

> If you want to run them always on the same machines, use YARN node labels.
> If it can be any 10 machines, then use the capacity or fair scheduler.
>
> What is the use case for running it always on the same 10 machines?

Re: Launching a Spark application on a subset of machines

Posted by Jörn Franke <jo...@gmail.com>.
If you want to run them always on the same machines, use YARN node labels. If it can be any 10 machines, then use the capacity or fair scheduler.

What is the use case for running it always on the same 10 machines? If it is for licensing reasons, I would ask your vendor whether this is a suitable means of ensuring license compliance. Otherwise, use a dedicated cluster.
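
As a sketch of the scheduler route (assuming a capped YARN queue named
"lowpriority" has been defined in the capacity scheduler configuration;
the queue name and executor count are illustrative):

    # Submit to a capped queue; the queue's configured capacity bounds
    # how much of the cluster the job can consume
    spark-submit \
      --master yarn \
      --queue lowpriority \
      --num-executors 10 \
      ...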

> On 7 Feb 2017, at 12:09, Alvaro Brandon <al...@gmail.com> wrote:

Re: Launching a Spark application on a subset of machines

Posted by Alvaro Brandon <al...@gmail.com>.
Hello Pavel:

Thanks for the pointers.

For standalone cluster manager: I understand that I just have to start
several masters with a subset of slaves attached. Then each master will
listen on a different pair of <hostname,port>, allowing me to spark-submit
to any of these pairs depending on the subset of machines I want to use.
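
A rough sketch of that setup (hostnames and ports are illustrative):

    # On the machine hosting the master for the 10-node subset
    ./sbin/start-master.sh --host master-a --port 7077

    # On each of the 10 workers that should belong to that subset
    ./sbin/start-slave.sh spark://master-a:7077

    # Submit to whichever master fronts the subset you want
    spark-submit --master spark://master-a:7077 ...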

For Mesos: I haven't used Mesos much. Any references or documentation I can
use to set this up?

Best Regards



2017-02-07 11:36 GMT+01:00 Pavel Plotnikov <pa...@team.wrike.com>:

Re: Launching a Spark application on a subset of machines

Posted by Pavel Plotnikov <pa...@team.wrike.com>.
Hi Alvaro,
You can create different clusters using the standalone cluster manager, and
then manage each subset of machines by submitting applications to different
masters. Or you can use Mesos attributes to mark a subset of workers and
specify it in spark.mesos.constraints.
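
A sketch of the Mesos-attributes route (the attribute name "gpu" and the
ZooKeeper master URL are illustrative):

    # Start the agents you want to target with a custom attribute
    mesos-agent --master=zk://zk1:2181/mesos --attributes="gpu:true" ...

    # Constrain the Spark job to offers from those agents
    spark-submit \
      --master mesos://zk://zk1:2181/mesos \
      --conf spark.mesos.constraints="gpu:true" \
      ...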


On Tue, Feb 7, 2017 at 1:21 PM Alvaro Brandon <al...@gmail.com> wrote:

Re: Launching a Spark application on a subset of machines

Posted by Michael Gummelt <mg...@mesosphere.io>.
> Looking into Mesos attributes, this seems a perfect fit. Is that correct?

Yes.

On Tue, Feb 7, 2017 at 3:43 AM, Muhammad Asif Abbasi <as...@gmail.com>
wrote:


-- 
Michael Gummelt
Software Engineer
Mesosphere

Re: Launching a Spark application on a subset of machines

Posted by Muhammad Asif Abbasi <as...@gmail.com>.
YARN provides the concept of node labels. You should explore the
"spark.yarn.executor.nodeLabelExpression" property.


Cheers,
Asif Abbasi

On Tue, 7 Feb 2017 at 10:21, Alvaro Brandon <al...@gmail.com> wrote: