Posted to user@spark.apache.org by Victor Tso-Guillen <vt...@paxata.com> on 2014/09/12 02:44:49 UTC

Configuring Spark for heterogeneous hardware

So I have a bunch of hardware with different core and memory setups. Is
there a way to do one of the following:

1. Express a ratio of cores to memory to retain. The Spark worker config
would represent all of the cores and all of the memory usable for any
application, and the application would take a fraction that sustains the
ratio. Say I have 4 cores and 20G of RAM. I'd like the worker to take
4 cores/20G and the executor to take 5G for each of the 4 cores, thus maxing
both out. If there were only 16G with the same ratio requirement, it would
only take 3 cores and 15G in a single executor and leave the rest.

2. Have the executor take whole-number multiples of what it needs. Say it is
configured for 2 cores/8G and the worker has 4 cores/20G. We could give the
executor 2 cores/8G (what happens now), or we could instead give it
4 cores/16G, maxing out one of the two resources.

Either way would let me get my heterogeneous hardware all participating
in the work of my Spark cluster, presumably without endangering Spark's
assumption of homogeneous execution environments in the dimensions of memory
and cores. If there's any way to do this, please enlighten me.
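
For concreteness, the only per-machine knobs in standalone mode today are the
worker's advertised resources in that machine's conf/spark-env.sh; the
executor size is a single application-wide setting, so in practice executors
end up sized to the smallest machine. A rough sketch with illustrative values
(Spark does not derive them from a ratio for you):

    # conf/spark-env.sh on a 4-core / 20G machine
    export SPARK_WORKER_CORES=4
    export SPARK_WORKER_MEMORY=20g

    # conf/spark-env.sh on a 4-core / 16G machine
    export SPARK_WORKER_CORES=4
    export SPARK_WORKER_MEMORY=16g

    # conf/spark-defaults.conf on the application side: one executor size
    # for the whole application, so it has to fit the smallest worker
    spark.executor.memory   16g
    spark.cores.max         8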

Re: Configuring Spark for heterogeneous hardware

Posted by Victor Tso-Guillen <vt...@paxata.com>.
Hmm, interesting. I'm using standalone mode but I could consider YARN. I'll
have to simmer on that one. Thanks as always, Sean!


Re: Configuring Spark for heterogeneous hardware

Posted by Sean Owen <so...@cloudera.com>.
I thought I answered this ... you can easily accomplish this with YARN
by just telling YARN how much memory / CPU each machine has. This can
be configured in groups too rather than per machine. I don't think you
actually want differently-sized executors, and so don't need ratios.
But you can have differently-sized containers which can fit different
numbers of executors as appropriate.
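
To make that concrete, the per-node capacities live in each NodeManager's
yarn-site.xml (or in a host-group configuration if a management tool is
used); the numbers below are only illustrative and leave some headroom for
the OS and other daemons:

    <!-- yarn-site.xml on a 4-core / 20G node -->
    <property>
      <name>yarn.nodemanager.resource.memory-mb</name>
      <value>18432</value>
    </property>
    <property>
      <name>yarn.nodemanager.resource.cpu-vcores</name>
      <value>4</value>
    </property>

    <!-- a 4-core / 16G node would advertise, say, 14336 / 4 instead -->

The application then requests a single executor shape, e.g.

    spark-submit --master yarn-cluster --num-executors 6 \
      --executor-cores 1 --executor-memory 4g ...

and YARN packs as many such containers onto each node as its advertised
resources allow, so the larger nodes simply run more executors.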


Re: Configuring Spark for heterogeneous hardware

Posted by Victor Tso-Guillen <vt...@paxata.com>.
I'm supposing that there's no good solution to having heterogeneous hardware
in a cluster. What are the prospects of having something like this in the
future? Am I missing an architectural detail that precludes this
possibility?

Thanks,
Victor


Re: Configuring Spark for heterogeneous hardware

Posted by Victor Tso-Guillen <vt...@paxata.com>.
Ping...
