Posted to user@spark.apache.org by Denny Lee <de...@gmail.com> on 2014/12/06 22:27:49 UTC

Spark on YARN memory utilization

This is perhaps more of a YARN question than a Spark question, but I was
just curious about how memory is allocated in YARN via the various
configurations.  For example, if I spin up my cluster with 4GB of executor
memory and a different number of executors, as noted below:

 4GB executor-memory x 10 executors = 46GB  (4GB x 10 = 40GB, plus 6GB extra)
 4GB executor-memory x 4 executors = 19GB  (4GB x 4 = 16GB, plus 3GB extra)
 4GB executor-memory x 2 executors = 10GB  (4GB x 2 = 8GB, plus 2GB extra)

The pattern when observing the RM is that there is one container for each
executor plus one additional container.  In terms of memory, it looks like
an additional (1GB + (0.5GB x # executors)) is allocated in YARN.
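
Spelling that out, a quick back-of-the-envelope check in plain Scala over
the observed totals above (nothing here is read from Spark or YARN, it is
just the arithmetic):

    // Back-of-the-envelope check of the pattern described above; plain
    // arithmetic over the observed totals, not Spark or YARN code.
    object ObservedAllocation {
      def main(args: Array[String]): Unit = {
        val executorGb = 4.0
        // executors -> total GB observed in the RM
        val observed = Seq(10 -> 46.0, 4 -> 19.0, 2 -> 10.0)
        observed.foreach { case (n, totalGb) =>
          val extraGb = totalGb - n * executorGb  // memory beyond the executor heaps
          val guessGb = 1.0 + 0.5 * n             // 1GB + (0.5GB x # executors)
          println(f"$n%2d executors: extra = $extraGb%.1f GB, 1 + 0.5*n = $guessGb%.1f GB")
        }
      }
    }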

Just wondering why this is - or is it just an artifact of YARN itself?

Thanks!

Re: Spark on YARN memory utilization

Posted by Denny Lee <de...@gmail.com>.
Thanks Sandy!

Re: Spark on YARN memory utilization

Posted by Sandy Ryza <sa...@cloudera.com>.
Another thing to be aware of is that YARN will round container requests up
to the nearest multiple of yarn.scheduler.minimum-allocation-mb, which
defaults to 1024.
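
A minimal sketch of that rounding, assuming the default of 1024 MB (the
helper below is illustrative arithmetic, not an actual YARN API):

    // Illustrative only: rounding a container request up to the next
    // multiple of yarn.scheduler.minimum-allocation-mb (assumed 1024 MB).
    object YarnRounding {
      def roundUpMb(requestMb: Int, minAllocationMb: Int = 1024): Int =
        ((requestMb + minAllocationMb - 1) / minAllocationMb) * minAllocationMb

      def main(args: Array[String]): Unit = {
        // a few example request sizes, in MB
        Seq(4096, 4097, 4608).foreach { requestMb =>
          println(s"request $requestMb MB -> container ${roundUpMb(requestMb)} MB")
        }
      }
    }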

-Sandy

Re: Spark on YARN memory utilization

Posted by Denny Lee <de...@gmail.com>.
Got it - thanks!

Re: Spark on YARN memory utilization

Posted by Arun Ahuja <aa...@gmail.com>.
Hi Denny,

This is due to the spark.yarn.memoryOverhead parameter. Depending on which
version of Spark you are on, the default may differ, but it should be the
larger of 1024 MB per executor or 0.07 * executorMemory.

When you set executor memory, the YARN resource request is executorMemory +
yarnOverhead.
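
As a rough sketch of that sizing in Scala, using the constants above (the
1024 MB floor and 0.07 fraction are the values quoted here and may differ
across Spark versions, so treat the output as illustrative rather than as
Spark's actual implementation):

    // Rough sketch of the YARN request sizing described above; not Spark's
    // own code. Constants are the ones quoted in this thread.
    object ExecutorRequestEstimate {
      def overheadMb(executorMemoryMb: Int): Int =
        math.max(1024, (0.07 * executorMemoryMb).toInt)

      def yarnRequestMb(executorMemoryMb: Int): Int =
        executorMemoryMb + overheadMb(executorMemoryMb)

      def main(args: Array[String]): Unit = {
        val executorMemoryMb = 4096  // e.g. --executor-memory 4G
        println(s"heap: $executorMemoryMb MB, " +
          s"overhead: ${overheadMb(executorMemoryMb)} MB, " +
          s"YARN request: ${yarnRequestMb(executorMemoryMb)} MB")
      }
    }

YARN may then round that request up further (see the note about
yarn.scheduler.minimum-allocation-mb earlier in this thread).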

- Arun
