Posted to dev@spark.apache.org by vtygoss <vt...@126.com> on 2022/08/25 12:02:16 UTC

memory module of yarn container

Hi, community!


I notice a change about the memory module of yarn container between spark-2.3.0 and spark-3.2.1 when requesting containers from yarn.


org.apache.spark.deploy.yarn.Client.scala # verifyClusterResources


```
spark-2.3.0
val executorMem = executorMemory + executorMemoryOverhead
```


```
spark-3.2.1 
val executorMem =
 executorMemory + executorOffHeapMemory + executorMemoryOverhead + pysparkWorkerMemory
```
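The difference between the two formulas can be sketched as plain arithmetic. This is an illustrative sketch, not Spark source; the values correspond (by assumption) to the configuration keys spark.executor.memory, spark.executor.memoryOverhead, spark.memory.offHeap.size, and spark.executor.pyspark.memory, all in MiB.

```python
def executor_mem_2_3(executor_memory, memory_overhead):
    # Spark 2.3.0: the container request counts only heap + overhead.
    return executor_memory + memory_overhead

def executor_mem_3_2(executor_memory, offheap_memory, memory_overhead,
                     pyspark_memory):
    # Spark 3.2.1: the off-heap pool and the PySpark worker memory are
    # added on top, so they no longer have to fit inside the overhead.
    return executor_memory + offheap_memory + memory_overhead + pyspark_memory

# Example: 4 GiB heap, 2 GiB off-heap pool, 1 GiB overhead, no PySpark worker.
print(executor_mem_2_3(4096, 1024))           # 5120 MiB requested from YARN
print(executor_mem_3_2(4096, 2048, 1024, 0))  # 7168 MiB requested from YARN
```

With identical settings, the 3.2.1 request is larger precisely because the off-heap pool is now counted explicitly.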


And I have these questions:


1. In spark-2.3.0 and spark-3.2.1, what is memoryOverhead and where is it used?
2. What is the difference between memoryOverhead and off-heap memory, native memory, and direct memory? There is no such concept in Apache Flink; is it a concept unique to Spark?
3. In spark-2.3.0, I think that memoryOverhead contains all non-heap memory, including off-heap / native / direct. Am I wrong?


Thanks for any replies.


Best Regards!

Re: memory module of yarn container

Posted by "Yang,Jie(INF)" <ya...@baidu.com>.
Hi, vtygoss

As I recall, memoryOverhead in Spark 2.3 includes all memory that is not executor on-heap memory: the memory used by Spark's off-heap memory pool (executorOffHeapMemory, a concept that also exists in Spark 2.3), the PySpark worker, PipeRDD, the Netty memory pool, JVM direct memory, and so on.

In Spark 2.3, there is no enforced size relationship between memoryOverhead and executorOffHeapMemory. For example, if the user configures executorMemory=1g, executorOffHeapMemory=2g, and executorMemoryOverhead=1g, no error is raised at the resource-request stage: only 2g is requested from YARN, but at least 4g is actually required.

In Spark 3.x, executorMemoryOverhead no longer includes the memory used by Spark's off-heap memory pool (executorOffHeapMemory); the pool is requested separately, which ensures it has enough room in the container.
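Applying the two formulas quoted earlier to a configuration with an off-heap pool makes the gap concrete. A sketch, with assumed sizes in MiB:

```python
GIB = 1024  # MiB per GiB

# Assumed configuration: 1 GiB heap, 2 GiB off-heap pool, 1 GiB overhead.
heap, offheap, overhead = 1 * GIB, 2 * GIB, 1 * GIB

requested_2_3 = heap + overhead            # 2.3 formula: off-heap not counted
actually_needed = heap + offheap + overhead
requested_3_x = heap + offheap + overhead  # 3.x formula: off-heap counted

print(requested_2_3)    # 2048: the YARN container may be too small
print(actually_needed)  # 4096
print(requested_3_x)    # 4096: request matches what the executor can use
```

Under 2.3 the container can be undersized by exactly the off-heap pool; under 3.x the request matches what the executor may actually use.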

Warm regards,
YangJie


From: vtygoss <vt...@126.com>
Date: Thursday, August 25, 2022 20:02
To: spark <de...@spark.apache.org>
Subject: memory module of yarn container

