Posted to issues@spark.apache.org by "Attila Zsolt Piros (JIRA)" <ji...@apache.org> on 2018/12/10 16:40:00 UTC

[jira] [Comment Edited] (SPARK-24920) Spark should allow sharing netty's memory pools across all uses

    [ https://issues.apache.org/jira/browse/SPARK-24920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16715040#comment-16715040 ] 

Attila Zsolt Piros edited comment on SPARK-24920 at 12/10/18 4:39 PM:
----------------------------------------------------------------------

I started to work on it.

I would keep separate memory pools for transport clients and transport servers.
This way, the cache allowance for each pool stays as it was before this change: on for transport servers and off for transport clients.
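
A minimal sketch of the idea in the comment above, written as a small Python model rather than actual Spark or netty code. The names PoolSettings and shared_pool are hypothetical and used only for illustration; the only behavior taken from the comment is that there is one pool per role, with caching on for servers and off for clients.

```python
from dataclasses import dataclass
from functools import lru_cache


@dataclass(frozen=True)
class PoolSettings:
    """Hypothetical stand-in for a pooled allocator's configuration."""
    role: str          # "server" or "client"
    allow_cache: bool  # whether per-thread caching is enabled for this pool


@lru_cache(maxsize=None)
def shared_pool(role: str) -> PoolSettings:
    """Return the single shared pool for a role, creating it on first use."""
    if role not in ("server", "client"):
        raise ValueError(f"unknown role: {role}")
    # Cache allowance as described in the comment: on for servers, off for clients.
    return PoolSettings(role=role, allow_cache=(role == "server"))


# Every server-side service (e.g. RPC, block transfer) gets the same pool object.
rpc_server_pool = shared_pool("server")
block_server_pool = shared_pool("server")
# Every client-side service shares a second, separate pool.
rpc_client_pool = shared_pool("client")
```

In this model the five per-service pools collapse into two, which keeps the existing cache behavior per role while still removing most of the duplication.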

 


was (Author: attilapiros):
I started to work on it.

I would keep a separate memory pool for transport clients and transport servers. 
This way cache allowance for each pool would be like before the change for transport servers it would be on and for transport clients it would be off. 

 

> Spark should allow sharing netty's memory pools across all uses
> ---------------------------------------------------------------
>
>                 Key: SPARK-24920
>                 URL: https://issues.apache.org/jira/browse/SPARK-24920
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 2.4.0
>            Reporter: Imran Rashid
>            Priority: Major
>              Labels: memory-analysis
>
> Spark currently creates separate netty memory pools for each of the following "services":
> 1) RPC Client
> 2) RPC Server
> 3) BlockTransfer Client
> 4) BlockTransfer Server
> 5) ExternalShuffle Client
> Depending on configuration and whether it's an executor or driver JVM, a different subset of these is active, but it's always either 3 or 4 of them.
> Having them independent somewhat defeats the purpose of using pools at all.  In my experiments I've found each pool will grow due to a burst of activity in the related service (e.g. task start / end msgs), followed by another burst in a different service (e.g. sending torrent broadcast blocks).  Because of the way these pools work, they allocate memory in large chunks (16 MB by default) for each netty thread, so there is often a surge of 128 MB of allocated memory, even for really tiny messages.  Also, a lot of this memory is offheap by default, which makes it even tougher for users to manage.
> I think it would make more sense to combine all of these into a single pool.  In some experiments I tried, this noticeably decreased memory usage, both onheap and offheap (no significant performance effect in my small experiments).
> As this is a pretty core change, as a first step I'd propose just exposing this as a conf, to let users experiment more broadly across a wider range of workloads.
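
The arithmetic in the description can be illustrated with a rough back-of-the-envelope model. This is only a sketch under stated assumptions, not a measurement: it assumes 8 netty threads per pool (the thread count that matches the quoted 128 MB surge with 16 MB chunks), assumes each thread of each touched pool ends up holding one chunk, and uses a hypothetical share_pools flag to stand in for the proposed conf. The service names are labels for this example only.

```python
MIB = 1024 * 1024
CHUNK_SIZE = 16 * MIB  # default chunk size quoted in the description
NETTY_THREADS = 8      # assumption: 8 threads, consistent with the 128 MB figure

# Illustrative labels for four active services (the description says 3 or 4 are active).
SERVICES = ["rpc-client", "rpc-server", "blocktransfer-client", "blocktransfer-server"]


def pool_key(service: str, share_pools: bool) -> str:
    """Map a service to the pool it allocates from (hypothetical conf flag)."""
    return "shared" if share_pools else service


def worst_case_reserved(active_services, share_pools: bool) -> int:
    """Bytes reserved if every thread of every touched pool holds one chunk."""
    pools = {pool_key(s, share_pools) for s in active_services}
    return len(pools) * NETTY_THREADS * CHUNK_SIZE


per_pool = NETTY_THREADS * CHUNK_SIZE
separate = worst_case_reserved(SERVICES, share_pools=False)
shared = worst_case_reserved(SERVICES, share_pools=True)
print(per_pool // MIB, separate // MIB, shared // MIB)  # 128 512 128
```

Under these simplifying assumptions, bursts in four independent pools can pin up to four times the memory that a single shared pool would, which is the effect the description reports seeing in experiments.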



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
