Posted to issues@spark.apache.org by "Apache Spark (JIRA)" <ji...@apache.org> on 2018/07/31 16:24:00 UTC

[jira] [Commented] (SPARK-24917) Sending a partition over netty results in 2x memory usage

    [ https://issues.apache.org/jira/browse/SPARK-24917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16563918#comment-16563918 ] 

Apache Spark commented on SPARK-24917:
--------------------------------------

User 'vincent-grosbois' has created a pull request for this issue:
https://github.com/apache/spark/pull/21933

> Sending a partition over netty results in 2x memory usage
> ---------------------------------------------------------
>
>                 Key: SPARK-24917
>                 URL: https://issues.apache.org/jira/browse/SPARK-24917
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 2.2.2
>            Reporter: Vincent
>            Priority: Major
>
> Hello,
> While investigating some OOM errors in Spark 2.2 [(here's my call stack)|https://image.ibb.co/hHa2R8/sparkOOM.png], I noticed the following behavior, which I find weird:
>  * a request comes in to send a partition over the network
>  * this partition is 1.9 GB and is persisted in memory
>  * the partition is apparently stored in a ByteBufferBlockData, which is backed by a ChunkedByteBuffer, i.e. a list of (many) ByteBuffers of 4 MB each
>  * the call to toNetty() is supposed to only wrap the existing buffers, without allocating any memory
>  * yet the call stack shows that Netty is allocating memory and trying to consolidate all the chunks into one big 1.9 GB array
>  * this means that at this point the memory footprint is 2x the size of the actual partition (which is huge when the partition is 1.9 GB)
> Is this transient allocation expected?
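> For illustration, here is a minimal, self-contained Scala sketch of the pattern described above. ChunkedBlock, the chunk count, and the 4 KB chunk size are invented for the demo; this is not Spark's actual ChunkedByteBuffer code, but it goes through the same Unpooled.wrappedBuffer entry point:
> {code:scala}
> import java.nio.ByteBuffer
> import io.netty.buffer.{ByteBuf, CompositeByteBuf, Unpooled}
>
> // Hypothetical stand-in for a block stored as many small chunks (names and sizes made up;
> // Spark's real class is ChunkedByteBuffer, whose toNetty() also wraps its chunks via Unpooled).
> class ChunkedBlock(val chunks: Array[ByteBuffer]) {
>   // Supposed to be a zero-copy wrap of the existing chunks.
>   def toNetty: ByteBuf = Unpooled.wrappedBuffer(chunks.map(_.duplicate()): _*)
> }
>
> object WrapDemo {
>   def main(args: Array[String]): Unit = {
>     // A partition held as ~486 chunks; 4 KB chunks stand in for the 4 MB ones above.
>     val chunks = Array.fill(486)(ByteBuffer.allocate(4 * 1024))
>     val buf = new ChunkedBlock(chunks).toNetty
>     buf match {
>       case c: CompositeByteBuf =>
>         // On Netty versions without the fix linked below, wrapping more than
>         // DEFAULT_MAX_COMPONENTS (16) chunks triggers consolidation: every chunk is
>         // copied into one big backing buffer (the transient 2x footprint) and the
>         // composite collapses to a single component. With the fix it stays at 486.
>         println(s"composite holds ${c.numComponents()} component(s), ${buf.readableBytes()} readable bytes")
>       case other =>
>         println(s"got a ${other.getClass.getSimpleName} of ${other.readableBytes()} bytes")
>     }
>   }
> }
> {code}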
> After digging, it turns out that the actual copy is due to [this method|https://github.com/netty/netty/blob/4.0/buffer/src/main/java/io/netty/buffer/Unpooled.java#L260] in Netty: if my initial buffer is made of more than DEFAULT_MAX_COMPONENTS (16) components, it triggers a re-allocation of the whole buffer. This Netty issue was fixed in this recent change: [https://github.com/netty/netty/commit/9b95b8ee628983e3e4434da93fffb893edff4aa2]
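> For what it's worth, here is a minimal sketch of the obvious caller-side workaround (this is an assumption about the general technique, not a description of what Spark or the Netty fix actually does): pass the real chunk count as maxNumComponents so the composite buffer never crosses the consolidation threshold:
> {code:scala}
> import java.nio.ByteBuffer
> import io.netty.buffer.{ByteBuf, Unpooled}
>
> object NoConsolidation {
>   // Assumed workaround sketch: Unpooled.wrappedBuffer has an overload that takes an explicit
>   // maxNumComponents. Passing the actual number of chunks means the composite never exceeds
>   // its component limit, so no consolidation (and no extra 1.9 GB copy) is triggered.
>   def wrapWithoutConsolidation(chunks: Array[ByteBuffer]): ByteBuf =
>     Unpooled.wrappedBuffer(chunks.length, chunks.map(_.duplicate()): _*)
> }
> {code}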
>  
> As a result, is it possible to somehow benefit from this change in Spark 2.2 and above? I don't know how the Netty dependencies are handled for Spark.
>  
> NB: it seems this ticket: [https://jira.apache.org/jira/browse/SPARK-24307] changed the approach for Spark 2.4 by bypassing the Netty buffer altogether. However, as written in that ticket, this approach *still* needs to have the *entire* block serialized in memory, so it would be a downgrade compared to fixing the Netty issue when your buffer is < 2 GB.
>  
> Thanks!



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org