Posted to issues@spark.apache.org by "Apache Spark (JIRA)" <ji...@apache.org> on 2018/08/07 13:03:00 UTC

[jira] [Assigned] (SPARK-25034) possible triple memory consumption in fetchBlockSync()

     [ https://issues.apache.org/jira/browse/SPARK-25034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-25034:
------------------------------------

    Assignee:     (was: Apache Spark)

> possible triple memory consumption in fetchBlockSync()
> ------------------------------------------------------
>
>                 Key: SPARK-25034
>                 URL: https://issues.apache.org/jira/browse/SPARK-25034
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 2.2.2, 2.3.0, 2.4.0
>            Reporter: Vincent
>            Priority: Major
>
> Hello,
> In the code of _fetchBlockSync()_ in _BlockTransferService_, we have:
>  
> {code:java}
> val ret = ByteBuffer.allocate(data.size.toInt)
> ret.put(data.nioByteBuffer())
> ret.flip()
> result.success(new NioManagedBuffer(ret)) 
> {code}
> In some cases, the _data_ variable is a _NettyManagedBuffer_ whose underlying Netty representation is a _CompositeByteBuf_.
> Going through the code above in this configuration, and assuming that _data_ holds N bytes:
> 1) we allocate a full buffer of N bytes in _ret_
> 2) calling _data.nioByteBuffer()_ on a _CompositeByteBuf_ triggers a full merge of all the component buffers, which allocates *another* full buffer of N bytes
> 3) we copy the data into _ret_ byte by byte
> This means that at some point the N bytes of data exist 3 times in memory.
> Is this really necessary?
> It is unclear to me why we have to process the data at all, given that we receive a _ManagedBuffer_ and want to return a _ManagedBuffer_.
> Is there something I'm missing here? It seems this whole operation could be done with 0 copies.
> The only upside is that the new buffer merges all of the composite buffer's arrays, but it is not clear whether this is intended. In any case, this could be done with a peak memory of 2N rather than 3N.
> Cheers!
>  
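
The peak-memory accounting in the quoted description can be sketched with plain java.nio buffers. This is a minimal illustration, not Spark or Netty code: the class CopySketch, its method names, and the use of a List<ByteBuffer> to stand in for the components of a composite Netty buffer are all assumptions made for the example. mergeThenCopy mirrors steps 1 to 3 (an intermediate merged buffer plus the destination), while copyChunksDirectly writes each component straight into the destination, so only the source and the destination exist at the same time.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch; a List<ByteBuffer> stands in for a composite buffer's components.
class CopySketch {

    // Total number of readable bytes across all chunks (N in the description).
    static int totalSize(List<ByteBuffer> chunks) {
        int n = 0;
        for (ByteBuffer c : chunks) {
            n += c.remaining();
        }
        return n;
    }

    // Stand-in for the merge the description attributes to nioByteBuffer()
    // on a composite buffer: allocates a fresh N-byte buffer and fills it.
    static ByteBuffer merge(List<ByteBuffer> chunks) {
        ByteBuffer merged = ByteBuffer.allocate(totalSize(chunks));
        for (ByteBuffer c : chunks) {
            merged.put(c.duplicate()); // duplicate() leaves the source position untouched
        }
        merged.flip();
        return merged;
    }

    // Pattern from the description: source (N) + merged (N) + ret (N) are live at once.
    static ByteBuffer mergeThenCopy(List<ByteBuffer> chunks) {
        ByteBuffer ret = ByteBuffer.allocate(totalSize(chunks));
        ret.put(merge(chunks));
        ret.flip();
        return ret;
    }

    // Alternative: copy each chunk straight into ret, so only source (N) + ret (N) exist.
    static ByteBuffer copyChunksDirectly(List<ByteBuffer> chunks) {
        ByteBuffer ret = ByteBuffer.allocate(totalSize(chunks));
        for (ByteBuffer c : chunks) {
            ret.put(c.duplicate());
        }
        ret.flip();
        return ret;
    }

    public static void main(String[] args) {
        List<ByteBuffer> chunks = Arrays.asList(
            ByteBuffer.wrap("spark-".getBytes(StandardCharsets.UTF_8)),
            ByteBuffer.wrap("block".getBytes(StandardCharsets.UTF_8)));
        ByteBuffer viaMerge = mergeThenCopy(chunks);
        ByteBuffer direct = copyChunksDirectly(chunks);
        // Both strategies produce the same bytes; they differ only in peak allocation.
        System.out.println(viaMerge.equals(direct));
        System.out.println(StandardCharsets.UTF_8.decode(direct));
    }
}
```

Both methods return the same content, so the difference lies only in how many N-byte allocations are alive at the peak: three for mergeThenCopy, two for copyChunksDirectly.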

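The "0 copies" option raised in the description amounts to handing the received buffer on instead of duplicating its bytes. A rough java.nio analogy follows, under the assumption that the caller can keep the underlying memory alive for as long as the result is used (the description does not say whether that holds in Spark). ZeroCopySketch, share, and copy are hypothetical names used only for illustration.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch contrasting a shared view with a private copy.
class ZeroCopySketch {

    // Returns an independent view (own position and limit) over the same bytes; nothing is copied.
    static ByteBuffer share(ByteBuffer data) {
        return data.duplicate();
    }

    // Returns a private copy; this costs a second N-byte allocation.
    static ByteBuffer copy(ByteBuffer data) {
        ByteBuffer ret = ByteBuffer.allocate(data.remaining());
        ret.put(data.duplicate());
        ret.flip();
        return ret;
    }

    public static void main(String[] args) {
        ByteBuffer data = ByteBuffer.wrap(new byte[] {1, 2, 3, 4});
        ByteBuffer view = share(data);
        ByteBuffer priv = copy(data);

        data.put(0, (byte) 9); // mutate the source after handing it out

        System.out.println(view.get(0)); // the shared view observes the change
        System.out.println(priv.get(0)); // the private copy does not
    }
}
```

The trade-off is one of lifetime: a shared view is only valid while the source memory is neither released nor reused, whereas a copy is independent of the source at the cost of extra memory.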


--
This message was sent by Atlassian JIRA
(v7.6.3#76005)