Posted to issues@spark.apache.org by "Vincent (JIRA)" <ji...@apache.org> on 2018/08/06 16:50:00 UTC
[jira] [Created] (SPARK-25034) possible triple memory consumption in fetchBlockSync()
Vincent created SPARK-25034:
-------------------------------
Summary: possible triple memory consumption in fetchBlockSync()
Key: SPARK-25034
URL: https://issues.apache.org/jira/browse/SPARK-25034
Project: Spark
Issue Type: Improvement
Components: Spark Core
Affects Versions: 2.3.0, 2.2.2, 2.4.0
Reporter: Vincent
Hello
In the code of _fetchBlockSync()_ in _BlockTransferService_, we have:
{code:java}
// allocate a fresh buffer the size of the fetched block
val ret = ByteBuffer.allocate(data.size.toInt)
// copy the block's bytes into the new buffer
ret.put(data.nioByteBuffer())
ret.flip()
result.success(new NioManagedBuffer(ret))
{code}
In some cases, the _data_ variable is a _NettyManagedBuffer_ whose underlying Netty representation is a _CompositeByteBuf_.
Going through the code above in this configuration, and assuming that _data_ holds N bytes:
1) we allocate a full buffer of N bytes in _ret_
2) calling _data.nioByteBuffer()_ on a _CompositeByteBuf_ triggers a full merge of all the component buffers, which allocates *again* a full buffer of N bytes
3) _ret.put(...)_ then copies the data into _ret_
This means that at some point the N bytes of data are located 3 times in memory.
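To make the accounting concrete, here is a hedged, Netty-free sketch in plain Java that mimics the merge-then-copy sequence; the `components` list and `mergeComponents` helper are illustrative stand-ins, not Spark or Netty API:

```java
import java.nio.ByteBuffer;
import java.util.List;

public class TripleCopyDemo {
    // Illustrative stand-in for CompositeByteBuf.nioBuffer(): consolidating
    // the components allocates a second full N-byte buffer and copies into it.
    static ByteBuffer mergeComponents(List<byte[]> components, int n) {
        ByteBuffer merged = ByteBuffer.allocate(n); // second N-byte region
        for (byte[] c : components) merged.put(c);
        merged.flip();
        return merged;
    }

    public static void main(String[] args) {
        // First N-byte region: the bytes already held by the components.
        List<byte[]> components = List.of(new byte[]{1, 2, 3, 4},
                                          new byte[]{5, 6, 7, 8});
        int n = 8;

        ByteBuffer ret = ByteBuffer.allocate(n); // third N-byte region
        ret.put(mergeComponents(components, n)); // second region allocated inside
        ret.flip();

        // During the put() call, all three N-byte regions are live at once.
        System.out.println(ret.remaining()); // prints 8
    }
}
```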
Is this really necessary?
It is unclear to me why we have to process the data at all, given that we receive a _ManagedBuffer_ and we want to return a _ManagedBuffer_.
Is there something I'm missing here? It seems this whole operation could be done with zero copies.
The only upside is that the new buffer consolidates all the composite buffer's arrays into one contiguous region, but it is not clear whether this is intended. In any case, this could be done with a peak memory of 2N rather than 3N.
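A hedged sketch of the 2N variant, assuming consolidation is the only goal: use the buffer produced by the merge directly instead of copying it into a second freshly allocated one (the `mergeComponents` helper is illustrative, not Spark API; in Spark terms, the merged buffer would be wrapped in a _NioManagedBuffer_ as-is):

```java
import java.nio.ByteBuffer;
import java.util.List;

public class TwoCopyDemo {
    // Same illustrative merge as above: one N-byte allocation plus a copy.
    static ByteBuffer mergeComponents(List<byte[]> components, int n) {
        ByteBuffer merged = ByteBuffer.allocate(n);
        for (byte[] c : components) merged.put(c);
        merged.flip();
        return merged;
    }

    public static void main(String[] args) {
        List<byte[]> components = List.of(new byte[]{1, 2}, new byte[]{3, 4});

        // Peak memory is 2N: the component arrays plus the merged buffer.
        // There is no third allocate-then-put round trip; the merged
        // buffer is returned to the caller as-is.
        ByteBuffer ret = mergeComponents(components, 4);

        System.out.println(ret.remaining()); // prints 4
    }
}
```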
Cheers!
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)