Posted to issues@spark.apache.org by "Vincent (JIRA)" <ji...@apache.org> on 2018/08/06 16:50:00 UTC
[jira] [Created] (SPARK-25034) possible triple memory consumption in fetchBlockSync()
Vincent created SPARK-25034:
-------------------------------
Summary: possible triple memory consumption in fetchBlockSync()
Key: SPARK-25034
URL: https://issues.apache.org/jira/browse/SPARK-25034
Project: Spark
Issue Type: Improvement
Components: Spark Core
Affects Versions: 2.3.0, 2.2.2, 2.4.0
Reporter: Vincent
Hello
In the code of _fetchBlockSync()_ in _BlockTransferService_, we have:
{code:java}
// allocate a fresh buffer the size of the fetched block
val ret = ByteBuffer.allocate(data.size.toInt)
// copy the block's bytes into the new buffer
ret.put(data.nioByteBuffer())
ret.flip()
result.success(new NioManagedBuffer(ret))
{code}
In some cases, the _data_ variable is a _NettyManagedBuffer_ whose underlying Netty representation is a _CompositeByteBuf_.
Going through the code above in this configuration, and assuming that _data_ holds N bytes:
1) we allocate a full buffer of N bytes in _ret_
2) calling _data.nioByteBuffer()_ on a _CompositeByteBuf_ triggers a full merge of all the component buffers, which allocates *again* a full buffer of N bytes
3) _ret.put(...)_ then copies the data into _ret_
This means that at some point the N bytes of data are located 3 times in memory.
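To make the accounting concrete, here is a hedged, Netty-free sketch in plain Java that mimics the merge-then-copy sequence; the `components` list and `mergeComponents` helper are illustrative stand-ins, not Spark or Netty API:

```java
import java.nio.ByteBuffer;
import java.util.List;

public class TripleCopyDemo {
    // Illustrative stand-in for CompositeByteBuf.nioBuffer(): consolidating
    // the components allocates a second full N-byte buffer and copies into it.
    static ByteBuffer mergeComponents(List<byte[]> components, int n) {
        ByteBuffer merged = ByteBuffer.allocate(n); // second N-byte region
        for (byte[] c : components) merged.put(c);
        merged.flip();
        return merged;
    }

    public static void main(String[] args) {
        // First N-byte region: the bytes already held by the components.
        List<byte[]> components = List.of(new byte[]{1, 2, 3, 4},
                                          new byte[]{5, 6, 7, 8});
        int n = 8;

        ByteBuffer ret = ByteBuffer.allocate(n); // third N-byte region
        ret.put(mergeComponents(components, n)); // second region allocated inside
        ret.flip();

        // During the put() call, all three N-byte regions are live at once.
        System.out.println(ret.remaining()); // prints 8
    }
}
```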
Is this really necessary?
It is unclear to me why we have to process the data at all, given that we receive a _ManagedBuffer_ and we want to return a _ManagedBuffer_.
Is there something I'm missing here? It seems this whole operation could be done with zero copies.
The only upside is that the new buffer consolidates all the composite buffer's arrays into one contiguous region, but it is not clear whether this is intended. In any case, this could be done with a peak memory of 2N rather than 3N.
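A hedged sketch of the 2N variant, assuming consolidation is the only goal: use the buffer produced by the merge directly instead of copying it into a second freshly allocated one (the `mergeComponents` helper is illustrative, not Spark API; in Spark terms, the merged buffer would be wrapped in a _NioManagedBuffer_ as-is):

```java
import java.nio.ByteBuffer;
import java.util.List;

public class TwoCopyDemo {
    // Same illustrative merge as above: one N-byte allocation plus a copy.
    static ByteBuffer mergeComponents(List<byte[]> components, int n) {
        ByteBuffer merged = ByteBuffer.allocate(n);
        for (byte[] c : components) merged.put(c);
        merged.flip();
        return merged;
    }

    public static void main(String[] args) {
        List<byte[]> components = List.of(new byte[]{1, 2}, new byte[]{3, 4});

        // Peak memory is 2N: the component arrays plus the merged buffer.
        // There is no third allocate-then-put round trip; the merged
        // buffer is returned to the caller as-is.
        ByteBuffer ret = mergeComponents(components, 4);

        System.out.println(ret.remaining()); // prints 4
    }
}
```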
Cheers!
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)