Posted to issues@spark.apache.org by "Matei Zaharia (JIRA)" <ji...@apache.org> on 2014/11/06 08:07:34 UTC

[jira] [Resolved] (SPARK-824) Make fewer copies of blocks during remote reads

     [ https://issues.apache.org/jira/browse/SPARK-824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matei Zaharia resolved SPARK-824.
---------------------------------
    Resolution: Fixed

This is a pretty old issue that no longer affects the newest block manager and Netty code.

> Make fewer copies of blocks during remote reads
> -----------------------------------------------
>
>                 Key: SPARK-824
>                 URL: https://issues.apache.org/jira/browse/SPARK-824
>             Project: Spark
>          Issue Type: Improvement
>          Components: Block Manager
>            Reporter: Charles Reiss
>
> To satisfy a getRemote() when the source block is stored in deserialized form, Spark makes at least two copies: a ByteBuffer holding the serialized form of the entire block on the source node, and another on the destination node. If the block consists of a single object (as occurs frequently in Shark and apparently in PySpark), the net effect is that a remote read requires extra memory equal to roughly three times the block size: the two serialized buffers plus the deserialized result. As a consequence, especially with relatively coarse partitioning, Spark can require a surprisingly large amount of non-cache headroom. (A minimal sketch of this copy path follows the quoted description.)
> One copy could be avoided by serializing blocks as they are sent over the network (see the streaming sketch below). This would require connection manager and block manager API changes, and it might hurt performance when serialization is expensive relative to the connection speed.
> Similarly, a copy might be avoided by deserializing blocks as they are received over the network, but this would probably hurt some applications (e.g., ones that process items from a block much more slowly than the network delivers them).
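
For illustration, below is a minimal Scala sketch of the copy path described in the report. It is not Spark's actual BlockManager code; the object and method names are invented for this example. It shows where the three block-sized allocations come from: one serialized buffer on the source, one on the destination, and the deserialized object itself.

    import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}
    import java.nio.ByteBuffer

    object CopyPathSketch {
      // Source side: the block is cached deserialized, so answering a remote
      // read first serializes the whole block into one contiguous buffer.
      def serializeBlock(block: AnyRef): ByteBuffer = {
        val bytes = new ByteArrayOutputStream()
        val out   = new ObjectOutputStream(bytes)
        out.writeObject(block)
        out.close()
        ByteBuffer.wrap(bytes.toByteArray)   // copy #1: full serialized block on the source
      }

      // Destination side: the message is gathered into a second full-size
      // buffer before deserialization produces the object itself.
      def receiveBlock(wire: ByteBuffer): AnyRef = {
        val bytes = new Array[Byte](wire.remaining())
        wire.get(bytes)                      // copy #2: full serialized block on the destination
        val in = new ObjectInputStream(new ByteArrayInputStream(bytes))
        try in.readObject()                  // copy #3: the deserialized object
        finally in.close()
      }
    }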
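
The two proposals can be sketched the same way: serialize straight into the socket's output stream on the send side, and deserialize straight off the input stream on the receive side, so that neither node materializes a block-sized ByteBuffer. Again, this is a hypothetical simplification using plain Java sockets and serialization; a real change would have to go through the connection manager and block manager APIs mentioned above.

    import java.io.{ObjectInputStream, ObjectOutputStream}
    import java.net.Socket

    object StreamingTransferSketch {
      // Send side (first proposal): write the object graph directly to the
      // socket, so only the serializer's small internal buffer is resident
      // instead of a block-sized ByteBuffer.
      def sendBlock(block: AnyRef, socket: Socket): Unit = {
        val out = new ObjectOutputStream(socket.getOutputStream)
        out.writeObject(block)
        out.flush()
      }

      // Receive side (second proposal): deserialize as bytes arrive. As noted
      // above, this couples the consumer to network speed, which can hurt
      // applications that process items more slowly than the network delivers them.
      def receiveBlock(socket: Socket): AnyRef = {
        val in = new ObjectInputStream(socket.getInputStream)
        in.readObject()
      }
    }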



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org