Posted to issues@spark.apache.org by "Apache Spark (JIRA)" <ji...@apache.org> on 2016/03/17 20:04:33 UTC

[jira] [Commented] (SPARK-13980) Incrementally serialize blocks while unrolling them in MemoryStore

    [ https://issues.apache.org/jira/browse/SPARK-13980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15200178#comment-15200178 ] 

Apache Spark commented on SPARK-13980:
--------------------------------------

User 'JoshRosen' has created a pull request for this issue:
https://github.com/apache/spark/pull/11791

> Incrementally serialize blocks while unrolling them in MemoryStore
> ------------------------------------------------------------------
>
>                 Key: SPARK-13980
>                 URL: https://issues.apache.org/jira/browse/SPARK-13980
>             Project: Spark
>          Issue Type: Improvement
>          Components: Block Manager
>            Reporter: Josh Rosen
>            Assignee: Josh Rosen
>
> When a block is persisted in the MemoryStore at a serialized storage level, the current MemoryStore.putIterator() code unrolls the entire iterator as Java objects in memory, then serializes an iterator obtained from the unrolled array. This is inefficient and doubles our peak memory requirements. Instead, I think that we should incrementally serialize blocks while unrolling them. A downside to incremental serialization is that we will need to deserialize the partially-unrolled data if there is not enough space to unroll the block and the block cannot be dropped to disk. However, I'm hoping that the memory efficiency improvements will outweigh any performance losses from the extra deserialization in that hopefully-rare case.
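
The trade-off described above can be illustrated with a minimal sketch. It is not Spark's MemoryStore code: it uses Python's pickle in place of Spark's serializer, and the function names (unroll_then_serialize, serialize_incrementally, _replay) and the max_bytes budget are hypothetical, chosen only for this illustration. The first function materializes every record before serializing; the second serializes each record as it is pulled from the iterator and, if the budget is exceeded, deserializes what was already written so the caller still gets a complete iterator.

```python
import io
import pickle
from typing import Iterable, Iterator, Optional, Tuple


def unroll_then_serialize(values: Iterable[object]) -> bytes:
    """Two-pass approach: hold all records as objects, then serialize them."""
    unrolled = list(values)  # every record is alive here at the same time
    buf = io.BytesIO()
    for v in unrolled:
        pickle.dump(v, buf)
    return buf.getvalue()


def _replay(serialized: bytes, rest: Iterator[object]) -> Iterator[object]:
    """Deserialize the records already written, then continue with the rest."""
    src = io.BytesIO(serialized)
    while True:
        try:
            yield pickle.load(src)
        except EOFError:  # no more pickled records in the buffer
            break
    yield from rest


def serialize_incrementally(
    values: Iterable[object], max_bytes: int
) -> Tuple[Optional[bytes], Optional[Iterator[object]]]:
    """Serialize records one at a time while consuming the iterator.

    Returns (bytes, None) if everything fit within max_bytes (a hypothetical
    stand-in for the available unroll memory), otherwise (None, iterator)
    where the iterator yields every record, including those already serialized.
    """
    it = iter(values)
    buf = io.BytesIO()
    for v in it:
        pickle.dump(v, buf)  # only the serialized form accumulates
        if buf.tell() > max_bytes:
            # Fallback path: pay for deserializing the partial data.
            return None, _replay(buf.getvalue(), it)
    return buf.getvalue(), None
```

In this sketch the success path never holds the full list of objects, while the fallback path pays the extra deserialization cost the description mentions. A real implementation would also need to request and release unroll memory as the buffer grows, which is omitted here.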



