Posted to issues@spark.apache.org by "Mridul Muralidharan (JIRA)" <ji...@apache.org> on 2014/09/03 07:37:52 UTC

[jira] [Commented] (SPARK-1476) 2GB limit in spark for blocks

    [ https://issues.apache.org/jira/browse/SPARK-1476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14119385#comment-14119385 ] 

Mridul Muralidharan commented on SPARK-1476:
--------------------------------------------

A WIP version has been pushed to https://github.com/mridulm/spark/tree/2g_fix. The branch is from about 2 weeks before the 1.1 feature freeze, iirc.

Note that the 2g fixes are functionally complete, but this branch also includes a large number of other fixes, primarily for alleviating memory pressure and fixing resource leaks.
Some of these have been pushed to master, while others have not yet been.

This branch is shared for reference only and is not meant to be actively worked on for merging into master.
We will need to cherry-pick the changes and merge them into master manually.

> 2GB limit in spark for blocks
> -----------------------------
>
>                 Key: SPARK-1476
>                 URL: https://issues.apache.org/jira/browse/SPARK-1476
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>         Environment: all
>            Reporter: Mridul Muralidharan
>            Assignee: Mridul Muralidharan
>            Priority: Critical
>         Attachments: 2g_fix_proposal.pdf
>
>
> The underlying abstraction for blocks in Spark is a ByteBuffer, which limits the size of a block to 2GB.
> This has implications not just for managed blocks in use, but also for shuffle blocks (memory-mapped blocks are limited to 2GB, even though the API allows for long), ser-deser via byte-array-backed output streams (SPARK-1391), etc.
> This is a severe limitation when Spark is used on non-trivial datasets.
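
For reference, the limit described in the issue can be seen directly in the JDK, independent of Spark. The following is a minimal Java sketch, not code from Spark or from the 2g_fix branch; the class and method names (BlockLimitSketch, mapRejectsOversizedRegion) are made up for illustration. It relies on the documented behaviour of FileChannel.map, which accepts a long size but throws IllegalArgumentException when that size is greater than Integer.MAX_VALUE.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical illustration class; not part of Spark.
final class BlockLimitSketch {
    // A ByteBuffer is indexed by int, so one buffer holds at most this many bytes.
    static final long MAX_BUFFER_BYTES = Integer.MAX_VALUE;

    // Asks FileChannel.map for one byte more than a single buffer can address.
    // Returns true if the request is rejected with IllegalArgumentException.
    static boolean mapRejectsOversizedRegion() {
        try {
            Path tmp = Files.createTempFile("block-limit", ".bin");
            try (FileChannel ch = FileChannel.open(tmp, StandardOpenOption.READ)) {
                // The size parameter is a long, but values above
                // Integer.MAX_VALUE are refused before anything is mapped.
                ch.map(FileChannel.MapMode.READ_ONLY, 0L, MAX_BUFFER_BYTES + 1L);
                return false;
            } catch (IllegalArgumentException e) {
                return true;
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("max bytes per ByteBuffer: " + MAX_BUFFER_BYTES);
        System.out.println("oversized mapping rejected: " + mapRejectsOversizedRegion());
    }
}
```

The same int-based ceiling applies to heap and direct buffers, since ByteBuffer.allocate and ByteBuffer.allocateDirect take an int capacity, and to byte-array-backed streams, since Java arrays are also indexed by int.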



