Posted to issues@spark.apache.org by "Imran Rashid (JIRA)" <ji...@apache.org> on 2018/08/06 17:23:00 UTC

[jira] [Created] (SPARK-25035) Replicating disk-stored blocks should avoid memory mapping

Imran Rashid created SPARK-25035:
------------------------------------

             Summary: Replicating disk-stored blocks should avoid memory mapping
                 Key: SPARK-25035
                 URL: https://issues.apache.org/jira/browse/SPARK-25035
             Project: Spark
          Issue Type: Improvement
          Components: Spark Core
    Affects Versions: 2.3.1
            Reporter: Imran Rashid


This is a follow-up to SPARK-24296.

When replicating a disk-cached block, even if we fetch-to-disk, we still memory-map the file just to copy it to another location.

Ideally we'd just move the tmp file to the right location.  But even without that, we could read the file as an input stream instead of memory-mapping the whole thing.  Memory-mapping is a particular problem when running under YARN: the OS may believe there is plenty of memory available, while YARN decides to kill the process for exceeding its memory limits.
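The two alternatives described above can be sketched in plain Scala. This is an illustrative sketch only, not Spark's actual BlockManager code; the object and method names here are made up for the example. The mmap-based copy keeps the entire file resident in the process's address space, while the streaming copy only ever holds a small fixed buffer (and moving the tmp file avoids the copy altogether):

```scala
import java.io.{FileInputStream, FileOutputStream}
import java.nio.channels.FileChannel.MapMode
import java.nio.file.{Files, Path, StandardCopyOption}

object ReplicationSketch {
  // What the current code effectively does: map the whole file into
  // memory just to write it back out somewhere else. The full file
  // counts against the process's resident memory while the copy runs.
  def copyViaMmap(src: String, dst: String): Unit = {
    val in = new FileInputStream(src).getChannel
    val out = new FileOutputStream(dst).getChannel
    try {
      val mapped = in.map(MapMode.READ_ONLY, 0, in.size())
      out.write(mapped)
    } finally { in.close(); out.close() }
  }

  // The proposed alternative: read the file as an input stream, so at
  // most one small buffer (here 64 KiB) is resident at any time.
  def copyViaStream(src: String, dst: String, bufSize: Int = 64 * 1024): Unit = {
    val in = new FileInputStream(src)
    val out = new FileOutputStream(dst)
    try {
      val buf = new Array[Byte](bufSize)
      var n = in.read(buf)
      while (n != -1) { out.write(buf, 0, n); n = in.read(buf) }
    } finally { in.close(); out.close() }
  }

  // The ideal case: the tmp file is simply renamed into place,
  // touching no file contents at all.
  def moveTmpFile(tmp: Path, target: Path): Unit =
    Files.move(tmp, target, StandardCopyOption.REPLACE_EXISTING)
}
```

Both copy variants produce byte-identical output; the difference is purely in how much of the file is mapped into memory at once, which is what matters when YARN enforces a resident-memory limit the OS page cache is unaware of.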



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org