Posted to issues@spark.apache.org by "Josh Rosen (JIRA)" <ji...@apache.org> on 2016/03/22 02:17:25 UTC

[jira] [Updated] (SPARK-3000) Drop old blocks to disk in parallel when memory is not large enough for caching new blocks

     [ https://issues.apache.org/jira/browse/SPARK-3000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Josh Rosen updated SPARK-3000:
------------------------------
    Target Version/s: 2.0.0
             Summary: Drop old blocks to disk in parallel when memory is not large enough for caching new blocks  (was: drop old blocks to disk in parallel when memory is not large enough for caching new blocks)

> Drop old blocks to disk in parallel when memory is not large enough for caching new blocks
> ------------------------------------------------------------------------------------------
>
>                 Key: SPARK-3000
>                 URL: https://issues.apache.org/jira/browse/SPARK-3000
>             Project: Spark
>          Issue Type: Improvement
>          Components: Block Manager, Spark Core
>    Affects Versions: 1.1.0
>            Reporter: Zhang, Liye
>            Assignee: Josh Rosen
>         Attachments: Spark-3000 Design Doc.pdf
>
>
> In Spark, RDDs can be cached in memory for later use. The memory available for caching is "*spark.executor.memory * spark.storage.memoryFraction*" for Spark versions before 1.1.0, and "*spark.executor.memory * spark.storage.memoryFraction * spark.storage.safetyFraction*" after [SPARK-1777|https://issues.apache.org/jira/browse/SPARK-1777]. 
> For storage level *MEMORY_AND_DISK*, when free memory is not enough to cache new blocks, old blocks may be dropped to disk to free up memory for the new ones. This is handled by _ensureFreeSpace_ in _MemoryStore.scala_, and the caller always holds the *accountingLock*, so only one thread can be dropping blocks at a time. This approach cannot make full use of the disk throughput when the worker node has multiple disks. When testing our workload, we found it to be a real bottleneck when the total size of the old blocks to be dropped is large. 
> We have tested a parallel version on Spark 1.0, and the speedup is significant. It is therefore worthwhile to drop blocks to disk in parallel.
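
For a sense of scale, the post-SPARK-1777 formula works out as below. This is only an illustration: the 4 GB executor heap is an assumed example, and 0.6 / 0.9 are the legacy 1.x defaults for the two fractions.

{code:scala}
// Illustrative calculation only, not Spark code.
val executorMemoryMB = 4096L        // assumed spark.executor.memory (4 GB)
val memoryFraction   = 0.6          // spark.storage.memoryFraction (1.x default)
val safetyFraction   = 0.9          // spark.storage.safetyFraction (1.x default)

val storageMemoryMB = executorMemoryMB * memoryFraction * safetyFraction
println(f"memory available for cached blocks: $storageMemoryMB%.0f MB")  // ~2212 MB
{code}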
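
The actual logic lives in _ensureFreeSpace_ in _MemoryStore.scala_; the sketch below is only a simplified illustration of the proposed change, with hypothetical selectBlocksToEvict and dropBlockToDisk helpers standing in for the real block-manager internals. It contrasts dropping evicted blocks one by one under the accounting lock with selecting the victims under the lock but writing them out on a small thread pool, so that several disks can be written to concurrently.

{code:scala}
import java.util.concurrent.Executors
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration.Duration

object ParallelDropSketch {
  type BlockId = String

  // Hypothetical stand-ins for MemoryStore/BlockManager internals.
  def selectBlocksToEvict(bytesNeeded: Long): Seq[BlockId] = Seq("rdd_0_1", "rdd_0_2")
  def dropBlockToDisk(id: BlockId): Unit = { /* write the block's bytes to a local dir */ }

  // Current behaviour (simplified): the caller holds accountingLock for the
  // whole eviction, so at most one disk write is in flight at a time.
  def dropSerially(bytesNeeded: Long, accountingLock: AnyRef): Unit =
    accountingLock.synchronized {
      selectBlocksToEvict(bytesNeeded).foreach(dropBlockToDisk)
    }

  // Proposed idea (simplified): pick the victims under the lock, but perform
  // the disk writes on a pool so multiple disks can be used concurrently.
  def dropInParallel(bytesNeeded: Long, accountingLock: AnyRef): Unit = {
    val pool = Executors.newFixedThreadPool(4) // e.g. one thread per local disk
    implicit val ec: ExecutionContext = ExecutionContext.fromExecutorService(pool)
    val victims = accountingLock.synchronized { selectBlocksToEvict(bytesNeeded) }
    val writes  = victims.map(id => Future(dropBlockToDisk(id)))
    Await.result(Future.sequence(writes), Duration.Inf)
    pool.shutdown()
  }
}
{code}

In a real implementation the accounting of freed memory would still have to happen under the lock; only the disk I/O is moved off the critical section in this sketch.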



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org