Posted to issues@spark.apache.org by "Wenchen Fan (JIRA)" <ji...@apache.org> on 2014/05/21 11:55:38 UTC

[jira] [Commented] (SPARK-1888) enhance MEMORY_AND_DISK mode by dropping blocks in parallel

    [ https://issues.apache.org/jira/browse/SPARK-1888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14004538#comment-14004538 ] 

Wenchen Fan commented on SPARK-1888:
------------------------------------

I introduced a new member on `Entry` (`var dropping: Boolean`) and use it as a flag. When `ensureFreeSpace` selects blocks to drop, it skips blocks that are already marked as dropping. If `ensureFreeSpace` successfully selects some to-be-dropped blocks, it only marks their entries as dropping and returns them to the caller, which does the actual dropping. If the caller hits an exception while dropping, it resets the dropping flag on the to-be-dropped blocks. All reads and writes of the dropping flag are synchronized on `entries`, so a change to the flag is visible to other threads as soon as they acquire that lock.
Can one of the admins verify my diff? [~tdas] [~rxin]
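
For readers following along, here is a minimal sketch of the scheme described above. It is not the code from the PR: `SketchMemoryStore`, `dropSelected`, `put`, `isDropping`, `contains` and the `writeToDisk` callback are hypothetical names used only for illustration, and the space accounting is deliberately simplified (blocks that are still being dropped keep counting as used memory until they are removed).

```scala
import scala.collection.mutable

// Hypothetical, simplified stand-in for a memory store entry.
final class Entry(val size: Long) {
  // Read and written only while holding the `entries` lock.
  var dropping: Boolean = false
}

class SketchMemoryStore(maxMemory: Long) {
  // LinkedHashMap keeps insertion order, so the oldest blocks are picked first.
  private val entries = new mutable.LinkedHashMap[String, Entry]()
  private var currentMemory = 0L

  def put(blockId: String, size: Long): Unit = entries.synchronized {
    entries.put(blockId, new Entry(size)).foreach(old => currentMemory -= old.size)
    currentMemory += size
  }

  // Only the selection is done under the lock. Blocks already marked as
  // dropping are skipped; the selected ones are marked and handed back.
  def ensureFreeSpace(space: Long): Option[Seq[String]] = entries.synchronized {
    val selected = mutable.ArrayBuffer[String]()
    var available = maxMemory - currentMemory
    val it = entries.iterator
    while (available < space && it.hasNext) {
      val (id, entry) = it.next()
      if (!entry.dropping) {
        selected += id
        available += entry.size
      }
    }
    if (available < space) {
      None // not enough droppable blocks; nothing is marked
    } else {
      selected.foreach(id => entries(id).dropping = true)
      Some(selected.toList)
    }
  }

  // The caller does the slow IO outside the lock, so several threads can
  // drop different blocks in parallel.
  def dropSelected(blockIds: Seq[String])(writeToDisk: String => Unit): Unit = {
    try {
      blockIds.foreach { id =>
        writeToDisk(id)
        entries.synchronized {
          entries.remove(id).foreach(e => currentMemory -= e.size)
        }
      }
    } catch {
      case e: Exception =>
        // Reset the flag on blocks that are still in memory, then rethrow.
        entries.synchronized {
          blockIds.foreach(id => entries.get(id).foreach(_.dropping = false))
        }
        throw e
    }
  }

  def isDropping(blockId: String): Option[Boolean] =
    entries.synchronized(entries.get(blockId).map(_.dropping))

  def contains(blockId: String): Boolean =
    entries.synchronized(entries.contains(blockId))
}
```

In this sketch a second thread calling `ensureFreeSpace` while the first is still writing to disk will not pick the same blocks, because they are already flagged; if the first thread fails, the reset makes them selectable again.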

> enhance MEMORY_AND_DISK mode by dropping blocks in parallel
> -----------------------------------------------------------
>
>                 Key: SPARK-1888
>                 URL: https://issues.apache.org/jira/browse/SPARK-1888
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>            Reporter: Wenchen Fan
>
> Sometimes MEMORY_AND_DISK mode is slower than DISK_ONLY mode because of the lock held during IO operations (dropping blocks from the memory store). As the TODO says, the solution is to synchronize only the selection of to-be-dropped blocks and to do the dropping in parallel. I have a quick fix in my PR: https://github.com/apache/spark/pull/791#issuecomment-43567924
> It is currently fragile, but I'm working on making it more robust.



--
This message was sent by Atlassian JIRA
(v6.2#6252)