Posted to issues@spark.apache.org by "Andrew Or (JIRA)" <ji...@apache.org> on 2016/06/03 00:48:59 UTC

[jira] [Resolved] (SPARK-15736) Gracefully handle loss of DiskStore files

     [ https://issues.apache.org/jira/browse/SPARK-15736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Or resolved SPARK-15736.
-------------------------------
       Resolution: Fixed
    Fix Version/s: 2.0.0
                   1.6.2

> Gracefully handle loss of DiskStore files
> -----------------------------------------
>
>                 Key: SPARK-15736
>                 URL: https://issues.apache.org/jira/browse/SPARK-15736
>             Project: Spark
>          Issue Type: Bug
>          Components: Block Manager
>    Affects Versions: 1.6.0
>            Reporter: Josh Rosen
>            Assignee: Josh Rosen
>             Fix For: 1.6.2, 2.0.0
>
>
> If an RDD partition is cached on disk and the DiskStore file is lost, reads of that cached partition fail and the missing partition should be recomputed by a new task attempt. In the current BlockManager implementation, however, the missing file neither triggers a metadata update nor invalidates the cache entry, so subsequent task attempts are scheduled on the same executor and the doomed read is retried over and over, leading to repeated task failures and eventually the failure of the whole job.
> To fix this, the executor with the missing file needs to mark the corresponding block as missing so that it stops advertising itself as a cache location for that block, as in the sketch below.
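> A minimal, self-contained sketch of that recovery path follows. The names (DiskBlockReader, BlockMetadata, MasterEndpoint, reportBlockLost) are illustrative stand-ins chosen for this example, not Spark's actual BlockManager API; it only shows the shape of the fix: catch the missing-file error on read, drop the local metadata, and notify the master so the block location is no longer advertised.
> {code:scala}
> import java.io.{File, FileNotFoundException}
> import java.nio.file.Files
> import scala.collection.mutable
>
> final case class BlockId(name: String)
>
> // Tracks which blocks this executor advertises as locally cached (hypothetical stand-in).
> final class BlockMetadata {
>   private val present = mutable.Set.empty[BlockId]
>   def add(id: BlockId): Unit = present += id
>   def invalidate(id: BlockId): Unit = present -= id
>   def contains(id: BlockId): Boolean = present.contains(id)
> }
>
> // Stand-in for the driver-side registry of block locations (hypothetical).
> final class MasterEndpoint {
>   def reportBlockLost(id: BlockId, executorId: String): Unit =
>     println(s"master: $id no longer available on $executorId")
> }
>
> final class DiskBlockReader(meta: BlockMetadata, master: MasterEndpoint, executorId: String) {
>   // Reads a cached block from disk. If the backing file has disappeared, drop the
>   // block from local metadata and tell the master, so this executor stops
>   // advertising itself as a cache location and the partition can be recomputed
>   // elsewhere instead of the doomed read being retried on the same executor.
>   def read(id: BlockId, file: File): Option[Array[Byte]] = {
>     try {
>       Some(Files.readAllBytes(file.toPath))
>     } catch {
>       case _: FileNotFoundException | _: java.nio.file.NoSuchFileException =>
>         meta.invalidate(id)
>         master.reportBlockLost(id, executorId)
>         None // caller treats None as a cache miss and recomputes the partition
>     }
>   }
> }
> {code}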



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org