Posted to issues@spark.apache.org by "Josh Rosen (JIRA)" <ji...@apache.org> on 2016/09/06 23:56:21 UTC

[jira] [Resolved] (SPARK-17371) Resubmitted stage outputs deleted by zombie map tasks on stop()

     [ https://issues.apache.org/jira/browse/SPARK-17371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Josh Rosen resolved SPARK-17371.
--------------------------------
       Resolution: Fixed
    Fix Version/s: 2.1.0

Issue resolved by pull request 14932
[https://github.com/apache/spark/pull/14932]

> Resubmitted stage outputs deleted by zombie map tasks on stop()
> ---------------------------------------------------------------
>
>                 Key: SPARK-17371
>                 URL: https://issues.apache.org/jira/browse/SPARK-17371
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>            Reporter: Eric Liang
>             Fix For: 2.1.0
>
>
> It seems that old shuffle map tasks still hanging around after their stage has been resubmitted will delete the resubmitted stage's shuffle output files when they stop(), causing downstream stages to fail even after the resubmitted stage completes successfully. This can easily happen when the prior map task is waiting on a network timeout at the time its stage is resubmitted.
> This can cause unnecessary stage resubmissions, sometimes several in a row, and very confusing FetchFailure messages that report shuffle index files missing from the local disk.
> Given that IndexShuffleBlockResolver commits data atomically, it seems unnecessary to ever delete committed task output: even in the rare case that a task fails after it has finished committing its shuffle output, it should be safe to retain that output.
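The race described above can be illustrated with a minimal Scala sketch. The class and file names here are hypothetical simplifications, not Spark's actual shuffle writer API; the key point is that every attempt of a given map task resolves to the same output path, keyed by (shuffleId, mapId) rather than by task attempt:

    import java.io.File

    // Hypothetical, simplified sketch -- not Spark's real ShuffleWriter.
    class SketchShuffleWriter(shuffleId: Int, mapId: Int) {
      // Every attempt of this map task uses the same output path.
      private val dataFile = new File(s"shuffle_${shuffleId}_${mapId}_0.data")

      def write(records: Iterator[(Int, String)]): Unit = {
        // ... write records and commit them to dataFile ...
      }

      // Pre-fix behavior: a failed or aborted attempt cleans up "its" output.
      def stop(success: Boolean): Unit = {
        if (!success && dataFile.exists()) {
          dataFile.delete() // may delete a NEWER attempt's committed file
        }
      }
    }

Under this sketch the failure unfolds as follows: attempt 0 stalls on a network timeout and its stage is resubmitted; attempt 1 writes and commits the data file, and downstream stages begin fetching from it; attempt 0 finally fails, calls stop(success = false), and deletes the very file attempt 1 committed, so reducers see FetchFailure for a file that is genuinely gone from local disk.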
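The rationale in the last paragraph suggests the shape of a fix: because output is committed atomically (written to a temporary file and then renamed into place), committed output can simply never be deleted on stop(). Below is a sketch of that idea only, with hypothetical names; it is not the actual change made in pull request 14932:

    import java.io.File
    import java.nio.file.{Files, StandardCopyOption}

    // Hypothetical sketch: commit via atomic rename, retain committed output.
    class SafeShuffleWriter(shuffleId: Int, mapId: Int) {
      private val dataFile = new File(s"shuffle_${shuffleId}_${mapId}_0.data")
      private val tmpFile  = File.createTempFile(dataFile.getName, ".tmp")
      @volatile private var committed = false

      def write(records: Iterator[(Int, String)]): Unit = {
        // ... write all records to tmpFile ...
        // Atomic rename: readers see either no file or the complete file.
        Files.move(tmpFile.toPath, dataFile.toPath,
          StandardCopyOption.ATOMic_MOVE, StandardCopyOption.REPLACE_EXISTING)
        committed = true
      }

      def stop(success: Boolean): Unit = {
        // Only uncommitted temporary output is ever cleaned up; once the
        // rename has happened the data file is complete and safe to keep,
        // even if a zombie attempt later fails.
        if (!committed) tmpFile.delete()
      }
    }

Because the rename is all-or-nothing, there is no window in which a reducer can observe a partially written file, which is exactly why retaining committed output is safe even when the task that produced it is later marked as failed.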



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org