Posted to issues@spark.apache.org by "Apache Spark (JIRA)" <ji...@apache.org> on 2016/09/02 01:54:21 UTC

[jira] [Assigned] (SPARK-17371) Resubmitted stage outputs deleted by zombie map tasks on stop()

     [ https://issues.apache.org/jira/browse/SPARK-17371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-17371:
------------------------------------

    Assignee:     (was: Apache Spark)

> Resubmitted stage outputs deleted by zombie map tasks on stop()
> ---------------------------------------------------------------
>
>                 Key: SPARK-17371
>                 URL: https://issues.apache.org/jira/browse/SPARK-17371
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>            Reporter: Eric Liang
>
> It seems that old shuffle map tasks hanging around after a stage resubmit will delete the intended shuffle output files on stop(), causing downstream stages to fail even after the resubmitted stage completes successfully. This can happen easily if the prior map task is waiting on a network timeout when its stage is resubmitted.
> This can cause unnecessary stage resubmits, sometimes several in a row, and very confusing FetchFailure messages that report shuffle index files missing from the local disk.
> Given that IndexShuffleBlockResolver commits data atomically, it seems unnecessary to ever delete committed task output: even in the rare case that a task is failed after it finishes committing shuffle output, it should be safe to retain that output.
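The atomic-commit argument above can be sketched as follows. This is a minimal illustration, not Spark's actual code: the class, method, and file names are hypothetical stand-ins for the temp-file-then-rename pattern that IndexShuffleBlockResolver uses. The point is that once the rename has happened, a zombie attempt's cleanup only needs to remove its own uncommitted temp file; deleting the committed file is exactly what breaks the resubmitted stage.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical sketch of atomic shuffle-output commit (names are illustrative).
public class ShuffleCommitSketch {
    final Path tmp;        // per-attempt temporary file
    final Path committed;  // final index file that reducers read

    ShuffleCommitSketch(Path dir) {
        this.tmp = dir.resolve("shuffle_0_0.index.tmp");
        this.committed = dir.resolve("shuffle_0_0.index");
    }

    // The map task writes its output to a temp file first.
    void write() throws IOException {
        Files.write(tmp, new byte[] {1, 2, 3});
    }

    // Commit is a single atomic rename; after it, the output is visible.
    void commit() throws IOException {
        Files.move(tmp, committed, StandardCopyOption.ATOMIC_MOVE);
    }

    // A zombie attempt's stop() cleans up only its uncommitted temp file.
    // Deleting the committed file here is the bug this issue describes.
    void stop() throws IOException {
        Files.deleteIfExists(tmp);
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("shuffle-sketch");
        ShuffleCommitSketch task = new ShuffleCommitSketch(dir);
        task.write();
        task.commit();
        task.stop(); // zombie cleanup after commit: committed output survives
        System.out.println("committed output retained: " + Files.exists(task.committed));
    }
}
```

Under this scheme, stop() is idempotent and safe to call at any point: before commit() it removes the partial temp file, and after commit() it is a no-op, so downstream reducers never observe a committed index file disappearing.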



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org