Posted to issues@spark.apache.org by "Apache Spark (JIRA)" <ji...@apache.org> on 2016/09/02 01:24:21 UTC

[jira] [Commented] (SPARK-17370) Shuffle service files not invalidated when a slave is lost

    [ https://issues.apache.org/jira/browse/SPARK-17370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15457168#comment-15457168 ] 

Apache Spark commented on SPARK-17370:
--------------------------------------

User 'ericl' has created a pull request for this issue:
https://github.com/apache/spark/pull/14931

> Shuffle service files not invalidated when a slave is lost
> ----------------------------------------------------------
>
>                 Key: SPARK-17370
>                 URL: https://issues.apache.org/jira/browse/SPARK-17370
>             Project: Spark
>          Issue Type: Bug
>            Reporter: Eric Liang
>
> DAGScheduler invalidates shuffle files when an executor loss event occurs, but not when the external shuffle service is enabled. This is because, with the shuffle service on, shuffle files can outlive the executor that wrote them.
> However, it also doesn't invalidate shuffle files when the shuffle service itself is lost (due to loss of the whole slave). This can cause long hangs when slaves are lost, since the file loss is not detected until a subsequent stage attempts to read the shuffle files.
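The invalidation decision described in the issue can be modelled with a small Python sketch. This is not Spark's DAGScheduler code. The class, method and parameter names (`ShuffleOutputTracker`, `on_executor_lost`, `slave_lost`, `missing_maps`) are hypothetical. The slave-loss branch shows the behaviour the issue asks for; it is not a confirmed description of how the linked pull request implements it.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class MapOutput:
    """Hypothetical record of one map task's shuffle output location."""
    shuffle_id: int
    map_id: int
    executor_id: str
    host: str


@dataclass
class ShuffleOutputTracker:
    """Hypothetical stand-in for the scheduler's map-output bookkeeping."""
    external_shuffle_service: bool
    outputs: set = field(default_factory=set)

    def register(self, out: MapOutput) -> None:
        self.outputs.add(out)

    def on_executor_lost(self, executor_id: str, host: str, slave_lost: bool) -> None:
        if slave_lost:
            # The whole host is gone, so its shuffle service is gone too.
            # Every output served from that host is now unreadable.
            self.outputs = {o for o in self.outputs if o.host != host}
        elif not self.external_shuffle_service:
            # No external service: the executor served its own files,
            # so they are lost together with the executor.
            self.outputs = {o for o in self.outputs if o.executor_id != executor_id}
        # Otherwise the executor died, but the host's shuffle service
        # still serves its files, so nothing is invalidated.

    def missing_maps(self, shuffle_id: int, num_maps: int) -> list:
        """Map ids that would have to be recomputed for this shuffle."""
        present = {o.map_id for o in self.outputs if o.shuffle_id == shuffle_id}
        return [m for m in range(num_maps) if m not in present]


if __name__ == "__main__":
    tracker = ShuffleOutputTracker(external_shuffle_service=True)
    tracker.register(MapOutput(0, 0, "exec-1", "host-a"))
    tracker.register(MapOutput(0, 1, "exec-2", "host-a"))
    tracker.register(MapOutput(0, 2, "exec-3", "host-b"))

    # Executor-only loss: host-a's shuffle service still has exec-1's files.
    tracker.on_executor_lost("exec-1", "host-a", slave_lost=False)
    print(tracker.missing_maps(0, 3))  # []

    # Slave loss: everything served from host-a must be invalidated.
    tracker.on_executor_lost("exec-2", "host-a", slave_lost=True)
    print(tracker.missing_maps(0, 3))  # [0, 1]
```

In this model, invalidating by host on slave loss lets the scheduler see the missing map outputs immediately. Without that branch, the loss would only be noticed when a later stage tries to fetch the files, which is the hang the issue describes.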



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)