Posted to issues@spark.apache.org by "Aaron Davidson (JIRA)" <ji...@apache.org> on 2014/11/06 18:39:34 UTC

[jira] [Updated] (SPARK-4188) Shuffle fetches should be retried at a lower level

     [ https://issues.apache.org/jira/browse/SPARK-4188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aaron Davidson updated SPARK-4188:
----------------------------------
    Description: 
During periods of high network (or GC) load, it is not uncommon for IOExceptions to crop up as connection failures when fetching shuffle files. Unfortunately, such a failure is currently interpreted as an inability to fetch the files at all, which causes us to mark the executor as lost and recompute all of its shuffle outputs.
We should allow retrying at the network level in the event of an IOException, so that a transient failure does not trigger this recomputation.
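
As an illustrative sketch only (the names fetchWithRetry, shuffleClient.fetchBlock, and the retry parameters below are hypothetical, not Spark's actual API), a fetch-level retry might look something like this in Scala:

    import java.io.IOException

    // Minimal sketch of a network-level retry: re-run the supplied fetch on
    // IOException up to maxRetries times, pausing between attempts so that a
    // transient GC pause or network spike has a chance to clear.
    def fetchWithRetry[T](fetch: () => T, maxRetries: Int = 3, waitMs: Long = 5000): T = {
      try {
        fetch()
      } catch {
        case _: IOException if maxRetries > 0 =>
          Thread.sleep(waitMs)
          fetchWithRetry(fetch, maxRetries - 1, waitMs)
      }
    }

    // Hypothetical usage: wrap a single block fetch rather than the whole task,
    // so one flaky connection does not mark the executor as lost.
    // val data = fetchWithRetry(() => shuffleClient.fetchBlock(host, port, blockId))

The key point is that the retry happens below the task level, on an individual fetch, so recomputation is only needed once the retries are exhausted.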


  was:Sometimes fetches will fail due to garbage collection pauses or network load. A simple retry could save recomputation of a lot of shuffle data, especially if it's below the task level (i.e., on the level of a single fetch).


> Shuffle fetches should be retried at a lower level
> --------------------------------------------------
>
>                 Key: SPARK-4188
>                 URL: https://issues.apache.org/jira/browse/SPARK-4188
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 1.2.0
>            Reporter: Aaron Davidson
>
> During periods of high network (or GC) load, it is not uncommon for IOExceptions to crop up as connection failures when fetching shuffle files. Unfortunately, such a failure is currently interpreted as an inability to fetch the files at all, which causes us to mark the executor as lost and recompute all of its shuffle outputs.
> We should allow retrying at the network level in the event of an IOException, so that a transient failure does not trigger this recomputation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
