Posted to mapreduce-issues@hadoop.apache.org by "Ming Ma (JIRA)" <ji...@apache.org> on 2014/09/02 18:00:23 UTC

[jira] [Commented] (MAPREDUCE-5891) Improved shuffle error handling across NM restarts

    [ https://issues.apache.org/jira/browse/MAPREDUCE-5891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14118290#comment-14118290 ] 

Ming Ma commented on MAPREDUCE-5891:
------------------------------------

Thanks, Junping and Jason, for the useful patch.

When slowstart is set to a small value, the reducer will fetch some mapper output early and then wait for the rest. Is it possible that Fetcher.retryStartTime is left at a stale value from an earlier restart of NM host A? If so, the fetcher would treat its retry as timed out when it later tries to handle a restart of NM host B.
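
To make the concern concrete, here is a small hypothetical Java sketch. It is not Hadoop's Fetcher code. The class names, the fixed RETRY_WINDOW_MS value and the explicit `now` timestamps are assumptions made for illustration. It contrasts two approaches. The first keeps one retry start time that is never cleared. The second tracks the window per NM host and clears it after a successful fetch.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch only; not the MAPREDUCE-5891 patch or Hadoop's Fetcher. */
public class RetryWindowSketch {
    static final long RETRY_WINDOW_MS = 180_000; // assumed retry window, illustrative only

    // Variant 1: a single start time shared by every host, never cleared.
    static class SharedStartTime {
        private long retryStartTime = 0; // 0 means "not currently retrying"

        /** Called on a failed fetch; returns true if the retry window has expired. */
        boolean onFailure(String host, long now) {
            if (retryStartTime == 0) {
                retryStartTime = now; // the first failure ever starts the window
            }
            return now - retryStartTime > RETRY_WINDOW_MS;
        }

        void onSuccess(String host) {
            // Nothing is reset here, so a later failure on another host
            // inherits the stale start time left over from the earlier restart.
        }
    }

    // Variant 2: the window is tracked per host and cleared on success.
    static class PerHostStartTime {
        private final Map<String, Long> retryStartTimes = new HashMap<>();

        boolean onFailure(String host, long now) {
            long start = retryStartTimes.computeIfAbsent(host, h -> now);
            return now - start > RETRY_WINDOW_MS;
        }

        void onSuccess(String host) {
            retryStartTimes.remove(host); // a successful fetch ends the window
        }
    }

    public static void main(String[] args) {
        long t0 = 1_000_000L;
        SharedStartTime shared = new SharedStartTime();
        PerHostStartTime perHost = new PerHostStartTime();

        // NM host A restarts early: one failure, then the fetch succeeds.
        shared.onFailure("hostA", t0);
        perHost.onFailure("hostA", t0);
        shared.onSuccess("hostA");
        perHost.onSuccess("hostA");

        // Much later (reducer waited for more maps), NM host B restarts.
        long later = t0 + 10 * RETRY_WINDOW_MS;
        System.out.println("shared timed out: " + shared.onFailure("hostB", later));
        System.out.println("per-host timed out: " + perHost.onFailure("hostB", later));
    }
}
```

Running main prints "shared timed out: true" and then "per-host timed out: false". The shared variant gives up on host B at the very first failure. Simply resetting the shared start time after a successful fetch would also avoid this. Per-host tracking is just one option.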

To make sure the fetcher doesn't retry unnecessarily in the decommission scenario, it seems the assumption is that we will have some form of graceful decommission support. The fetcher could then still get mapper output while the decommission is in progress. Is that correct?

If we find time to do YARN-1593, that would further reduce the chance of a shuffle handler restart. Any opinions on that?

> Improved shuffle error handling across NM restarts
> --------------------------------------------------
>
>                 Key: MAPREDUCE-5891
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5891
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>    Affects Versions: 2.5.0
>            Reporter: Jason Lowe
>            Assignee: Junping Du
>         Attachments: MAPREDUCE-5891-demo.patch, MAPREDUCE-5891-v2.patch, MAPREDUCE-5891-v3.patch, MAPREDUCE-5891.patch
>
>
> To minimize the number of map fetch failures reported by reducers across an NM restart it would be nice if reducers only reported a fetch failure after trying for a specified period of time to retrieve the data.
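
A minimal, hypothetical Java sketch can illustrate the behavior the issue asks for. The fetch is retried for a bounded window, and a fetch failure is reported only once that window has elapsed. The class and method names, the injected clock and the fixed retry interval are assumptions for this example. They are not the MAPREDUCE-5891 patch or Hadoop's Fetcher API.

```java
import java.util.function.BooleanSupplier;
import java.util.function.LongSupplier;

/** Hypothetical sketch of "retry for a bounded period, then report". */
public class BoundedRetrySketch {
    /**
     * Attempts a fetch repeatedly until it succeeds or windowMs has elapsed.
     * Returns true on success; false means the caller should now report a
     * fetch failure (the reporting itself is outside this sketch).
     */
    static boolean fetchWithRetry(BooleanSupplier attempt, LongSupplier clockMs,
                                  long windowMs, long intervalMs) {
        long start = clockMs.getAsLong();
        while (true) {
            if (attempt.getAsBoolean()) {
                return true; // the NM came back within the window; nothing to report
            }
            if (clockMs.getAsLong() - start >= windowMs) {
                return false; // window exhausted: only now report the failure
            }
            try {
                Thread.sleep(intervalMs); // wait before the next attempt
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt flag
                return false; // treat interruption as giving up
            }
        }
    }
}
```

The clock is passed in so the retry window can be exercised deterministically without real sleeps. For example, use a fake clock that advances on each attempt and an interval of 0.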



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)