Posted to mapreduce-issues@hadoop.apache.org by "Ming Ma (JIRA)" <ji...@apache.org> on 2014/05/02 00:43:19 UTC

[jira] [Commented] (MAPREDUCE-5652) NM Recovery. ShuffleHandler should handle NM restarts

    [ https://issues.apache.org/jira/browse/MAPREDUCE-5652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13987115#comment-13987115 ] 

Ming Ma commented on MAPREDUCE-5652:
------------------------------------

Sounds good. We can use a new JIRA to cover the "best effort" work.

The patch looks good. Just to confirm, protobuf should be backward compatible, e.g., the store state serialized with version 2.4 should be readable by NM/MR compiled with version 2.5.

On an unrelated note, given how NM's AuxServices' serviceStart handles errors from each AuxService's serviceStart, if one AuxService throws an exception, serviceStart for the remaining AuxServices is skipped. That isn't important as long as we only have one AuxService. Perhaps there should be some policy around that as well: should NM skip a failed AuxService and continue? In general, it seems we might need to improve AuxService handling if there are other AuxServices.
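
To make the policy question concrete, here is a minimal sketch of a "skip the failed service and keep going" start loop. It is not the actual NM code: AuxLike, AuxStarter and startAll are hypothetical names made up for illustration, and the real AuxServices class may behave differently.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for an auxiliary service; NOT the YARN AuxiliaryService class.
interface AuxLike {
    void start() throws Exception;
}

// Hypothetical helper illustrating one possible policy; NOT the NM's AuxServices.
final class AuxStarter {
    private AuxStarter() {}

    /**
     * Tries to start every service in iteration order. A failure in one
     * service is recorded and does not prevent the remaining ones from
     * starting. Returns the names of the services that failed to start.
     */
    static List<String> startAll(Map<String, AuxLike> services) {
        List<String> failed = new ArrayList<>();
        for (Map.Entry<String, AuxLike> entry : services.entrySet()) {
            try {
                entry.getValue().start();
            } catch (Exception ex) {
                // Policy choice: remember the failure and continue with the rest.
                failed.add(entry.getKey());
            }
        }
        return failed;
    }
}
```

With this policy the caller decides what to do with the returned list, e.g. log it, or fail NM startup only if a service marked as required is in it. The alternative policy, failing fast on the first exception, is what the paragraph above describes.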

> NM Recovery. ShuffleHandler should handle NM restarts
> -----------------------------------------------------
>
>                 Key: MAPREDUCE-5652
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5652
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>    Affects Versions: 2.2.0
>            Reporter: Karthik Kambatla
>            Assignee: Jason Lowe
>              Labels: shuffle
>         Attachments: MAPREDUCE-5652-v2.patch, MAPREDUCE-5652-v3.patch, MAPREDUCE-5652-v4.patch, MAPREDUCE-5652-v5.patch, MAPREDUCE-5652-v6.patch, MAPREDUCE-5652-v7.patch, MAPREDUCE-5652-v8.patch, MAPREDUCE-5652-v9-and-YARN-1987.patch, MAPREDUCE-5652.patch
>
>
> ShuffleHandler should work across NM restarts and not require re-running map tasks. Currently, on NM restart, the map outputs are cleaned up, which requires re-execution of the map tasks; this should be avoided.



--
This message was sent by Atlassian JIRA
(v6.2#6252)