Posted to mapreduce-issues@hadoop.apache.org by "Jason Lowe (JIRA)" <ji...@apache.org> on 2014/01/09 15:04:00 UTC

[jira] [Commented] (MAPREDUCE-5652) ShuffleHandler should handle NM restarts

    [ https://issues.apache.org/jira/browse/MAPREDUCE-5652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13866655#comment-13866655 ] 

Jason Lowe commented on MAPREDUCE-5652:
---------------------------------------

I've largely implemented this as part of the prototype for YARN-1336.  I actually have two versions: one that uses FileSystem to store the shuffle tokens and job-to-user mappings, and another that uses leveldb.  (The prototype currently has a leveldb back-end store to simplify some of the race conditions during store and recovery.)  It shouldn't be too much effort to extricate just the ShuffleHandler changes, although there aren't any unit tests for them yet.
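
For illustration only, the idea of persisting the job-to-user mappings and shuffle tokens so they survive a restart could be sketched roughly as below.  This is not the prototype code: the class name, the method names and the properties-file layout are all assumptions made up for this example, and it uses plain java.nio files rather than Hadoop's FileSystem API or leveldb.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Base64;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

/** Hypothetical sketch of a restart-safe store; not the actual ShuffleHandler changes. */
class ShuffleStateSketch {
    private static final String USER_PREFIX = "user.";
    private static final String TOKEN_PREFIX = "token.";

    private final Path stateFile;
    private final Map<String, String> jobToUser = new HashMap<>();
    private final Map<String, byte[]> jobToToken = new HashMap<>();

    /** Opens the store and reloads whatever a previous instance persisted. */
    ShuffleStateSketch(Path stateFile) throws IOException {
        this.stateFile = stateFile;
        recover();
    }

    /** Records a job's owner and token, then persists the full state. */
    synchronized void addJob(String jobId, String user, byte[] token) throws IOException {
        jobToUser.put(jobId, user);
        jobToToken.put(jobId, token.clone());
        persist();
    }

    /** Forgets a finished job, then persists the full state. */
    synchronized void removeJob(String jobId) throws IOException {
        jobToUser.remove(jobId);
        jobToToken.remove(jobId);
        persist();
    }

    synchronized String userOf(String jobId) {
        return jobToUser.get(jobId);
    }

    synchronized byte[] tokenOf(String jobId) {
        byte[] token = jobToToken.get(jobId);
        return token == null ? null : token.clone();
    }

    /** Rebuilds the in-memory maps from the state file, if one exists. */
    private void recover() throws IOException {
        if (!Files.exists(stateFile)) {
            return; // first start: nothing to recover
        }
        Properties props = new Properties();
        try (InputStream in = Files.newInputStream(stateFile)) {
            props.load(in);
        }
        for (String key : props.stringPropertyNames()) {
            String value = props.getProperty(key);
            if (key.startsWith(USER_PREFIX)) {
                jobToUser.put(key.substring(USER_PREFIX.length()), value);
            } else if (key.startsWith(TOKEN_PREFIX)) {
                jobToToken.put(key.substring(TOKEN_PREFIX.length()),
                        Base64.getDecoder().decode(value));
            }
        }
    }

    /** Writes to a temporary file and renames it, so a crash mid-write leaves the old state in place. */
    private void persist() throws IOException {
        Properties props = new Properties();
        jobToUser.forEach((job, user) -> props.setProperty(USER_PREFIX + job, user));
        jobToToken.forEach((job, token) -> props.setProperty(TOKEN_PREFIX + job,
                Base64.getEncoder().encodeToString(token)));
        Path tmp = stateFile.resolveSibling(stateFile.getFileName() + ".tmp");
        try (OutputStream out = Files.newOutputStream(tmp)) {
            props.store(out, null);
        }
        Files.move(tmp, stateFile, StandardCopyOption.REPLACE_EXISTING);
    }
}
```

A new instance constructed on the same file after a "restart" sees the jobs the previous instance added and not the ones it removed.  Rewriting the whole file on every change is only tolerable for a sketch; a key-value store such as leveldb allows per-key updates instead.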

As Alejandro pointed out, it also needs some help from the NodeManager to keep it from cleaning up the local directories and removing the shuffle output after restarting.  That's also been done as part of the prototype and is relatively straightforward, but we're still missing a mechanism for distinguishing the restart case from the shutdown/decommission case, along with some other cleanup.

> ShuffleHandler should handle NM restarts
> ----------------------------------------
>
>                 Key: MAPREDUCE-5652
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5652
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>    Affects Versions: 2.2.0
>            Reporter: Karthik Kambatla
>            Assignee: Karthik Kambatla
>              Labels: shuffle
>
> ShuffleHandler should work across NM restarts and not require re-running map tasks. On NM restart, the map outputs are cleaned up, which forces re-execution of the map tasks; this should be avoided.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)