Posted to mapreduce-issues@hadoop.apache.org by "Bikas Saha (JIRA)" <ji...@apache.org> on 2012/08/06 08:50:04 UTC

[jira] [Commented] (MAPREDUCE-4326) Resurrect RM Restart

    [ https://issues.apache.org/jira/browse/MAPREDUCE-4326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13428995#comment-13428995 ] 

Bikas Saha commented on MAPREDUCE-4326:
---------------------------------------

I think the current implementation (actual code, commented-out code, TODOs, etc.) looks like a prototype that may not be in sync with the current state of the functional code. So I am not sure about using it as is.
Also, the implementation seems to make blocking calls to ZK and the like, which will likely become a bottleneck on RM threads and performance if a lot of state information needs to be synced to a stable store.
On that note, my gut feeling is that the RM state in practice is, in a sense, the sum total of the current state of the cluster as reflected in the NMs. So there may be no need to store any state, as long as the RM can recover the current state of the cluster from the NMs in a reasonable amount of time. The NMs have to re-sync with the RM after it comes back up anyway, so that is not extra overhead.
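To make the idea concrete, here is a minimal sketch of rebuilding the RM's view of running containers purely from what each NM reports when it re-syncs. The types NodeReport and RebuiltClusterState are hypothetical names invented for illustration. They are not YARN classes, and the real re-sync protocol carries more than a list of container IDs.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical type: what a single NM tells the RM when it re-syncs.
final class NodeReport {
    final String nodeId;
    final List<String> runningContainers;

    NodeReport(String nodeId, List<String> runningContainers) {
        this.nodeId = nodeId;
        this.runningContainers = new ArrayList<>(runningContainers);
    }
}

// Hypothetical type: the RM-side view, rebuilt with nothing read from a store.
final class RebuiltClusterState {
    // containerId -> nodeId, populated only from NM reports
    private final Map<String, String> containerToNode = new HashMap<>();

    // Called once per NM as it re-syncs with the restarted RM.
    void onNodeResync(NodeReport report) {
        for (String containerId : report.runningContainers) {
            containerToNode.put(containerId, report.nodeId);
        }
    }

    int runningContainerCount() {
        return containerToNode.size();
    }

    // Returns null if no NM has reported this container.
    String nodeFor(String containerId) {
        return containerToNode.get(containerId);
    }
}
```

In this sketch the RM's view is complete once every NM has reported in, so recovery time is bounded by how quickly the NMs re-sync rather than by how much state was persisted.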
Saving a lot of state would result in having to solve the same set of issues that the Namenode has to solve in order to maintain consistent, reliable and available saved state. IMO, for the RM we are better off avoiding those issues.
The only state that needs to be saved, as far as I can see, is the information about all jobs that are not yet completed. This information is present only in the RM and so needs to be preserved across an RM restart. Fortunately, this information is small and infrequently updated, so saving it synchronously in ZK may not be too much of an issue.
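As a rough sketch of how small that persisted surface could be, the store only needs three operations: save on submission, remove on completion, and load on restart. The interface IncompleteJobStore and its in-memory implementation are hypothetical names for illustration. A real store might be backed by ZK, which is not shown here, and the submission info is reduced to a plain string.

```java
import java.util.Collections;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical interface: the only state the RM would persist.
interface IncompleteJobStore {
    void saveJob(String jobId, String submissionInfo); // on job submission
    void removeJob(String jobId);                      // on job completion
    Map<String, String> loadIncompleteJobs();          // on RM restart
}

// In-memory stand-in so the sketch runs without any external service.
final class InMemoryJobStore implements IncompleteJobStore {
    private final Map<String, String> jobs = new TreeMap<>();

    @Override
    public synchronized void saveJob(String jobId, String submissionInfo) {
        jobs.put(jobId, submissionInfo);
    }

    @Override
    public synchronized void removeJob(String jobId) {
        jobs.remove(jobId);
    }

    @Override
    public synchronized Map<String, String> loadIncompleteJobs() {
        // Return a read-only snapshot so callers cannot mutate the store.
        return Collections.unmodifiableMap(new TreeMap<>(jobs));
    }
}
```

Because writes happen only at submission and completion, each call can be synchronous without sitting on the RM's hot path, which is the trade-off argued for above.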
                
> Resurrect RM Restart 
> ---------------------
>
>                 Key: MAPREDUCE-4326
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4326
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv2, resourcemanager
>    Affects Versions: 2.0.0-alpha
>            Reporter: Arun C Murthy
>            Assignee: Bikas Saha
>         Attachments: MR-4343.1.patch
>
>
> We should resurrect 'RM Restart' which we disabled sometime during the RM refactor.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira