Posted to yarn-dev@hadoop.apache.org by "Vinod Kumar Vavilapalli (JIRA)" <ji...@apache.org> on 2013/09/17 23:07:53 UTC

[jira] [Created] (YARN-1210) During RM restart, RM should start a new attempt only when previous attempt exits for real

Vinod Kumar Vavilapalli created YARN-1210:
---------------------------------------------

             Summary: During RM restart, RM should start a new attempt only when previous attempt exits for real
                 Key: YARN-1210
                 URL: https://issues.apache.org/jira/browse/YARN-1210
             Project: Hadoop YARN
          Issue Type: Bug
            Reporter: Vinod Kumar Vavilapalli
            Assignee: Vinod Kumar Vavilapalli


When the RM recovers, it can wait for existing AMs to contact it again and then forcefully kill them before starting a new AM. In the worst case, the RM will start a new AppAttempt after waiting for 10 mins (the expiry interval). This way we minimize multiple AMs racing with each other, which can help with issues in downstream components like Pig, Hive and Oozie during RM restart.

In the meanwhile, new apps will proceed as usual while existing apps wait for recovery.

This can continue to be useful after work-preserving restart: AMs that can properly sync back up with the RM continue to run, and those that can't are guaranteed to be killed before a new attempt is started.
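
To make the proposed rule concrete, here is a minimal sketch of the decision the RM would make for each recovered app, covering only the non-work-preserving case. It is an illustration under assumptions, not YARN code: the class RecoveredAttemptGate, its enums and the decide method are hypothetical names, and the only value taken from the description above is the 10 minute expiry interval.

```java
import java.time.Duration;

/** Hypothetical sketch of the proposed behaviour; none of these types exist in YARN. */
final class RecoveredAttemptGate {

    /** What the restarted RM currently knows about the previous attempt's AM. */
    enum PreviousAttemptState { NO_CONTACT_YET, CONTACTED_RM, CONFIRMED_EXITED }

    /** What the RM should do next for this recovered app. */
    enum Action { WAIT, KILL_PREVIOUS_AM, START_NEW_ATTEMPT }

    private final Duration expiryInterval;

    RecoveredAttemptGate(Duration expiryInterval) {
        this.expiryInterval = expiryInterval;
    }

    Action decide(PreviousAttemptState state, Duration sinceRecovery) {
        switch (state) {
            case CONFIRMED_EXITED:
                // The old AM is gone for real, so a new attempt cannot race with it.
                return Action.START_NEW_ATTEMPT;
            case CONTACTED_RM:
                // The old AM is still alive: kill it before starting anything new.
                return Action.KILL_PREVIOUS_AM;
            default:
                // No word from the old AM yet: wait, but never longer than the expiry interval.
                return sinceRecovery.compareTo(expiryInterval) >= 0
                        ? Action.START_NEW_ATTEMPT
                        : Action.WAIT;
        }
    }
}
```

With a gate built from Duration.ofMinutes(10), an app whose old AM has not yet contacted the RM stays in WAIT for the first ten minutes after recovery and moves to START_NEW_ATTEMPT after that, which is the worst case described above. New apps would not pass through this gate at all.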

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira