Posted to yarn-issues@hadoop.apache.org by "Chandni Singh (JIRA)" <ji...@apache.org> on 2018/03/08 00:18:00 UTC

[jira] [Comment Edited] (YARN-5015) Support sliding window retry capability for container restart

    [ https://issues.apache.org/jira/browse/YARN-5015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16390490#comment-16390490 ] 

Chandni Singh edited comment on YARN-5015 at 3/8/18 12:17 AM:
--------------------------------------------------------------

 [~leftnoteasy] Please find my answers to some of your questions below:
{quote}2) mv org.apache.hadoop.yarn.server.retry.SlidingWindowRetryPolicy to org.apache.hadoop.yarn.server.nodemanager.containermanager.container: Why it is in server-common?
{quote}
It is in server-common so that we can later reuse it for AM restart. Eventually we have to unify the code for AM and container restart, so this class needs to be accessible to the RM as well.
{quote}4) calculatePendingRetries

return retryContext.getRemainingRetries() == -1 ? retryContext.getMaxRetries() : retryContext.getRemainingRetries();

 Why check {{retryContext.getRemainingRetries() == -1}}? Should this be getMaxRetries() == -1?
{quote}
{{remainingRetries}} defaults to -1, which means it has not been set yet.

If {{remainingRetries}} is not set, then {{pendingRetries}} = {{maxRetries}}; otherwise, {{pendingRetries}} = {{remainingRetries}}.
 Right after this, we update {{remainingRetries}} = {{pendingRetries}} - 1.
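
To make the bookkeeping above concrete, here is a minimal sketch of it. It is not the code from the patch: {{RetryContext}}, {{UNSET}} and {{tryConsumeRetry}} are hypothetical names used only for illustration, and the sketch assumes {{maxRetries}} is a non-negative count. The guard on {{pending <= 0}} is my own addition so that the counter never drops back to the -1 sentinel.

```java
public class PendingRetriesSketch {
    /** Sentinel taken from the comment above: -1 means remainingRetries is not set yet. */
    static final int UNSET = -1;

    /** Hypothetical stand-in for the retry context; NOT the real ContainerRetryContext. */
    static final class RetryContext {
        private final int maxRetries;          // assumed non-negative in this sketch
        private int remainingRetries = UNSET;  // unset until the first retry is recorded

        RetryContext(int maxRetries) { this.maxRetries = maxRetries; }
        int getMaxRetries() { return maxRetries; }
        int getRemainingRetries() { return remainingRetries; }
        void setRemainingRetries(int value) { this.remainingRetries = value; }
    }

    /** Falls back to maxRetries while remainingRetries is unset; otherwise uses the stored counter. */
    static int calculatePendingRetries(RetryContext ctx) {
        return ctx.getRemainingRetries() == UNSET
            ? ctx.getMaxRetries()
            : ctx.getRemainingRetries();
    }

    /**
     * Records one retry: remainingRetries = pendingRetries - 1.
     * Returns false once no retries are left (illustrative guard, not from the patch).
     */
    static boolean tryConsumeRetry(RetryContext ctx) {
        int pending = calculatePendingRetries(ctx);
        if (pending <= 0) {
            return false;
        }
        ctx.setRemainingRetries(pending - 1);
        return true;
    }

    public static void main(String[] args) {
        RetryContext ctx = new RetryContext(2);
        System.out.println(calculatePendingRetries(ctx)); // unset, so this is maxRetries
        while (tryConsumeRetry(ctx)) {
            System.out.println("retrying, remaining = " + ctx.getRemainingRetries());
        }
    }
}
```

With this shape, checking {{getRemainingRetries() == -1}} only distinguishes "no retry recorded yet" from "counter already in use"; it says nothing about what {{maxRetries}} itself means.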
{quote}1) Instead of adding getRestartTimes/getRemainingRetries to {{ContainerRetryContext}}, I suggest to have a separate class like NMContainerRetryContext which includes:
{quote}
As with 2, should I create a {{SlidingContainerRetryContext}} in server-common? It will also need to be accessible to the RM later, when we change the AM retry code to use this common class.

 

 



> Support sliding window retry capability for container restart 
> --------------------------------------------------------------
>
>                 Key: YARN-5015
>                 URL: https://issues.apache.org/jira/browse/YARN-5015
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: nodemanager
>            Reporter: Varun Vasudev
>            Assignee: Chandni Singh
>            Priority: Major
>              Labels: oct16-medium
>         Attachments: YARN-5015.01.patch, YARN-5015.02.patch, YARN-5015.03.patch
>
>
> We support a sliding window retry policy for AM restarts (introduced in YARN-611). A similar sliding window retry policy is needed for container restarts.
> With this change, we can introduce a common SlidingWindowRetryPolicy class (suggested by [~vvasudev] in the comments) and integrate it into container restart.
> In a subsequent JIRA, we can modify the AM code to use SlidingWindowRetryPolicy, which will unify the AM and container restart code.


