Posted to yarn-issues@hadoop.apache.org by "Jian He (JIRA)" <ji...@apache.org> on 2015/02/02 20:23:36 UTC

[jira] [Commented] (YARN-3094) reset timer for liveness monitors after RM recovery

    [ https://issues.apache.org/jira/browse/YARN-3094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14301737#comment-14301737 ] 

Jian He commented on YARN-3094:
-------------------------------

Thanks [~hex108] for the patch, and thanks [~adhoot] for reviewing it!

One comment from my side:
{code}
    Thread.sleep(1000); // make sure that monitor has been working
    Assert.assertEquals(Service.STATE.STARTED, monitor.getServiceState());
{code}
Instead of a hard sleep, we can wait until the monitor's state becomes STARTED.
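A minimal sketch of that suggestion, written in plain Java so it runs on its own: poll the state at a short interval and fail with a TimeoutException after a bound, instead of sleeping a fixed 1000 ms. The waitFor helper, the FakeMonitor class and the local State enum (which only mirrors the constant names of Service.STATE) are hypothetical stand-ins, not Hadoop APIs; the 10 ms poll interval and 5 s bound are arbitrary.

{code:java}
import java.util.concurrent.TimeoutException;
import java.util.function.BooleanSupplier;

public class WaitForStateSketch {

  // Polls the condition every intervalMs until it holds; throws
  // TimeoutException once timeoutMs has elapsed without success.
  static void waitFor(BooleanSupplier condition, long intervalMs, long timeoutMs)
      throws InterruptedException, TimeoutException {
    long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
    while (!condition.getAsBoolean()) {
      if (System.nanoTime() - deadline >= 0) {
        throw new TimeoutException("condition not met within " + timeoutMs + " ms");
      }
      Thread.sleep(intervalMs);
    }
  }

  // Hypothetical stand-in that mirrors the names of Service.STATE.
  enum State { NOTINITED, INITED, STARTED, STOPPED }

  // Hypothetical monitor that reaches STARTED asynchronously after ~50 ms.
  static final class FakeMonitor {
    private volatile State state = State.INITED;

    void startAsync() {
      Thread t = new Thread(() -> {
        try {
          Thread.sleep(50);
        } catch (InterruptedException e) {
          return;
        }
        state = State.STARTED;
      });
      t.setDaemon(true);
      t.start();
    }

    State getServiceState() {
      return state;
    }
  }

  public static void main(String[] args) throws Exception {
    FakeMonitor monitor = new FakeMonitor();
    monitor.startAsync();
    // Replaces Thread.sleep(1000): returns as soon as STARTED, fails after 5 s.
    waitFor(() -> monitor.getServiceState() == State.STARTED, 10, 5_000);
    System.out.println("monitor state: " + monitor.getServiceState());
  }
}
{code}

The test then finishes as soon as the state flips, and it no longer becomes flaky when the monitor happens to take longer than the fixed one-second sleep.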


> reset timer for liveness monitors after RM recovery
> ---------------------------------------------------
>
>                 Key: YARN-3094
>                 URL: https://issues.apache.org/jira/browse/YARN-3094
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>    Affects Versions: 2.6.0
>            Reporter: Jun Gong
>            Assignee: Jun Gong
>         Attachments: YARN-3094.2.patch, YARN-3094.3.patch, YARN-3094.patch
>
>
> When the RM restarts, it recovers RMAppAttempts and registers them with AMLivenessMonitor if they are not in a final state. AMs will time out in the RM if the recovery process takes a long time for some reason (e.g. too many apps).
> In our system, we found that the recovery process took about 3 minutes, and all AMs timed out.
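To illustrate the failure mode and the idea in the issue title, here is a minimal sketch assuming a simplified monitor that records a last-ping time per registered key and expires keys after a fixed interval. LivenessMonitor, its methods, the attempt_1 key and the 60-second interval are all hypothetical choices for illustration; this is not Hadoop's AbstractLivenessMonitor or the code in the attached patches. A fake clock makes the 3-minute recovery deterministic.

{code:java}
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.LongSupplier;

public class LivenessResetSketch {

  // Hypothetical, simplified monitor; NOT Hadoop's AbstractLivenessMonitor.
  static final class LivenessMonitor<K> {
    private final Map<K, Long> lastPing = new HashMap<>();
    private final long expireIntervalMs;
    private final LongSupplier clock; // injected so the demo is deterministic

    LivenessMonitor(long expireIntervalMs, LongSupplier clock) {
      this.expireIntervalMs = expireIntervalMs;
      this.clock = clock;
    }

    // Recovery registers each non-final attempt; its timer starts "now".
    synchronized void register(K key) {
      lastPing.put(key, clock.getAsLong());
    }

    // A heartbeat from a registered key restarts its timer.
    synchronized void receivedPing(K key) {
      if (lastPing.containsKey(key)) {
        lastPing.put(key, clock.getAsLong());
      }
    }

    // The idea in sketch form: once recovery is done, restart every timer.
    synchronized void resetTimer() {
      long now = clock.getAsLong();
      lastPing.replaceAll((key, last) -> now);
    }

    // Keys that have gone longer than expireIntervalMs without a ping.
    synchronized List<K> expired() {
      long now = clock.getAsLong();
      List<K> result = new ArrayList<>();
      for (Map.Entry<K, Long> e : lastPing.entrySet()) {
        if (now > e.getValue() + expireIntervalMs) {
          result.add(e.getKey());
        }
      }
      return result;
    }
  }

  public static void main(String[] args) {
    long[] now = {0L};
    LivenessMonitor<String> monitor = new LivenessMonitor<>(60_000L, () -> now[0]);

    monitor.register("attempt_1"); // registered during recovery at t = 0
    now[0] += 180_000L;            // recovery takes about 3 minutes
    System.out.println("before reset: " + monitor.expired()); // [attempt_1]

    monitor.resetTimer();          // recovery finished: restart all timers
    System.out.println("after reset:  " + monitor.expired()); // []
  }
}
{code}

Without the reset, the time spent in recovery counts against each AM's expiry interval, so every recovered attempt looks dead the moment recovery ends; resetting the timers gives each AM a full interval to resync after recovery.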



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)