Posted to yarn-issues@hadoop.apache.org by "YCozy (Jira)" <ji...@apache.org> on 2020/05/29 19:30:00 UTC

[jira] [Commented] (YARN-10166) Add detail log for ApplicationAttemptNotFoundException

    [ https://issues.apache.org/jira/browse/YARN-10166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17119886#comment-17119886 ] 

YCozy commented on YARN-10166:
------------------------------

We encountered the same issue. An AM was killed during NM failover, but it still managed to send an allocate() heartbeat to the RM after it had been unregistered and before its process was completely gone. As a result, the confusing ERROR entry "Application attempt ... doesn't exist" appeared in the RM's log. Logging more information about the app would be a great way to clear up the confusion.
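
To make the ordering concrete, here is a minimal sketch of that sequence. It is not Hadoop code: LateHeartbeatSketch, its methods, the attempt id, and the use of IllegalStateException are all hypothetical stand-ins for the RM-side attempt cache and ApplicationAttemptNotFoundException.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for the RM-side attempt cache; not Hadoop's actual classes.
public class LateHeartbeatSketch {
    private final Set<String> registeredAttempts = ConcurrentHashMap.newKeySet();

    public void register(String attemptId) {
        registeredAttempts.add(attemptId);
    }

    public void unregister(String attemptId) {
        registeredAttempts.remove(attemptId);
    }

    // Simulates an allocate() heartbeat: it fails once the attempt has left the cache.
    // IllegalStateException is an assumption standing in for the real exception type.
    public String allocate(String attemptId) {
        if (!registeredAttempts.contains(attemptId)) {
            throw new IllegalStateException("Application attempt " + attemptId
                + " doesn't exist in ApplicationMasterService cache.");
        }
        return "ok";
    }

    public static void main(String[] args) {
        LateHeartbeatSketch rm = new LateHeartbeatSketch();
        String attempt = "appattempt_0000000000000_0001_000001"; // made-up id
        rm.register(attempt);
        System.out.println(rm.allocate(attempt)); // heartbeat while registered
        rm.unregister(attempt);                   // AM is unregistered (e.g. after a kill)
        try {
            rm.allocate(attempt);                 // late heartbeat from the dying AM
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The second allocate() call is the late heartbeat: nothing is wrong on the RM side, which is why the bare message is confusing without extra context.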

 

Btw, why do we want this to be an ERROR for the RM?

> Add detail log for ApplicationAttemptNotFoundException
> ------------------------------------------------------
>
>                 Key: YARN-10166
>                 URL: https://issues.apache.org/jira/browse/YARN-10166
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: resourcemanager
>            Reporter: Youquan Lin
>            Priority: Minor
>              Labels: patch
>         Attachments: YARN-10166-001.patch, YARN-10166-002.patch, YARN-10166-003.patch, YARN-10166-004.patch
>
>
>      Suppose user A kills the app; ApplicationMasterService will then call unregisterAttempt() for it. Sometimes the app's AM continues to call the allocate() method and reports the following error.
> {code:java}
> Application attempt appattempt_1582520281010_15271_000001 doesn't exist in ApplicationMasterService cache.
> {code}
>     If user B has been watching the AM log, he will be confused about why the attempt is no longer in the ApplicationMasterService cache. So I think we can add a detailed log message for ApplicationAttemptNotFoundException, as follows.
> {code:java}
> Application attempt appattempt_1582630210671_14658_000001 doesn't exist in ApplicationMasterService cache.App state: KILLED,finalStatus: KILLED ,diagnostics: App application_1582630210671_14658 killed by userA from 127.0.0.1
> {code}
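
The detailed message proposed in the description could be assembled roughly as follows. This is only a sketch under assumptions: AttemptNotFoundMessageSketch, buildMessage, and its parameters are hypothetical names, not taken from the attached patches, and the spacing of the output is tidied rather than copied from the sample above.

```java
// Hypothetical helper; not part of Hadoop or of the attached patches.
public class AttemptNotFoundMessageSketch {

    // Builds the base message and, when app details are available, appends them.
    public static String buildMessage(String attemptId, String appState,
                                      String finalStatus, String diagnostics) {
        StringBuilder sb = new StringBuilder();
        sb.append("Application attempt ").append(attemptId)
          .append(" doesn't exist in ApplicationMasterService cache.");
        if (appState != null) {
            // Extra context so a reader of the log can tell why the attempt is gone.
            sb.append(" App state: ").append(appState)
              .append(", finalStatus: ").append(finalStatus)
              .append(", diagnostics: ").append(diagnostics);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Values reused from the example in the issue description.
        System.out.println(buildMessage(
            "appattempt_1582630210671_14658_000001",
            "KILLED",
            "KILLED",
            "App application_1582630210671_14658 killed by userA from 127.0.0.1"));
    }
}
```

Falling back to the short message when no app details are available keeps the existing behaviour for attempts the RM knows nothing about.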


