Posted to yarn-issues@hadoop.apache.org by "Naganarasimha G R (JIRA)" <ji...@apache.org> on 2015/11/04 00:00:29 UTC

[jira] [Updated] (YARN-3946) Allow fetching exact reason as to why a submitted app is in ACCEPTED state.

     [ https://issues.apache.org/jira/browse/YARN-3946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Naganarasimha G R updated YARN-3946:
------------------------------------
    Attachment: YARN-3946.v1.001.patch
                YARN3946_attemptDiagnistic message.png

Hi [~wangda],[~rohithsharma],[~sunilg], [~sumit.nigam] & [~nijel].

As mentioned by Wangda in his [comment|https://issues.apache.org/jira/browse/YARN-4091?focusedCommentId=14735266&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14735266] in YARN-4091, it's very difficult to capture the status when the *app's leaf queue or parent queue is beyond its limit*: it would not be good to loop through all the apps in the hierarchy and update the status for each of them on every node update, and doing so would also lose important information from previous updates.

So I think the valid cases where we can update AMLaunchDiagnostics in SchedulerApplicationAttempt (for CapacityScheduler) are:
 
* App is in Pending state because of the AMLimit/UserLimit of the queue
* App is waiting for resources of the partition for the AM to be launched (once moved from Pending state)
* App is waiting for resources of the partition for the AM to be launched, and some nodes are blacklisted (if it fails to launch because of blacklisted nodes)
* AMLimit of the queue doesn't allow the AM to launch
* UserLimit of the queue doesn't allow the AM to launch
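
As a rough illustration of the cases listed above (this is not the code in the attached patch), a per-attempt diagnostics holder could look something like the sketch below. The class name {{AmLaunchDiagnosticsSketch}}, the {{Reason}} constants, the {{update}}/{{get}} methods and the message texts are all hypothetical and only assume that the scheduler overwrites a single message whenever the reason changes.

```java
// Hypothetical sketch: these names are illustrative and are not taken from
// YARN-3946.v1.001.patch or from the real SchedulerApplicationAttempt class.
final class AmLaunchDiagnosticsSketch {

    /** One constant per case in the list above; the message texts are assumed. */
    enum Reason {
        PENDING_QUEUE_LIMITS("App is pending: AM limit or user limit of the queue reached"),
        WAITING_FOR_PARTITION("App is waiting for partition resources to launch the AM"),
        WAITING_WITH_BLACKLIST("App is waiting for partition resources to launch the AM; some nodes are blacklisted"),
        AM_LIMIT("AM limit of the queue does not allow the AM to launch"),
        USER_LIMIT("User limit of the queue does not allow the AM to launch");

        private final String message;

        Reason(String message) {
            this.message = message;
        }
    }

    // Only the latest diagnostic is kept; each update overwrites the previous one.
    private volatile String diagnostics = "";

    /** Records why the AM has not launched yet, with optional free-form detail. */
    void update(Reason reason, String detail) {
        String suffix = (detail == null || detail.isEmpty()) ? "" : " Details: " + detail;
        diagnostics = reason.message + "." + suffix;
    }

    /** Returns the most recent diagnostic, or an empty string if none was set. */
    String get() {
        return diagnostics;
    }

    public static void main(String[] args) {
        AmLaunchDiagnosticsSketch attempt = new AmLaunchDiagnosticsSketch();
        attempt.update(Reason.PENDING_QUEUE_LIMITS, "queue=default");
        System.out.println(attempt.get());
        attempt.update(Reason.WAITING_FOR_PARTITION, null);
        System.out.println(attempt.get());
    }
}
```

Keeping a single overwritten message per attempt avoids walking the queue hierarchy on every node update, which is the concern raised above; the trade-off is that earlier reasons are not retained.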

Please check whether the approach is proper; if it's useful and required, then a similar thing can be done for FairScheduler as well. cc/ [~kasha@cloudera.com]

I have also taken the liberty of fixing some small issues in {{SchedulerApplicationAttempt.isWaitingForAMContainer}} in the same patch; if required, I can raise another JIRA and move these small changes there.


> Allow fetching exact reason as to why a submitted app is in ACCEPTED state.
> ---------------------------------------------------------------------------
>
>                 Key: YARN-3946
>                 URL: https://issues.apache.org/jira/browse/YARN-3946
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: resourcemanager
>    Affects Versions: 2.6.0
>            Reporter: Sumit Nigam
>            Assignee: Naganarasimha G R
>         Attachments: YARN-3946.v1.001.patch, YARN3946_attemptDiagnistic message.png
>
>
> Currently there is no direct way to get the exact reason why a submitted app is still in ACCEPTED state. It should be possible to find out through the RM REST API which condition is not being met: say, queue limits being reached, core/memory requirements not being met, AM limit being reached, etc.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)