Posted to yarn-issues@hadoop.apache.org by "Hong Zhiguo (JIRA)" <ji...@apache.org> on 2015/09/02 03:45:45 UTC

[jira] [Updated] (YARN-4104) dryrun of schedule for diagnostic and tenant's complain

     [ https://issues.apache.org/jira/browse/YARN-4104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hong Zhiguo updated YARN-4104:
------------------------------
    Description: 
We have more than a thousand queues and several hundred tenants in a busy cluster. We get a lot of complaints and questions from queue owners and operators along the lines of "Why can't my queue/app get resources for such a long time?"

It's really hard to answer such questions.

So we added a diagnostic REST endpoint "/ws/v1/cluster/schedule/dryrun/{parentQueueName}", which returns the children of the given parent queue, sorted according to its SchedulingPolicy.getComparator().  All scheduling parameters of the children are also displayed, such as minShare, usage, demand, weight, and priority.
Usually we just call "/ws/v1/cluster/schedule/root", and the result answers the question by itself.
I feel this is really useful for multi-tenant clusters, and I hope it can be merged into the mainline.

  was:
We have more than 1 thousand queues and several handreds of tenants in a busy cluster. We get a lot of complains/questions from owner/operator of queues about "Why my queue/app can't get resource for a long while? "

It's realy hard to answer such questions.

So we added an diagnostic REST endpoint "/ws/v1/cluster/schedule/dryrun/{parentQueueName}" which returns the sorted list of it's children according to it's SchedulingPolicy.getComparator().  All scheduling parameters of the chidren are also displayed, such as minShare, usage, demand, weight, priority etc.
Usually we just call "/ws/v1/cluster/schedule/root", and the result self-explains to the questions.
I feel it's really usefull for multi-tenant clusters, and hope it could be merged into the mainline.


> dryrun of schedule for diagnostic and tenant's complain
> -------------------------------------------------------
>
>                 Key: YARN-4104
>                 URL: https://issues.apache.org/jira/browse/YARN-4104
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: scheduler
>            Reporter: Hong Zhiguo
>            Assignee: Hong Zhiguo
>            Priority: Minor
>
> We have more than a thousand queues and several hundred tenants in a busy cluster. We get a lot of complaints and questions from queue owners and operators along the lines of "Why can't my queue/app get resources for such a long time?"
> It's really hard to answer such questions.
> So we added a diagnostic REST endpoint "/ws/v1/cluster/schedule/dryrun/{parentQueueName}", which returns the children of the given parent queue, sorted according to its SchedulingPolicy.getComparator().  All scheduling parameters of the children are also displayed, such as minShare, usage, demand, weight, and priority.
> Usually we just call "/ws/v1/cluster/schedule/root", and the result answers the question by itself.
> I feel this is really useful for multi-tenant clusters, and I hope it can be merged into the mainline.
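
To illustrate the idea of a scheduling "dry run", here is a minimal sketch of sorting a parent queue's children with a policy comparator and listing their parameters. It is not the patch attached to this issue and does not use YARN's classes: `DryRunSketch`, `ChildQueue`, `FAIR_LIKE`, and `dryRun` are hypothetical names, the queue names and numbers are made up, and the comparator is only a simplified ordering loosely modeled on fair-share scheduling (queues below their minShare first, then by usage-to-weight ratio).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class DryRunSketch {

    /** Hypothetical snapshot of one child queue's scheduling parameters. */
    static final class ChildQueue {
        final String name;
        final double minShare;
        final double usage;
        final double demand;
        final double weight;

        ChildQueue(String name, double minShare, double usage, double demand, double weight) {
            this.name = name;
            this.minShare = minShare;
            this.usage = usage;
            this.demand = demand;
            this.weight = weight;
        }

        /** A queue is "needy" while it uses less than min(minShare, demand). */
        boolean isNeedy() {
            return usage < Math.min(minShare, demand);
        }

        /** Usage relative to minShare; the divisor is floored at 1 to avoid dividing by zero. */
        double minShareRatio() {
            return usage / Math.max(minShare, 1.0);
        }

        double usageToWeightRatio() {
            return usage / weight;
        }
    }

    /** Assumed, simplified ordering: needy queues first, then the lower ratio, then name. */
    static final Comparator<ChildQueue> FAIR_LIKE = (a, b) -> {
        boolean aNeedy = a.isNeedy();
        boolean bNeedy = b.isNeedy();
        if (aNeedy && !bNeedy) {
            return -1;
        }
        if (bNeedy && !aNeedy) {
            return 1;
        }
        int byRatio = aNeedy
                ? Double.compare(a.minShareRatio(), b.minShareRatio())
                : Double.compare(a.usageToWeightRatio(), b.usageToWeightRatio());
        return byRatio != 0 ? byRatio : a.name.compareTo(b.name);
    };

    /** Returns a sorted copy of the children; the input list is left untouched. */
    static List<ChildQueue> dryRun(List<ChildQueue> children) {
        List<ChildQueue> sorted = new ArrayList<>(children);
        sorted.sort(FAIR_LIKE);
        return sorted;
    }

    public static void main(String[] args) {
        List<ChildQueue> children = Arrays.asList(
                new ChildQueue("root.etl", 100, 150, 300, 1.0),
                new ChildQueue("root.adhoc", 50, 10, 80, 1.0),
                new ChildQueue("root.reports", 0, 40, 40, 2.0));
        // Print the children in the order this comparator would serve them.
        for (ChildQueue q : dryRun(children)) {
            System.out.printf("%s minShare=%.0f usage=%.0f demand=%.0f weight=%.1f%n",
                    q.name, q.minShare, q.usage, q.demand, q.weight);
        }
    }
}
```

With these sample values, `root.adhoc` sorts first because it is still below its minShare, followed by `root.reports` and then `root.etl`, which have the lower and higher usage-to-weight ratios respectively. Seeing the ordering next to the parameters that produced it is what lets a queue owner tell why another queue is being served first.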


