Posted to yarn-issues@hadoop.apache.org by "Eric Payne (JIRA)" <ji...@apache.org> on 2016/08/23 19:59:20 UTC

[jira] [Updated] (YARN-5555) Scheduler UI: "% of Queue" is inaccurate if leaf queue is hierarchically nested.

     [ https://issues.apache.org/jira/browse/YARN-5555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eric Payne updated YARN-5555:
-----------------------------
    Attachment: PctOfQueueIsInaccurate.jpg

The queue structure for the attached screenshot (PctOfQueueIsInaccurate.jpg) has the following attributes:
||Cluster Capacity||root.swords.capacity||root.swords.brisingr.capacity||
|12288 MB|20%|25%|

There are 3 apps running in the {{root.swords.brisingr}} queue. The attributes for each of these apps are as follows:
||App Name||Allocated Memory MB||% of Queue||
|application_1471969002932_0001|4608 MB|150.0|
|application_1471969002932_0002|4608 MB|150.0|
|application_1471969002932_0003|3072 MB|100.0|

The value to the right of the {{Queue: swords.brisingr}} bar graph says that the queue is 2001.3% used. This value is (almost) accurate because the memory actually guaranteed to {{root.swords.brisingr}} is {{12288 MB * 20% * 25% = 614.4 MB}}. Since {{root.swords.brisingr}} is consuming all 12288 MB, its usage is {{12288 MB / 614.4 MB = 20}}, or {{2000%}}.

However, the sum of the {{% of Queue}} column for all apps running in {{root.swords.brisingr}} is {{100.0% + 150.0% + 150.0% = 400%}}. This is inaccurate: the per-app values should add up to roughly the 2000% reported for the queue as a whole.

It appears that the calculations do not take into account the capacity of the parent queue, {{root.swords: 20%}}. For example, {{application_1471969002932_0001}}'s usage is 4608 MB, and {{12288 MB * 25% = 3072 MB}}, so the reported value is {{4608 / 3072 = 1.5}}, or {{150%}}. The calculation should have been {{4608 / 614.4 = 7.5}}, or {{750%}}.
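
To make the arithmetic above easy to re-check, here is a minimal standalone sketch. The class and method names ({{PctOfQueueCheck}}, {{pctOfQueue}}) are hypothetical and are not part of YARN; the only inputs are the cluster size, the capacities, and the per-app allocations quoted in this comment.

```java
// Hypothetical helper for re-checking the numbers in this comment; not YARN code.
final class PctOfQueueCheck {

    /**
     * Percentage of a queue's guaranteed memory that usedMb represents.
     * capacities lists the capacity fraction (0..1) of each queue level that
     * is taken into account, from the top of the hierarchy down to the leaf.
     */
    static double pctOfQueue(double usedMb, double clusterMb, double... capacities) {
        double guaranteedMb = clusterMb;
        for (double c : capacities) {
            guaranteedMb *= c; // each level gets a fraction of its parent
        }
        return usedMb / guaranteedMb * 100.0;
    }

    public static void main(String[] args) {
        double clusterMb = 12288.0;
        double[] appsMb = {4608.0, 4608.0, 3072.0};

        double leafOnlySum = 0.0;  // only root.swords.brisingr (25%) applied
        double fullPathSum = 0.0;  // root.swords (20%) and brisingr (25%) applied
        for (double usedMb : appsMb) {
            leafOnlySum += pctOfQueue(usedMb, clusterMb, 0.25);
            fullPathSum += pctOfQueue(usedMb, clusterMb, 0.20, 0.25);
        }
        System.out.printf("leaf capacity only: %.1f%%%n", leafOnlySum);
        System.out.printf("full queue path:    %.1f%%%n", fullPathSum);
    }
}
```

Applying only the leaf capacity reproduces the values shown in the UI, while applying both capacities gives per-app values whose sum matches the figure next to the queue's bar graph.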

{{RMAppsBlock#renderData}} is calling {{ApplicationResourceUsageReport}}, which eventually calls {{SchedulerApplicationAttempt#getResourceUsageReport}}.
The following code in {{getResourceUsageReport}}, I think, needs to walk back up the parent tree to get all of the capacity values, not just the one for the leaf queue:
{code}
      queueUsagePerc =
          calc.divide(cluster, usedResourceClone, Resources.multiply(cluster,
              queue.getQueueInfo(false, false).getCapacity())) * 100;
{code}
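
As a rough illustration of what "walking back up the parent tree" could look like, here is a self-contained sketch. {{QueueNode}}, {{absoluteCapacity}}, and {{queueUsagePerc}} are hypothetical stand-ins written for this comment; they are not the scheduler's queue classes, and the real change would need to use whatever the scheduler already exposes for a queue's parent and capacity.

```java
// Hypothetical stand-in for a scheduler queue; NOT the YARN queue API.
final class QueueNode {
    final String name;
    final double capacity;   // fraction of the parent's capacity, 0..1
    final QueueNode parent;  // null for the root queue

    QueueNode(String name, double capacity, QueueNode parent) {
        this.name = name;
        this.capacity = capacity;
        this.parent = parent;
    }

    /** Product of this queue's capacity and the capacities of all its ancestors. */
    double absoluteCapacity() {
        double abs = 1.0;
        for (QueueNode q = this; q != null; q = q.parent) {
            abs *= q.capacity;
        }
        return abs;
    }

    /** Usage as a percentage of the memory guaranteed to the leaf queue. */
    static double queueUsagePerc(double usedMb, double clusterMb, QueueNode leaf) {
        return usedMb / (clusterMb * leaf.absoluteCapacity()) * 100.0;
    }
}
```

With {{root}} at 1.0, {{swords}} at 0.20, and {{brisingr}} at 0.25, the absolute capacity of the leaf is about 0.05, so a 4608 MB app on a 12288 MB cluster comes out at roughly 750% instead of 150%.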

> Scheduler UI: "% of Queue" is inaccurate if leaf queue is hierarchically nested.
> --------------------------------------------------------------------------------
>
>                 Key: YARN-5555
>                 URL: https://issues.apache.org/jira/browse/YARN-5555
>             Project: Hadoop YARN
>          Issue Type: Bug
>    Affects Versions: 2.8.0
>            Reporter: Eric Payne
>            Assignee: Eric Payne
>            Priority: Minor
>         Attachments: PctOfQueueIsInaccurate.jpg
>
>
> If a leaf queue is hierarchically nested (e.g., {{root.a.a1}}, {{root.a.a2}}), the values in the "*% of Queue*" column in the apps section of the Scheduler UI are calculated as if the leaf queue ({{a1}}) were a direct child of {{root}}.


