Posted to yarn-issues@hadoop.apache.org by "Brandon Scheller (JIRA)" <ji...@apache.org> on 2018/12/06 21:18:00 UTC

[jira] [Updated] (YARN-9088) Non-exclusive labels break QueueMetrics

     [ https://issues.apache.org/jira/browse/YARN-9088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Brandon Scheller updated YARN-9088:
-----------------------------------
    Description: 
QueueMetrics are broken (random/negative values) when non-exclusive labels are being used and unlabeled containers run on labeled nodes.

This is caused by the change in the patch here:

https://issues.apache.org/jira/browse/YARN-6467

The patch assumes that a container's label is always the same as the label of the node it runs on.

Within the patch, metrics are sometimes updated using request.getNodeLabelExpression() and sometimes using node.getPartition().

As a result, when the node is labeled but the container request is not, these metrics are only updated when referring to the default queue. The updates therefore no longer balance out, which leaves incorrect and negative values in QueueMetrics.
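
To illustrate the imbalance, here is a minimal toy model in Java. It is not the real QueueMetrics code: the class PartitionMetricsSketch, its methods, and the label "x" are hypothetical, and which call site uses the request's label versus the node's partition is an assumption made only for the sketch. The one behavior taken from the description is that the counter changes only for the default (empty) label.

```java
// Toy model only; this is NOT the real YARN QueueMetrics class.
class PartitionMetricsSketch {
    // Assumption for the sketch: the default partition is the empty string.
    static final String DEFAULT_PARTITION = "";

    // Hypothetical stand-in for a queue's allocated-container counter.
    private int allocatedContainers = 0;

    // As described in the report, only the default partition is counted.
    void allocate(String partition) {
        if (DEFAULT_PARTITION.equals(partition)) {
            allocatedContainers++;
        }
    }

    void release(String partition) {
        if (DEFAULT_PARTITION.equals(partition)) {
            allocatedContainers--;
        }
    }

    int getAllocatedContainers() {
        return allocatedContainers;
    }

    // Runs one allocate/release cycle and returns the counter afterwards.
    // A balanced implementation would always return 0.
    static int simulate(String allocateKey, String releaseKey) {
        PartitionMetricsSketch metrics = new PartitionMetricsSketch();
        metrics.allocate(allocateKey);
        metrics.release(releaseKey);
        return metrics.getAllocatedContainers();
    }

    public static void main(String[] args) {
        String requestLabel = "";    // unlabeled container request
        String nodePartition = "x";  // node carrying a non-exclusive label

        // Same key on both sides: the counter balances.
        System.out.println(simulate(nodePartition, nodePartition)); // 0
        // Increment keyed by the request, decrement keyed by the node: leak.
        System.out.println(simulate(requestLabel, nodePartition));  // 1
        // Increment keyed by the node, decrement keyed by the request: negative.
        System.out.println(simulate(nodePartition, requestLabel));  // -1
    }
}
```

In this sketch the counter only balances when both updates use the same key, which is consistent with the symptom reported above.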


> Non-exclusive labels break QueueMetrics
> ---------------------------------------
>
>                 Key: YARN-9088
>                 URL: https://issues.apache.org/jira/browse/YARN-9088
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: capacity scheduler, resourcemanager
>    Affects Versions: 2.8.5
>            Reporter: Brandon Scheller
>            Priority: Major
>              Labels: metrics, nodelabel


