Posted to yarn-issues@hadoop.apache.org by "Qi Zhu (Jira)" <ji...@apache.org> on 2021/03/22 14:51:00 UTC
[jira] [Comment Edited] (YARN-10517) QueueMetrics has incorrect
Allocated Resource when labelled partitions updated
[ https://issues.apache.org/jira/browse/YARN-10517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17305469#comment-17305469 ]
Qi Zhu edited comment on YARN-10517 at 3/22/21, 2:50 PM:
---------------------------------------------------------
I ran into this problem too.
I fixed it and added a corresponding test in YARN-10517.001.patch.
[~epayne] [~pbacsko] [~gandras] [~ebadger] [~jianliang.wu] Could you help review this?
Thanks. :D
was (Author: zhuqi):
I ran into this problem too.
I fixed it and added a corresponding test in YARN-10517.001.patch.
[~epayne] [~pbacsko] [~gandras] [~ebadger] Could you help review this?
Thanks. :D
> QueueMetrics has incorrect Allocated Resource when labelled partitions updated
> ------------------------------------------------------------------------------
>
> Key: YARN-10517
> URL: https://issues.apache.org/jira/browse/YARN-10517
> Project: Hadoop YARN
> Issue Type: Bug
> Components: resourcemanager
> Affects Versions: 2.8.0, 3.3.0
> Reporter: sibyl.lv
> Assignee: Qi Zhu
> Priority: Major
> Attachments: YARN-10517-branch-3.2.001.patch, YARN-10517.001.patch, wrong metrics.png
>
>
> After https://issues.apache.org/jira/browse/YARN-9596, QueueMetrics still reports incorrect allocated values over JMX, such as allocatedMB, allocatedVCores, and allocatedContainers, when a node's partition is updated from "DEFAULT" to another label while applications are running.
> Steps to reproduce
> ==============
> # Configure capacity-scheduler.xml with label configuration
> # Submit one application to the default partition and let it run
> # Add the label "tpcds" to the cluster and replace the label on node1 and node2 with "tpcds" while the above application is running
> # Note down "VCores Used" in the Web UI
> # After the application finishes, the metrics are wrong (screenshots attached).
> ==============
>
> FiCaSchedulerApp does not update queue metrics when CapacityScheduler handles the NODE_LABELS_UPDATE event.
> So we should release the container's resource from the old partition in the metrics and add it as used resource to the new partition, just as is already done for queueUsage.
> {code:java}
> public void nodePartitionUpdated(RMContainer rmContainer, String oldPartition,
>     String newPartition) {
>   Resource containerResource = rmContainer.getAllocatedResource();
>   this.attemptResourceUsage.decUsed(oldPartition, containerResource);
>   this.attemptResourceUsage.incUsed(newPartition, containerResource);
>   getCSLeafQueue().decUsedResource(oldPartition, containerResource, this);
>   getCSLeafQueue().incUsedResource(newPartition, containerResource, this);
>   // Update new partition name if container is AM and also update AM resource
>   if (rmContainer.isAMContainer()) {
>     setAppAMNodePartitionName(newPartition);
>     this.attemptResourceUsage.decAMUsed(oldPartition, containerResource);
>     this.attemptResourceUsage.incAMUsed(newPartition, containerResource);
>     getCSLeafQueue().decAMUsedResource(oldPartition, containerResource, this);
>     getCSLeafQueue().incAMUsedResource(newPartition, containerResource, this);
>   }
> }
> {code}
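
The accounting move described in the issue (release from the old partition, add to the new one) can be illustrated with a small standalone sketch. This is not the Hadoop QueueMetrics API and not the content of the attached patch: the class PartitionMetricsSketch, its methods, and the use of plain integer vcores are all hypothetical, chosen only to show why the per-partition counters need to be moved together with the container.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical stand-in for per-partition queue metrics.
 * It only models allocated vcores and container counts keyed by partition name.
 */
final class PartitionMetricsSketch {
    private final Map<String, Integer> allocatedVCores = new HashMap<>();
    private final Map<String, Integer> allocatedContainers = new HashMap<>();

    /** Record one container with the given vcores as allocated on a partition. */
    public void allocate(String partition, int vcores) {
        allocatedVCores.merge(partition, vcores, Integer::sum);
        allocatedContainers.merge(partition, 1, Integer::sum);
    }

    /** Record one container with the given vcores as released from a partition. */
    public void release(String partition, int vcores) {
        allocatedVCores.merge(partition, -vcores, Integer::sum);
        allocatedContainers.merge(partition, -1, Integer::sum);
    }

    /** Move a running container's accounting from the old partition to the new one. */
    public void nodePartitionUpdated(String oldPartition, String newPartition, int vcores) {
        release(oldPartition, vcores);
        allocate(newPartition, vcores);
    }

    public int vcores(String partition) {
        return allocatedVCores.getOrDefault(partition, 0);
    }

    public int containers(String partition) {
        return allocatedContainers.getOrDefault(partition, 0);
    }

    public static void main(String[] args) {
        PartitionMetricsSketch metrics = new PartitionMetricsSketch();
        metrics.allocate("", 4);                       // container starts on the default partition
        metrics.nodePartitionUpdated("", "tpcds", 4);  // node is relabelled while it runs
        metrics.release("tpcds", 4);                   // container finishes on the relabelled node
        System.out.println("default vcores: " + metrics.vcores(""));
        System.out.println("tpcds vcores:   " + metrics.vcores("tpcds"));
    }
}
```

In this toy model, skipping the nodePartitionUpdated call would leave 4 vcores counted against the default partition after the container finishes and push the "tpcds" counter negative, which is the kind of drift the issue reports.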