Posted to yarn-issues@hadoop.apache.org by "Tarun Parimi (Jira)" <ji...@apache.org> on 2021/08/18 10:54:00 UTC

[jira] [Assigned] (YARN-10890) Node Attributes in Distributed mapping misses update to scheduler when node gets decommissioned/recommissioned

     [ https://issues.apache.org/jira/browse/YARN-10890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tarun Parimi reassigned YARN-10890:
-----------------------------------

    Assignee: Tarun Parimi

> Node Attributes in Distributed mapping misses update to scheduler when node gets decommissioned/recommissioned
> --------------------------------------------------------------------------------------------------------------
>
>                 Key: YARN-10890
>                 URL: https://issues.apache.org/jira/browse/YARN-10890
>             Project: Hadoop YARN
>          Issue Type: Bug
>    Affects Versions: 3.3.0, 3.2.1
>            Reporter: Tarun Parimi
>            Assignee: Tarun Parimi
>            Priority: Major
>
> The NodeAttributesManagerImpl maintains the node-to-attribute mapping, but it doesn't remove the mapping when a node goes down. This makes sense for centralized mapping: the attribute mapping is held centrally in the RM, so a node going down doesn't affect it.
> In distributed mapping, the node attribute mapping is updated via the NM heartbeat to the RM, so these node attributes are only valid as long as the node is heartbeating. But when a node is decommissioned or lost, the node attribute entry still remains in NodeAttributesManagerImpl.
> After the performance improvement made in YARN-8925, we only update distributed node attributes when necessary. However, when a previously decommissioned node is recommissioned, NodeAttributesManagerImpl still has the old mapping entry belonging to the old SchedulerNode instance that was decommissioned.
> This results in ResourceTrackerService#updateNodeAttributesIfNecessary skipping the update, since it compares against the attributes belonging to the old decommissioned node instance.
> {code:java}
> 	    if (!NodeLabelUtil
> 	        .isNodeAttributesEquals(nodeAttributes, currentNodeAttributes)) {
> 	      this.rmContext.getNodeAttributesManager()
> 	          .replaceNodeAttributes(NodeAttribute.PREFIX_DISTRIBUTED,
> 	              ImmutableMap.of(nodeId.getHost(), nodeAttributes));
> 	    } else if (LOG.isDebugEnabled()) {
> 	      LOG.debug("Skip updating node attributes since there is no change for "
> 	          + nodeId + " : " + nodeAttributes);
> 	    }
> {code}
> We should remove the distributed node attributes whenever a node gets deactivated to avoid this issue. These attributes will then be added properly in the scheduler whenever the node becomes active again and registers/heartbeats.
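
The skipped update and the proposed removal can be illustrated with a small toy model. This is only a sketch under stated assumptions: StaleNodeAttributesDemo and all of its fields and methods are hypothetical names invented for illustration, not Hadoop classes, and the real NodeAttributesManagerImpl and scheduler are far more involved. It models just two things from the description: an attribute mapping that outlives the scheduler node, and a heartbeat path that pushes attributes to the scheduler only when they differ from that mapping.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Toy model only: none of these types or methods are Hadoop classes.
class StaleNodeAttributesDemo {
  // Stands in for the host-to-attributes mapping kept by the attributes manager.
  private final Map<String, Set<String>> managerMapping = new HashMap<>();
  // Stands in for the attributes attached to active scheduler nodes.
  private final Map<String, Set<String>> schedulerNodes = new HashMap<>();
  // When true, deactivation also drops the manager's entry (the proposed fix).
  private final boolean removeOnDeactivate;

  StaleNodeAttributesDemo(boolean removeOnDeactivate) {
    this.removeOnDeactivate = removeOnDeactivate;
  }

  // A node registers: the scheduler gets a fresh node with no attributes.
  void register(String host) {
    schedulerNodes.put(host, new HashSet<>());
  }

  // Heartbeat path: update only if the reported attributes differ from the
  // manager's current entry, mirroring the "update only when necessary" check.
  void heartbeat(String host, Set<String> reported) {
    Set<String> current = managerMapping.getOrDefault(host, Set.of());
    if (!reported.equals(current)) {
      managerMapping.put(host, new HashSet<>(reported));
      if (schedulerNodes.containsKey(host)) {
        schedulerNodes.put(host, new HashSet<>(reported));
      }
    }
    // Otherwise the update is skipped, even if the scheduler node is new.
  }

  // Decommission or loss: the scheduler node always goes away; the manager's
  // entry goes away only when the fix is enabled.
  void deactivate(String host) {
    schedulerNodes.remove(host);
    if (removeOnDeactivate) {
      managerMapping.remove(host);
    }
  }

  Set<String> schedulerAttributes(String host) {
    return schedulerNodes.getOrDefault(host, Set.of());
  }

  // Runs register -> heartbeat -> deactivate -> register -> heartbeat and
  // returns what the scheduler sees for the recommissioned node.
  static Set<String> attributesAfterRecommission(boolean removeOnDeactivate) {
    StaleNodeAttributesDemo rm = new StaleNodeAttributesDemo(removeOnDeactivate);
    Set<String> attrs = Set.of("os=linux");
    rm.register("host1");
    rm.heartbeat("host1", attrs);
    rm.deactivate("host1");
    rm.register("host1");
    rm.heartbeat("host1", attrs);
    return rm.schedulerAttributes("host1");
  }

  public static void main(String[] args) {
    System.out.println("without removal: " + attributesAfterRecommission(false)); // []
    System.out.println("with removal:    " + attributesAfterRecommission(true));  // [os=linux]
  }
}
```

In this model, the run without removal leaves the recommissioned node with no attributes in the scheduler, because the heartbeat compares against the stale entry and skips the update. With removal on deactivation, the first heartbeat after re-registration sees a difference and the attributes reach the scheduler again.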



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
