Posted to yarn-issues@hadoop.apache.org by "Tarun Parimi (Jira)" <ji...@apache.org> on 2021/08/18 10:54:00 UTC

[jira] [Created] (YARN-10890) Node Attributes in Distributed mapping misses update to scheduler when node gets decommissioned/recommissioned

Tarun Parimi created YARN-10890:
-----------------------------------

             Summary: Node Attributes in Distributed mapping misses update to scheduler when node gets decommissioned/recommissioned
                 Key: YARN-10890
                 URL: https://issues.apache.org/jira/browse/YARN-10890
             Project: Hadoop YARN
          Issue Type: Bug
    Affects Versions: 3.2.1, 3.3.0
            Reporter: Tarun Parimi


The NodeAttributesManagerImpl maintains the node-to-attribute mapping, but it doesn't remove the mapping when a node goes down. This makes sense for centralized mapping: the mapping is managed centrally in the RM, so a node going down doesn't affect it.

In distributed mapping, the node attribute mapping is updated via the NM heartbeat to the RM, so these node attributes are only valid as long as the node is heartbeating. But when a node is decommissioned or lost, the node attribute entry still remains in NodeAttributesManagerImpl.

After the performance improvement in YARN-8925, we only update distributed node attributes when necessary. However, when a previously decommissioned node is recommissioned, NodeAttributesManagerImpl still has the old mapping entry belonging to the old SchedulerNode instance that was decommissioned.

This results in ResourceTrackerService#updateNodeAttributesIfNecessary skipping the update, since it compares against the attributes belonging to the old decommissioned node instance.
{code:java}
	    if (!NodeLabelUtil
	        .isNodeAttributesEquals(nodeAttributes, currentNodeAttributes)) {
	      this.rmContext.getNodeAttributesManager()
	          .replaceNodeAttributes(NodeAttribute.PREFIX_DISTRIBUTED,
	              ImmutableMap.of(nodeId.getHost(), nodeAttributes));
	    } else if (LOG.isDebugEnabled()) {
	      LOG.debug("Skip updating node attributes since there is no change for "
	          + nodeId + " : " + nodeAttributes);
	    }
{code}

We should remove the distributed node attributes whenever a node gets deactivated to avoid this issue. These attributes will then be added properly in the scheduler whenever the node becomes active again and registers/heartbeats.
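
To make the failure mode and the proposed fix concrete, here is a minimal, self-contained sketch. It is a toy model and not the actual Hadoop code: the class StaleAttributeSketch, its two maps, and the removeOnDeactivate flag are hypothetical names introduced only for illustration. One map stands in for the mapping held by NodeAttributesManagerImpl, the other for the scheduler's per-node view, which is assumed to start empty each time a node rejoins.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Toy model of the issue; none of these names exist in Hadoop. */
class StaleAttributeSketch {
  // Stand-in for the manager's host -> distributed attributes mapping.
  private final Map<String, Set<String>> managerMapping = new HashMap<>();
  // Stand-in for the scheduler's view; a rejoining node starts with no attributes.
  private final Map<String, Set<String>> schedulerView = new HashMap<>();
  // Hypothetical switch: true models the proposed fix.
  private final boolean removeOnDeactivate;

  StaleAttributeSketch(boolean removeOnDeactivate) {
    this.removeOnDeactivate = removeOnDeactivate;
  }

  /** Models the "update only if changed" check quoted above. */
  void heartbeat(String host, Set<String> reported) {
    // A (re)joining node gets a fresh, empty scheduler entry.
    schedulerView.putIfAbsent(host, new HashSet<>());
    Set<String> current =
        managerMapping.getOrDefault(host, Collections.emptySet());
    if (!reported.equals(current)) {
      managerMapping.put(host, new HashSet<>(reported));
      schedulerView.put(host, new HashSet<>(reported));
    }
    // Otherwise the update is skipped, even if the scheduler entry is new.
  }

  /** Models a node being decommissioned or lost. */
  void deactivate(String host) {
    schedulerView.remove(host);
    if (removeOnDeactivate) {
      // Proposed fix: drop the distributed attributes with the node.
      managerMapping.remove(host);
    }
  }

  Set<String> schedulerAttributes(String host) {
    return schedulerView.getOrDefault(host, Collections.emptySet());
  }

  public static void main(String[] args) {
    Set<String> attrs = Collections.singleton("os=linux");
    for (boolean fix : new boolean[] {false, true}) {
      StaleAttributeSketch rm = new StaleAttributeSketch(fix);
      rm.heartbeat("host1", attrs);  // initial registration
      rm.deactivate("host1");        // decommission
      rm.heartbeat("host1", attrs);  // recommission with the same attributes
      System.out.println("removeOnDeactivate=" + fix
          + " -> scheduler sees " + rm.schedulerAttributes("host1"));
    }
  }
}
```

In this model, without the removal the second heartbeat compares equal to the stale entry and the scheduler's new node entry stays empty; with the removal the comparison fails and the attributes are pushed again. The real fix would need to hook into whatever code path handles node deactivation in the RM, which this sketch does not attempt to reproduce.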





