Posted to yarn-issues@hadoop.apache.org by "Tarun Parimi (Jira)" <ji...@apache.org> on 2021/08/18 10:54:00 UTC
[jira] [Created] (YARN-10890) Node Attributes in Distributed
mapping misses update to scheduler when node gets
decommissioned/recommissioned
Tarun Parimi created YARN-10890:
-----------------------------------
Summary: Node Attributes in Distributed mapping misses update to scheduler when node gets decommissioned/recommissioned
Key: YARN-10890
URL: https://issues.apache.org/jira/browse/YARN-10890
Project: Hadoop YARN
Issue Type: Bug
Affects Versions: 3.2.1, 3.3.0
Reporter: Tarun Parimi
The NodeAttributesManagerImpl maintains the node-to-attribute mapping, but it doesn't remove the mapping when a node goes down. This makes sense for centralized mapping: the attribute mapping is managed centrally at the RM, so a node going down doesn't affect it.
In distributed mapping, the node attribute mapping is updated via NM heartbeats to the RM, so these node attributes are only valid as long as the node is heartbeating. But when a node is decommissioned or lost, the node attribute entry still remains in NodeAttributesManagerImpl.
After the performance improvement in YARN-8925, we only update distributed node attributes when necessary. However, when a previously decommissioned node is recommissioned, NodeAttributesManagerImpl still has the old mapping entry belonging to the old SchedulerNode instance that was decommissioned.
This results in ResourceTrackerService#updateNodeAttributesIfNecessary skipping the update, since it compares the incoming attributes against those belonging to the old decommissioned node instance.
{code:java}
if (!NodeLabelUtil
    .isNodeAttributesEquals(nodeAttributes, currentNodeAttributes)) {
  this.rmContext.getNodeAttributesManager()
      .replaceNodeAttributes(NodeAttribute.PREFIX_DISTRIBUTED,
          ImmutableMap.of(nodeId.getHost(), nodeAttributes));
} else if (LOG.isDebugEnabled()) {
  LOG.debug("Skip updating node attributes since there is no change for "
      + nodeId + " : " + nodeAttributes);
}
{code}
We should remove the distributed node attributes whenever a node gets deactivated to avoid this issue. The attributes will then get added properly in the scheduler whenever the node becomes active again and registers/heartbeats.
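The skip-on-equal behavior and the proposed cleanup can be illustrated with a toy model. This is only a sketch under stated assumptions: ToyAttributeTracker, its two maps, and the clearOnDeactivate flag are hypothetical stand-ins, not the actual NodeAttributesManagerImpl or ResourceTrackerService code, and the real patch may look quite different.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

/** Toy model only; none of these names are real YARN classes or methods. */
class ToyAttributeTracker {
    // Hypothetical stand-in for the host -> attributes mapping kept in the RM.
    private final Map<String, Set<String>> managerMapping = new HashMap<>();
    // Hypothetical stand-in for what the scheduler's current node instance knows.
    private final Map<String, Set<String>> schedulerView = new HashMap<>();
    // Assumption: true models the proposed fix, false models the reported bug.
    private final boolean clearOnDeactivate;

    ToyAttributeTracker(boolean clearOnDeactivate) {
        this.clearOnDeactivate = clearOnDeactivate;
    }

    /** Models a heartbeat: only push attributes when they differ from the mapping. */
    void heartbeat(String host, Set<String> attrs) {
        // A (re)registered node starts as a fresh instance with no attributes.
        schedulerView.putIfAbsent(host, Collections.<String>emptySet());
        Set<String> current =
            managerMapping.getOrDefault(host, Collections.<String>emptySet());
        if (!attrs.equals(current)) {
            managerMapping.put(host, attrs);
            schedulerView.put(host, attrs);
        }
        // Otherwise the update is skipped, as in the snippet quoted above.
    }

    /** Models decommission/loss: the scheduler drops its node instance. */
    void deactivate(String host) {
        schedulerView.remove(host);
        if (clearOnDeactivate) {
            // Proposed fix: drop the stale distributed mapping as well.
            managerMapping.remove(host);
        }
    }

    Set<String> schedulerAttributes(String host) {
        return schedulerView.getOrDefault(host, Collections.<String>emptySet());
    }

    public static void main(String[] args) {
        Set<String> attrs = Collections.singleton("os=linux");
        for (boolean fix : new boolean[] {false, true}) {
            ToyAttributeTracker t = new ToyAttributeTracker(fix);
            t.heartbeat("host1", attrs);   // initial registration
            t.deactivate("host1");         // decommission
            t.heartbeat("host1", attrs);   // recommission
            System.out.println("clearOnDeactivate=" + fix
                + " scheduler sees " + t.schedulerAttributes("host1"));
        }
    }
}
```

In this model, with the flag off, the recommissioned node's heartbeat matches the stale mapping entry and the scheduler is left with no attributes; with the flag on, the same heartbeat is treated as a change and the attributes are propagated.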
--
This message was sent by Atlassian Jira
(v8.3.4#803005)
---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: yarn-issues-help@hadoop.apache.org