Posted to issues@ozone.apache.org by "Ethan Rose (Jira)" <ji...@apache.org> on 2021/04/29 21:40:00 UTC
[jira] [Created] (HDDS-5170) Race condition in
NodeStateManager#addNode allows datanodes with lower MLV to be used in
pipelines
Ethan Rose created HDDS-5170:
--------------------------------
Summary: Race condition in NodeStateManager#addNode allows datanodes with lower MLV to be used in pipelines
Key: HDDS-5170
URL: https://issues.apache.org/jira/browse/HDDS-5170
Project: Apache Ozone
Issue Type: Sub-task
Reporter: Ethan Rose
Assignee: Ethan Rose
HDDS-4946 introduced a race condition in NodeStateManager#addNode that allows SCM's background pipeline creator, or any other thread, to read a datanode whose MLV is lower than SCM's as HEALTHY before it is moved to the HEALTHY_READONLY state.
{code:java}
public void addNode(DatanodeDetails datanodeDetails,
    LayoutVersionProto layoutInfo) throws NodeAlreadyExistsException {
  NodeStatus newNodeStatus = newNodeStatus(datanodeDetails);
  nodeStateMap.addNode(datanodeDetails, newNodeStatus, layoutInfo);
  UUID dnID = datanodeDetails.getUuid();
  try {
    updateLastKnownLayoutVersion(datanodeDetails, layoutInfo);
    DatanodeInfo dnInfo = nodeStateMap.getNodeInfo(dnID);
    NodeStatus status = nodeStateMap.getNodeStatus(dnID);
    // State machine starts nodes as HEALTHY. If there is a layout
    // mismatch, this node should be moved to HEALTHY_READONLY.
    updateNodeLayoutVersionState(dnInfo, layoutMisMatchCondition, status,
        NodeLifeCycleEvent.LAYOUT_MISMATCH);
  } catch (NodeNotFoundException ex) {
    LOG.error("Inconsistent NodeStateMap! Datanode with ID {} was " +
        "added but not found in map: {}", dnID, nodeStateMap);
  }
  eventPublisher.fireEvent(SCMEvents.NEW_NODE, datanodeDetails);
}
{code}
The node is added to the node state map, where other threads can see it, before its layout version information is updated, so a concurrent reader can observe it in the initial HEALTHY state.
This manifests as an intermittent test failure in TestSCMNodeManager#testSCMLayoutOnRegister, which fails due to this condition after about 15-30 consecutive runs.
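One possible way to close this window is to decide the node's initial state before the node becomes visible to other threads, so that it is never published as HEALTHY when its layout version is behind. The following is only a minimal, self-contained sketch of that idea and not the actual Ozone code or the fix adopted for this issue: LayoutAwareRegistry, NodeState, initialState and getState are hypothetical names, layout versions are modeled as plain ints, and the mismatch condition is assumed to be "datanode MLV lower than SCM MLV".

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for a node state map; not an Ozone class.
public class LayoutAwareRegistry {
  enum NodeState { HEALTHY, HEALTHY_READONLY }

  private final int scmLayoutVersion;
  private final Map<UUID, NodeState> nodes = new ConcurrentHashMap<>();

  public LayoutAwareRegistry(int scmLayoutVersion) {
    this.scmLayoutVersion = scmLayoutVersion;
  }

  // Assumption: a datanode whose layout version is lower than SCM's
  // must start as HEALTHY_READONLY rather than HEALTHY.
  NodeState initialState(int datanodeLayoutVersion) {
    return datanodeLayoutVersion < scmLayoutVersion
        ? NodeState.HEALTHY_READONLY
        : NodeState.HEALTHY;
  }

  public void addNode(UUID id, int datanodeLayoutVersion) {
    // Compute the state first, then publish the node in a single atomic
    // step, so readers never see a lagging node as HEALTHY.
    NodeState state = initialState(datanodeLayoutVersion);
    if (nodes.putIfAbsent(id, state) != null) {
      throw new IllegalStateException("Node already exists: " + id);
    }
  }

  public NodeState getState(UUID id) {
    return nodes.get(id);
  }

  public static void main(String[] args) {
    LayoutAwareRegistry registry = new LayoutAwareRegistry(3);
    UUID laggingNode = UUID.randomUUID();
    UUID currentNode = UUID.randomUUID();
    registry.addNode(laggingNode, 2);
    registry.addNode(currentNode, 3);
    System.out.println(registry.getState(laggingNode)); // HEALTHY_READONLY
    System.out.println(registry.getState(currentNode)); // HEALTHY
  }
}
```

In this sketch the state is fixed before the single putIfAbsent call, so there is no interval in which a reader such as a pipeline creator could pick up the lagging node as HEALTHY. An alternative under the same assumptions would be to hold a lock across both the insert and the state transition.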