Posted to issues@ozone.apache.org by "Ethan Rose (Jira)" <ji...@apache.org> on 2021/04/29 21:40:00 UTC

[jira] [Created] (HDDS-5170) Race condition in NodeStateManager#addNode allows datanodes with lower MLV to be used in pipelines

Ethan Rose created HDDS-5170:
--------------------------------

             Summary: Race condition in NodeStateManager#addNode allows datanodes with lower MLV to be used in pipelines
                 Key: HDDS-5170
                 URL: https://issues.apache.org/jira/browse/HDDS-5170
             Project: Apache Ozone
          Issue Type: Sub-task
            Reporter: Ethan Rose
            Assignee: Ethan Rose


HDDS-4946 introduced a race condition in NodeStateManager#addNode: SCM's background pipeline creator, or any other thread, can read a node whose MLV (metadata layout version) is lower than SCM's as HEALTHY before the node is moved to the HEALTHY_READONLY state.

{code:java}

  public void addNode(DatanodeDetails datanodeDetails,
      LayoutVersionProto layoutInfo) throws NodeAlreadyExistsException {
    NodeStatus newNodeStatus = newNodeStatus(datanodeDetails);
    nodeStateMap.addNode(datanodeDetails, newNodeStatus, layoutInfo);
    UUID dnID = datanodeDetails.getUuid();
    try {
      updateLastKnownLayoutVersion(datanodeDetails, layoutInfo);
      DatanodeInfo dnInfo = nodeStateMap.getNodeInfo(dnID);
      NodeStatus status = nodeStateMap.getNodeStatus(dnID);

      // State machine starts nodes as HEALTHY. If there is a layout
      // mismatch, this node should be moved to HEALTHY_READONLY.
      updateNodeLayoutVersionState(dnInfo, layoutMisMatchCondition, status,
          NodeLifeCycleEvent.LAYOUT_MISMATCH);
    } catch (NodeNotFoundException ex) {
      LOG.error("Inconsistent NodeStateMap! Datanode with ID {} was " +
          "added but not found in  map: {}", dnID, nodeStateMap);
    }
    eventPublisher.fireEvent(SCMEvents.NEW_NODE, datanodeDetails);
  }

{code}

The node is added to the node state map, where other threads can see it, before its layout version information is updated, so for a short window it is visible in its initial HEALTHY state.
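
One possible way to close this window is to decide the node's initial state before it is published to the shared map. The following is only a minimal sketch of that idea, not the fix applied in Ozone: SketchNodeRegistry, NodeEntry, and healthyNodes are hypothetical names, layout versions are reduced to plain integers, and "lower MLV than SCM" is assumed to be the only mismatch condition.

```java
import java.util.List;
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;
import java.util.stream.Collectors;

// Hypothetical, simplified stand-in for SCM's node tracking.
// None of these types are Ozone's real classes.
class SketchNodeRegistry {

  enum NodeState { HEALTHY, HEALTHY_READONLY }

  // Immutable snapshot of what other threads may observe about a node.
  static final class NodeEntry {
    final int layoutVersion;
    final NodeState state;

    NodeEntry(int layoutVersion, NodeState state) {
      this.layoutVersion = layoutVersion;
      this.state = state;
    }
  }

  private final int scmLayoutVersion;
  private final Map<UUID, NodeEntry> nodes = new ConcurrentHashMap<>();

  SketchNodeRegistry(int scmLayoutVersion) {
    this.scmLayoutVersion = scmLayoutVersion;
  }

  // The state is computed first and the entry is published in a single
  // putIfAbsent call, so readers never see a lower-MLV node as HEALTHY.
  // Returns false if the node was already registered.
  boolean addNode(UUID id, int datanodeLayoutVersion) {
    NodeState initial = datanodeLayoutVersion < scmLayoutVersion
        ? NodeState.HEALTHY_READONLY
        : NodeState.HEALTHY;
    NodeEntry entry = new NodeEntry(datanodeLayoutVersion, initial);
    return nodes.putIfAbsent(id, entry) == null;
  }

  NodeState getState(UUID id) {
    NodeEntry entry = nodes.get(id);
    return entry == null ? null : entry.state;
  }

  // What a pipeline creator in this sketch would consult.
  List<UUID> healthyNodes() {
    return nodes.entrySet().stream()
        .filter(e -> e.getValue().state == NodeState.HEALTHY)
        .map(Map.Entry::getKey)
        .collect(Collectors.toList());
  }
}
```

Because each entry is immutable and becomes visible only through one atomic map insertion, a concurrent reader sees either no entry at all or an entry that is already in its correct state. Whether the real NodeStateMap can be restructured this way, or whether holding a lock across both steps is more practical, is a separate design question.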

This manifests as an intermittent test failure in TestSCMNodeManager#testSCMLayoutOnRegister, which fails due to this condition after about 15-30 consecutive runs.


