Posted to issues@ozone.apache.org by "Aravindan Vijayan (Jira)" <ji...@apache.org> on 2021/05/03 17:45:00 UTC

[jira] [Resolved] (HDDS-5170) Race condition in NodeStateManager#addNode allows datanodes with lower MLV to be used in pipelines

     [ https://issues.apache.org/jira/browse/HDDS-5170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aravindan Vijayan resolved HDDS-5170.
-------------------------------------
    Resolution: Fixed

PR merged.

> Race condition in NodeStateManager#addNode allows datanodes with lower MLV to be used in pipelines
> --------------------------------------------------------------------------------------------------
>
>                 Key: HDDS-5170
>                 URL: https://issues.apache.org/jira/browse/HDDS-5170
>             Project: Apache Ozone
>          Issue Type: Sub-task
>            Reporter: Ethan Rose
>            Assignee: Ethan Rose
>            Priority: Major
>              Labels: pull-request-available
>
> HDDS-4946 introduced a race condition in NodeStateManager#addNode. SCM's background pipeline creator, or any other thread, can read a node whose metadata layout version (MLV) is lower than SCM's as HEALTHY before the node is moved to the HEALTHY_READONLY state.
> {code:java}
>   public void addNode(DatanodeDetails datanodeDetails,
>       LayoutVersionProto layoutInfo) throws NodeAlreadyExistsException {
>     NodeStatus newNodeStatus = newNodeStatus(datanodeDetails);
>     nodeStateMap.addNode(datanodeDetails, newNodeStatus, layoutInfo);
>     UUID dnID = datanodeDetails.getUuid();
>     try {
>       updateLastKnownLayoutVersion(datanodeDetails, layoutInfo);
>       DatanodeInfo dnInfo = nodeStateMap.getNodeInfo(dnID);
>       NodeStatus status = nodeStateMap.getNodeStatus(dnID);
>       // State machine starts nodes as HEALTHY. If there is a layout
>       // mismatch, this node should be moved to HEALTHY_READONLY.
>       updateNodeLayoutVersionState(dnInfo, layoutMisMatchCondition, status,
>           NodeLifeCycleEvent.LAYOUT_MISMATCH);
>     } catch (NodeNotFoundException ex) {
>       LOG.error("Inconsistent NodeStateMap! Datanode with ID {} was " +
>           "added but not found in  map: {}", dnID, nodeStateMap);
>     }
>     eventPublisher.fireEvent(SCMEvents.NEW_NODE, datanodeDetails);
>   }
> {code}
> The node is added to the node state map (where other threads can view it) before its layout version information is updated.
> This manifests as an intermittent failure in TestSCMNodeManager#testSCMLayoutOnRegister, which hits this condition roughly once in every 15-30 consecutive runs.
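
The window described above can be reproduced deterministically with a small model. The sketch below is not Ozone code and is not necessarily how the merged PR fixes the issue: LayoutRaceSketch, NodeMap, addNodeRacy, addNodeSafe and seenDuringAdd are hypothetical names, layout versions are plain ints, and a Runnable hook stands in for a concurrent reader such as the pipeline creator. It only illustrates that deciding the initial state before the node becomes visible leaves no moment at which a reader can see a lower-MLV node as HEALTHY.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicReference;

public class LayoutRaceSketch {
  enum NodeState { HEALTHY, HEALTHY_READONLY }

  /** Hypothetical, heavily simplified stand-in for SCM's node state map. */
  static final class NodeMap {
    private final Map<UUID, NodeState> states = new ConcurrentHashMap<>();
    private final int scmLayoutVersion;

    NodeMap(int scmLayoutVersion) {
      this.scmLayoutVersion = scmLayoutVersion;
    }

    /** Racy ordering: publish the node as HEALTHY first, correct it afterwards. */
    void addNodeRacy(UUID id, int dnLayoutVersion, Runnable concurrentReader) {
      states.put(id, NodeState.HEALTHY);
      concurrentReader.run(); // another thread could run exactly here
      if (dnLayoutVersion < scmLayoutVersion) {
        states.put(id, NodeState.HEALTHY_READONLY);
      }
    }

    /** Safer ordering: compute the initial state, then publish the node once. */
    void addNodeSafe(UUID id, int dnLayoutVersion, Runnable concurrentReader) {
      NodeState initial = dnLayoutVersion < scmLayoutVersion
          ? NodeState.HEALTHY_READONLY
          : NodeState.HEALTHY;
      concurrentReader.run(); // the node is not visible yet at this point
      states.put(id, initial);
    }

    NodeState get(UUID id) {
      return states.get(id);
    }
  }

  /**
   * Registers a datanode at layout version 1 with an SCM at layout version 2
   * and returns the state a reader observes in the middle of registration
   * (null if the node is not visible yet).
   */
  static NodeState seenDuringAdd(boolean racy) {
    NodeMap map = new NodeMap(2);
    UUID id = UUID.randomUUID();
    AtomicReference<NodeState> seen = new AtomicReference<>();
    Runnable reader = () -> seen.set(map.get(id));
    if (racy) {
      map.addNodeRacy(id, 1, reader);
    } else {
      map.addNodeSafe(id, 1, reader);
    }
    return seen.get();
  }

  public static void main(String[] args) {
    System.out.println("racy ordering, reader saw: " + seenDuringAdd(true));
    System.out.println("safe ordering, reader saw: " + seenDuringAdd(false));
  }
}
```

In the racy ordering the reader is handed a HEALTHY node that should never have been eligible for a pipeline. In the safe ordering the reader either does not see the node at all or sees it already in HEALTHY_READONLY.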


