Posted to issues@ozone.apache.org by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2021/04/30 19:30:00 UTC
[jira] [Updated] (HDDS-5170) Race condition in NodeStateManager#addNode allows datanodes with lower MLV to be used in pipelines
[ https://issues.apache.org/jira/browse/HDDS-5170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
ASF GitHub Bot updated HDDS-5170:
---------------------------------
Labels: pull-request-available (was: )
> Race condition in NodeStateManager#addNode allows datanodes with lower MLV to be used in pipelines
> --------------------------------------------------------------------------------------------------
>
> Key: HDDS-5170
> URL: https://issues.apache.org/jira/browse/HDDS-5170
> Project: Apache Ozone
> Issue Type: Sub-task
> Reporter: Ethan Rose
> Assignee: Ethan Rose
> Priority: Major
> Labels: pull-request-available
>
> HDDS-4946 introduced a race condition in NodeStateManager#addNode. It allows SCM's background pipeline creator, or any other thread, to read a node whose metadata layout version (MLV) is lower than SCM's as HEALTHY before the node is moved to the HEALTHY_READONLY state.
> {code:java}
> public void addNode(DatanodeDetails datanodeDetails,
>     LayoutVersionProto layoutInfo) throws NodeAlreadyExistsException {
>   NodeStatus newNodeStatus = newNodeStatus(datanodeDetails);
>   nodeStateMap.addNode(datanodeDetails, newNodeStatus, layoutInfo);
>   UUID dnID = datanodeDetails.getUuid();
>   try {
>     updateLastKnownLayoutVersion(datanodeDetails, layoutInfo);
>     DatanodeInfo dnInfo = nodeStateMap.getNodeInfo(dnID);
>     NodeStatus status = nodeStateMap.getNodeStatus(dnID);
>     // State machine starts nodes as HEALTHY. If there is a layout
>     // mismatch, this node should be moved to HEALTHY_READONLY.
>     updateNodeLayoutVersionState(dnInfo, layoutMisMatchCondition, status,
>         NodeLifeCycleEvent.LAYOUT_MISMATCH);
>   } catch (NodeNotFoundException ex) {
>     LOG.error("Inconsistent NodeStateMap! Datanode with ID {} was " +
>         "added but not found in map: {}", dnID, nodeStateMap);
>   }
>   eventPublisher.fireEvent(SCMEvents.NEW_NODE, datanodeDetails);
> }
> {code}
> The node is added to the node state map (where other threads can view it) before its layout version information is updated.
> This manifests as an intermittent test failure in TestSCMNodeManager#testSCMLayoutOnRegister, which fails due to this condition after about 15-30 consecutive runs.
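The ordering problem described above can be illustrated with a minimal, self-contained Java sketch. This is not the code from the pull request or from Ozone itself: NodeRegistrySketch, NodeEntry, Health, and the plain integer layout versions are hypothetical stand-ins, and the mismatch rule (datanode version lower than SCM's) is an assumption taken from the issue title. The sketch shows one possible way to avoid the race: decide the node's initial state before the entry becomes visible in the shared map, so that no reader can observe a lower-MLV node as HEALTHY.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for a node state map; not an Ozone class.
class NodeRegistrySketch {

  // Simplified health states, named after the states mentioned in the issue.
  enum Health { HEALTHY, HEALTHY_READONLY }

  // Immutable entry: once published, its fields never change.
  static final class NodeEntry {
    final int layoutVersion;
    final Health health;

    NodeEntry(int layoutVersion, Health health) {
      this.layoutVersion = layoutVersion;
      this.health = health;
    }
  }

  private final int scmLayoutVersion;
  private final Map<UUID, NodeEntry> nodes = new ConcurrentHashMap<>();

  NodeRegistrySketch(int scmLayoutVersion) {
    this.scmLayoutVersion = scmLayoutVersion;
  }

  // Computes the initial state first, then publishes the fully built entry
  // in a single atomic map operation. Returns false if the node already exists.
  boolean addNode(UUID id, int datanodeLayoutVersion) {
    // Assumed mismatch rule: a datanode behind SCM starts as read-only.
    Health initial = datanodeLayoutVersion < scmLayoutVersion
        ? Health.HEALTHY_READONLY
        : Health.HEALTHY;
    return nodes.putIfAbsent(id, new NodeEntry(datanodeLayoutVersion, initial)) == null;
  }

  // What a concurrent reader (for example a pipeline creator) would see.
  // Returns null if the node has not been added.
  Health healthOf(UUID id) {
    NodeEntry entry = nodes.get(id);
    return entry == null ? null : entry.health;
  }

  public static void main(String[] args) {
    NodeRegistrySketch registry = new NodeRegistrySketch(2);
    UUID behind = UUID.randomUUID();
    UUID current = UUID.randomUUID();
    registry.addNode(behind, 1);
    registry.addNode(current, 2);
    System.out.println("behind:  " + registry.healthOf(behind));
    System.out.println("current: " + registry.healthOf(current));
  }
}
```

Because the entry is immutable and inserted with a single putIfAbsent call, there is no window between "node visible" and "state corrected". The quoted addNode has such a window between nodeStateMap.addNode and updateNodeLayoutVersionState.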