Posted to issues@ozone.apache.org by "Mukul Kumar Singh (Jira)" <ji...@apache.org> on 2021/06/09 08:26:00 UTC

[jira] [Updated] (HDDS-4766) Recon resets the Operational State of datanodes to IN_SERVICE

     [ https://issues.apache.org/jira/browse/HDDS-4766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mukul Kumar Singh updated HDDS-4766:
------------------------------------
    Reporter: Nilotpal Nandi  (was: Stephen O'Donnell)

> Recon resets the Operational State of datanodes to IN_SERVICE
> -------------------------------------------------------------
>
>                 Key: HDDS-4766
>                 URL: https://issues.apache.org/jira/browse/HDDS-4766
>             Project: Apache Ozone
>          Issue Type: Bug
>          Components: Ozone Recon
>    Affects Versions: 1.1.0
>            Reporter: Nilotpal Nandi
>            Assignee: Stephen O'Donnell
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.1.0
>
>
> When a datanode is decommissioned or put into maintenance, its new state is persisted into the datanode.yaml file. When running on a cluster with Recon enabled, we can see conflicting commands being received repeatedly on the datanode, e.g.:
> {code}
> datanode_3  | 2021-01-29 16:26:20,009 [EndpointStateMachine task thread for scm/172.24.0.6:9861 - 0 ] INFO endpoint.HeartbeatEndpointTask: Received SCM set operational state command. State: DECOMMISSIONED Expiry: 0 id 3645344
> datanode_3  | 2021-01-29 16:26:50,012 [EndpointStateMachine task thread for recon/172.24.0.3:9891 - 0 ] INFO commands.SetNodeOperationalStateCommand: Create a new command to set op state IN_SERVICE 0 id is 3675347
> {code}
> This is happening because Recon delegates the processing of DN heartbeats received by ReconNodeManager to an instance of SCMNodeManager running inside Recon. SCMNodeManager checks whether the reported state of the datanode matches the SCM in-memory state, and if they don't match, it issues a command to the DN to update its state.
> In this case, Recon's in-memory copy of the node state is still IN_SERVICE, so Recon always tries to set the DN state back to IN_SERVICE.
> The fix here is probably to update the Recon in-memory state before delegating the heartbeat to SCMNodeManager.
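
The following is a minimal, self-contained Java sketch of the mechanism described above and of the proposed fix. It is a simplified model, not Ozone code. Every name in it is hypothetical: `ReconOpStateSketch`, `SimpleNodeManager`, `SyncingReconNodeManager`, `simulate`, the datanode ID `dn3`, and the reduced `OpState` enum. The actual `SCMNodeManager`/`ReconNodeManager` APIs are not shown. The sketch assumes that SCM, not Recon, is the authority for a node's operational state, so Recon can adopt whatever state the datanode reports.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ReconOpStateSketch {

    // Hypothetical subset of datanode operational states (not the real Ozone enum).
    enum OpState { IN_SERVICE, DECOMMISSIONING, DECOMMISSIONED, IN_MAINTENANCE }

    // Simplified stand-in for an SCM-style node manager (hypothetical, not SCMNodeManager).
    static class SimpleNodeManager {
        // In-memory operational state, keyed by datanode ID.
        protected final Map<String, OpState> opStates = new HashMap<>();
        // "Set operational state" commands that would be sent back to datanodes.
        final List<String> commands = new ArrayList<>();

        void register(String dnId) {
            // In this model, newly registered nodes start IN_SERVICE.
            opStates.put(dnId, OpState.IN_SERVICE);
        }

        void processHeartbeat(String dnId, OpState reported) {
            OpState inMemory = opStates.get(dnId);
            // On a mismatch, tell the datanode to adopt the in-memory state.
            if (inMemory != null && inMemory != reported) {
                commands.add("set " + dnId + " to " + inMemory);
            }
        }
    }

    // Recon-side manager applying the proposed fix: sync memory from the heartbeat first.
    static class SyncingReconNodeManager extends SimpleNodeManager {
        @Override
        void processHeartbeat(String dnId, OpState reported) {
            // Assumption: Recon does not own the op state, so it trusts the DN's report.
            // Map.replace only updates nodes that are already registered.
            opStates.replace(dnId, reported);
            super.processHeartbeat(dnId, reported);
        }
    }

    // Registers one DN, sends three heartbeats reporting DECOMMISSIONED,
    // and returns the commands the manager would issue in response.
    static List<String> simulate(SimpleNodeManager recon) {
        recon.register("dn3");
        for (int i = 0; i < 3; i++) {
            recon.processHeartbeat("dn3", OpState.DECOMMISSIONED);
        }
        return recon.commands;
    }

    public static void main(String[] args) {
        // Without the fix, every heartbeat triggers a "back to IN_SERVICE" command.
        System.out.println("without fix: " + simulate(new SimpleNodeManager()));
        // With the fix, the states agree and no command is issued.
        System.out.println("with fix:    " + simulate(new SyncingReconNodeManager()));
    }
}
```

In this model, delegating to the plain manager yields one `set dn3 to IN_SERVICE` command per heartbeat. This matches the repeated conflicting commands in the log above. The syncing variant yields none. Overriding the heartbeat handler keeps the shared comparison logic in one place and changes only which side is treated as the source of truth.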



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
