Posted to issues@ozone.apache.org by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2020/02/03 17:00:00 UTC
[jira] [Updated] (HDDS-2592) Add Datanode command to allow the datanode to persist its admin state

     [ https://issues.apache.org/jira/browse/HDDS-2592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HDDS-2592:
---------------------------------
    Labels: pull-request-available  (was: )

> Add Datanode command to allow the datanode to persist its admin state 
> ----------------------------------------------------------------------
>
>                 Key: HDDS-2592
>                 URL: https://issues.apache.org/jira/browse/HDDS-2592
>             Project: Hadoop Distributed Data Store
>          Issue Type: Sub-task
>          Components: Ozone Datanode, SCM
>    Affects Versions: 0.5.0
>            Reporter: Stephen O'Donnell
>            Assignee: Stephen O'Donnell
>            Priority: Major
>              Labels: pull-request-available
>
> When the operational state of a datanode changes, an async command should be triggered to persist the new state on the datanode. For maintenance mode, the datanode should also store the maintenance end time. The datanode will then report the new state (and the optional maintenance end time) back to SCM via its heartbeat.
> The purpose of the DN persisting this information and heartbeating it back to SCM is to allow the operational state to be recovered after an SCM reboot, as SCM does not persist any of this information. It also allows "Recon" to learn the datanode states.
> If SCM is restarted, it will forget all knowledge of the datanodes. When they register again, each datanode will report its operational state and SCM can set it correctly.
> Outside of registration (i.e. during normal heartbeats), the SCM state is the source of truth for the operational state. If a DN heartbeat reports a state that differs from the SCM state, SCM should issue another command to the datanode to set its state to the SCM value. The mismatch may simply be due to a not-yet-processed command triggered by the SCM state change, but the worst case is an extra command sent to the datanode. This is a very lightweight command, so that is not an issue.
> One open question is whether to persist intermediate states on the DN. For example, for decommissioning, the DN would first persist "Decommissioning" and then transition to "Decommissioned" once SCM is satisfied that all containers are replicated. It would be quite easy to persist each of these states in turn on the datanode. Alternatively, we set the end state (Decommissioned) on the datanode and allow SCM to get the node to that state. With the latter approach, if SCM is restarted, the DN will report "Decommissioned" on registration, but SCM will set its internal state to Decommissioning and then ensure all containers are replicated before transitioning the node to Decommissioned. This seems like a safer approach, but there are advantages to tracking the intermediate states on the DNs too.
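
The description has the datanode persist its operational state and the optional maintenance end time, then report both at registration and on each heartbeat. The sketch below shows one way the DN side could do that, in Python for brevity. It assumes a small JSON file written via temp-file-plus-rename so that a crash never leaves a half-written state. The class name, file layout and the default of IN_SERVICE when no file exists are hypothetical and are not Ozone's actual on-disk format.

```python
import json
import os
import tempfile
from typing import Optional, Tuple


class DatanodeAdminStateStore:
    """Hypothetical DN-side store for the operational state and maintenance end time."""

    def __init__(self, path: str) -> None:
        self._path = path

    def persist(self, state: str, maintenance_end: Optional[int] = None) -> None:
        # Called when the DN processes the (hypothetical) set-state command from SCM.
        record = {"operationalState": state, "maintenanceEndTime": maintenance_end}
        directory = os.path.dirname(os.path.abspath(self._path))
        # Write to a temp file in the same directory, then atomically rename it
        # over the old file, so readers see either the old or the new state.
        fd, tmp = tempfile.mkstemp(dir=directory, prefix=".opstate-")
        try:
            with os.fdopen(fd, "w") as f:
                json.dump(record, f)
                f.flush()
                os.fsync(f.fileno())
            os.replace(tmp, self._path)
        except BaseException:
            os.unlink(tmp)
            raise

    def load(self) -> Tuple[str, Optional[int]]:
        # Reported to SCM at registration and on every heartbeat. Assumption: a DN
        # with no stored state has never had an admin state set, so it is IN_SERVICE.
        try:
            with open(self._path) as f:
                record = json.load(f)
        except FileNotFoundError:
            return "IN_SERVICE", None
        return record["operationalState"], record.get("maintenanceEndTime")
```

Because the state survives DN restarts and is always reported back, a restarted SCM can rebuild its view purely from what the datanodes send when they register.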

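The SCM-side rules in the description are:

- SCM adopts the DN-reported state on registration.
- SCM treats its own in-memory state as the source of truth on later heartbeats.
- SCM resends the cheap set-state command on any mismatch.

The minimal Python sketch below combines these rules with the "persist only the end state" option from the open question. In that option a DN that registers as Decommissioned is resumed as Decommissioning. Every name here is hypothetical: `OpState`, `SetOpStateCommand`, `ScmOpStateTracker` and the two mapping tables are illustrations, not Ozone classes or protocol messages.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Optional


class OpState(Enum):
    # Hypothetical subset of the operational states discussed in the issue.
    IN_SERVICE = "IN_SERVICE"
    DECOMMISSIONING = "DECOMMISSIONING"
    DECOMMISSIONED = "DECOMMISSIONED"
    IN_MAINTENANCE = "IN_MAINTENANCE"


# "Persist only the end state": SCM sends the end state to the DN while it is
# still in the intermediate state, and maps it back when the DN re-registers.
TO_PERSISTED = {OpState.DECOMMISSIONING: OpState.DECOMMISSIONED}
FROM_PERSISTED = {OpState.DECOMMISSIONED: OpState.DECOMMISSIONING}


@dataclass(frozen=True)
class SetOpStateCommand:
    """Hypothetical lightweight async command sent from SCM to a datanode."""
    state: OpState
    maintenance_end: Optional[int] = None


class ScmOpStateTracker:
    """SCM keeps operational state in memory only; nothing is persisted."""

    def __init__(self) -> None:
        self._state: Dict[str, OpState] = {}
        self._end: Dict[str, Optional[int]] = {}

    def _command_for(self, dn: str) -> SetOpStateCommand:
        state = self._state[dn]
        return SetOpStateCommand(TO_PERSISTED.get(state, state), self._end[dn])

    def set_state(self, dn: str, state: OpState,
                  maintenance_end: Optional[int] = None) -> SetOpStateCommand:
        # An admin state change triggers an async command to the datanode.
        self._state[dn] = state
        self._end[dn] = maintenance_end
        return self._command_for(dn)

    def on_register(self, dn: str, reported: OpState,
                    maintenance_end: Optional[int] = None) -> None:
        # After an SCM restart the DN's persisted state is the only record.
        # DECOMMISSIONED resumes as DECOMMISSIONING so replication is re-checked.
        self._state[dn] = FROM_PERSISTED.get(reported, reported)
        self._end[dn] = maintenance_end

    def on_heartbeat(self, dn: str, reported: OpState,
                     maintenance_end: Optional[int] = None
                     ) -> Optional[SetOpStateCommand]:
        # Outside registration SCM is the source of truth. On a mismatch, resend
        # the command; a duplicate caused by an unprocessed command is harmless.
        expected = self._command_for(dn)
        if (reported, maintenance_end) == (expected.state, expected.maintenance_end):
            return None
        return expected
```

For example, after `set_state("dn1", OpState.DECOMMISSIONING)` a heartbeat still reporting `IN_SERVICE` gets the command again. A fresh tracker that sees `dn1` register as `DECOMMISSIONED` holds it as `DECOMMISSIONING` and does not send a redundant command. If intermediate states were persisted on the DN instead, the two mapping tables would simply be dropped.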


--
This message was sent by Atlassian Jira
(v8.3.4#803005)
