Posted to issues@ozone.apache.org by "Stephen O'Donnell (Jira)" <ji...@apache.org> on 2019/12/10 12:35:00 UTC

[jira] [Commented] (HDDS-2607) DeadNodeHandler should not remove replica for a dead maintenance node

    [ https://issues.apache.org/jira/browse/HDDS-2607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16992504#comment-16992504 ] 

Stephen O'Donnell commented on HDDS-2607:
-----------------------------------------

The staleNodeHandler logic will do the following for a stale node:

1. Close the containers
2. Close the pipelines

The dead node handler will do the same as the above, plus:

3. Remove the node's container replicas from the container manager.

As steps 1 and 2 will already have been completed for a node that is decommissioning or entering maintenance, it does no harm to repeat them for any node admin state. Step 3 is the only one we want to skip, and only when the node is IN_MAINTENANCE, so that check can easily be performed in the deadNodeHandler.
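The three steps and the single IN_MAINTENANCE check can be sketched as a small Python model. This is not Ozone's code: ScmState, handle_stale, handle_dead and the OpState values are hypothetical stand-ins, and the SCM's state is reduced to plain sets and dicts.

```python
from dataclasses import dataclass, field
from enum import Enum, auto


class OpState(Enum):
    """Hypothetical operational (admin) states of a datanode."""
    IN_SERVICE = auto()
    DECOMMISSIONING = auto()
    ENTERING_MAINTENANCE = auto()
    IN_MAINTENANCE = auto()


@dataclass
class ScmState:
    """Hypothetical, simplified stand-in for the SCM's container/pipeline state."""
    open_containers: set = field(default_factory=set)
    open_pipelines: set = field(default_factory=set)
    # container id -> ids of the nodes holding a replica of it
    replicas: dict = field(default_factory=dict)
    # node id -> ids of the containers / pipelines on that node
    node_containers: dict = field(default_factory=dict)
    node_pipelines: dict = field(default_factory=dict)


def handle_stale(scm: ScmState, node: str) -> None:
    # Step 1: close the node's containers.
    scm.open_containers -= scm.node_containers.get(node, set())
    # Step 2: close the node's pipelines.
    # Set difference is idempotent, so repeating these steps for a node that
    # was already decommissioning or entering maintenance changes nothing.
    scm.open_pipelines -= scm.node_pipelines.get(node, set())


def handle_dead(scm: ScmState, node: str, op_state: OpState) -> None:
    # Steps 1 and 2 are shared with the stale handler.
    handle_stale(scm, node)
    # Step 3: remove the node's replicas, unless the node is IN_MAINTENANCE,
    # in which case they are kept so its containers are not under-replicated.
    if op_state is OpState.IN_MAINTENANCE:
        return
    for container in scm.node_containers.get(node, set()):
        scm.replicas.get(container, set()).discard(node)
```

In this model, calling handle_dead for an IN_MAINTENANCE node closes its containers and pipelines but leaves it in replicas, while the same call for an IN_SERVICE node also removes it from replicas.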

This means the nodeStateManager can trigger the HEALTH events based only on the node's health state, without being concerned with its operational state.

Similarly, the NonHealthyToHealthyNodeHandler triggers pipeline creation, but the creation logic will avoid nodes which are not IN_SERVICE, so it does no harm to allow a NonHealthyToHealthyNode event to be triggered for nodes in any maintenance state.
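Why the event is harmless for maintenance nodes can be shown with a minimal Python model, assuming pipeline creation only picks nodes that are both healthy and IN_SERVICE. The names Health, OpState, pipeline_candidates and handle_non_healthy_to_healthy are hypothetical and do not correspond to Ozone classes.

```python
from enum import Enum, auto


class Health(Enum):
    HEALTHY = auto()
    STALE = auto()
    DEAD = auto()


class OpState(Enum):
    IN_SERVICE = auto()
    DECOMMISSIONING = auto()
    ENTERING_MAINTENANCE = auto()
    IN_MAINTENANCE = auto()


def pipeline_candidates(nodes: dict) -> list:
    """Return the nodes a new pipeline may use: healthy and IN_SERVICE only."""
    return sorted(
        node
        for node, (health, op_state) in nodes.items()
        if health is Health.HEALTHY and op_state is OpState.IN_SERVICE
    )


def handle_non_healthy_to_healthy(nodes: dict, node: str) -> list:
    # Record the new health state; the operational state is left untouched.
    _, op_state = nodes[node]
    nodes[node] = (Health.HEALTHY, op_state)
    # Trigger pipeline creation. Nodes in a maintenance state are filtered
    # out by pipeline_candidates, so firing this event for them does no harm.
    return pipeline_candidates(nodes)
```

The design choice is that the health handler stays unaware of admin state; the filtering lives in one place, the pipeline node selection.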

> DeadNodeHandler should not remove replica for a dead maintenance node
> ---------------------------------------------------------------------
>
>                 Key: HDDS-2607
>                 URL: https://issues.apache.org/jira/browse/HDDS-2607
>             Project: Hadoop Distributed Data Store
>          Issue Type: Sub-task
>          Components: SCM
>    Affects Versions: 0.5.0
>            Reporter: Stephen O'Donnell
>            Assignee: Stephen O'Donnell
>            Priority: Major
>
> Normally, when a node goes dead, the DeadNodeHandler removes all the containers and replicas associated with the node from the ContainerManager.
> If a node is IN_MAINTENANCE and goes dead, then we do not want to remove its replicas. They should remain present in the system to prevent the containers from being marked as under-replicated.
> We also need to consider the case where the node is dead and maintenance then expires automatically. In that case, the replicas associated with the node must be removed, and the affected containers will become under-replicated.
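The maintenance-expiry case can be sketched as follows, as a minimal Python model rather than Ozone's implementation. handle_maintenance_end is a hypothetical name, replicas maps a container id to the nodes holding it, and a replication factor of 3 is an assumed default.

```python
def handle_maintenance_end(replicas: dict, node_containers: dict, node: str,
                           node_is_dead: bool, replication_factor: int = 3) -> list:
    """Return the containers that become under-replicated when maintenance ends.

    replicas: container id -> set of node ids holding a replica (hypothetical model)
    node_containers: node id -> set of container ids on that node
    replication_factor: assumed target replica count per container
    """
    if not node_is_dead:
        # A live node simply returns to service and its replicas stay valid.
        return []
    under_replicated = []
    for container in sorted(node_containers.get(node, set())):
        holders = replicas.setdefault(container, set())
        # The dead node's replica was only kept while it was in maintenance.
        holders.discard(node)
        if len(holders) < replication_factor:
            under_replicated.append(container)
    return under_replicated
```

For a node holding the only third replica of a container, the container is reported as under-replicated once maintenance expires while the node is dead; a container that still has enough other replicas is not.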



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
