Posted to issues@ozone.apache.org by "Siyao Meng (Jira)" <ji...@apache.org> on 2021/02/18 21:47:00 UTC

[jira] [Updated] (HDDS-4843) SCM can incorrectly mark Datanode as DECOMMISSIONING when Datanode is not fully initialized

     [ https://issues.apache.org/jira/browse/HDDS-4843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Siyao Meng updated HDDS-4843:
-----------------------------
    Description: 
Tested once in Docker: if I run {{ozone admin datanode decommission 172.18.0.5}} too early, the Datanode doesn't actually seem to enter the DECOMMISSIONING state, but SCM still registers the action. It has now been over 10 minutes (fresh, empty DN with docker-compose) and {{ozone admin datanode list}} still reports DECOMMISSIONING for the datanode I started decommissioning earlier. Maybe the DN ignored, or never received, the command during its early startup stage, while SCM waits indefinitely (not sure if there's a limit)?

{code}
bash-4.2$ ozone admin datanode list
Datanode: bf5e0f92-5012-4975-8018-c39bf50ef592 (/default-rack/172.18.0.4/ozone_datanode_4.ozone_default/0 pipelines)
Operational State: DECOMMISSIONING
Related pipelines:
No related pipelines or the node is not in Healthy state.
Datanode: f4cdd5a5-94dd-4036-aca5-637406255b81 (/default-rack/172.18.0.6/ozone_datanode_1.ozone_default/2 pipelines)
Operational State: IN_SERVICE
Related pipelines:
3a42112e-2178-423a-8fd2-0ceaf2b70d90/THREE/RATIS/OPEN/Follower
cb7be734-0f39-48f1-b6cd-f3f099f16d20/ONE/RATIS/OPEN/Leader

Datanode: 3fd3011f-b739-4a97-885c-ab201f3bc055 (/default-rack/172.18.0.3/ozone_datanode_2.ozone_default/2 pipelines)
Operational State: IN_SERVICE
Related pipelines:
3a42112e-2178-423a-8fd2-0ceaf2b70d90/THREE/RATIS/OPEN/Leader
e0f3c8bc-9f1a-4732-8249-60532d456f7f/ONE/RATIS/OPEN/Leader

Datanode: 6f55a11f-14d8-4dc1-9a36-67d1dedd6985 (/default-rack/172.18.0.8/ozone_datanode_3.ozone_default/2 pipelines)
Operational State: IN_SERVICE
Related pipelines:
3a42112e-2178-423a-8fd2-0ceaf2b70d90/THREE/RATIS/OPEN/Follower
56a4c99e-74b7-4a0c-97ee-d9f46a507d87/ONE/RATIS/OPEN/Leader
{code}

Note: {{ozone admin datanode recommission}} does restore the DN to IN_SERVICE as if nothing had happened.
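To spot the symptom described above (a node stuck in DECOMMISSIONING with 0 pipelines) when re-running the reproduction, the listing can be parsed mechanically. This is a minimal Python sketch, assuming the output of {{ozone admin datanode list}} keeps exactly the format shown in this issue; the helper names {{parse_datanode_list}} and {{nodes_in_state}} are hypothetical and not part of Ozone.

```python
import re
from dataclasses import dataclass
from typing import List

# Header format as printed in this issue (assumed stable), e.g.
# "Datanode: <uuid> (/default-rack/<ip>/<hostname>/<n> pipelines)".
# The network location may contain several levels, so match ip/host from the end.
HEADER = re.compile(
    r"^Datanode: (?P<uuid>[0-9a-f-]+) "
    r"\((?P<location>.*)/(?P<ip>[^/]+)/(?P<host>[^/]+)/(?P<pipelines>\d+) pipelines\)$"
)
STATE = re.compile(r"^Operational State: (?P<state>\S+)$")


@dataclass
class NodeEntry:
    uuid: str
    ip: str
    host: str
    pipelines: int
    state: str = "UNKNOWN"


def parse_datanode_list(text: str) -> List[NodeEntry]:
    """Parse the plain-text listing into one entry per Datanode header."""
    nodes: List[NodeEntry] = []
    for raw in text.splitlines():
        line = raw.strip()
        header = HEADER.match(line)
        if header:
            nodes.append(NodeEntry(header["uuid"], header["ip"],
                                   header["host"], int(header["pipelines"])))
            continue
        state = STATE.match(line)
        if state and nodes:
            # A state line belongs to the most recent Datanode header.
            nodes[-1].state = state["state"]
    return nodes


def nodes_in_state(nodes: List[NodeEntry], state: str = "DECOMMISSIONING") -> List[NodeEntry]:
    """Return the entries whose operational state equals `state`."""
    return [n for n in nodes if n.state == state]


# Excerpt of the listing from this issue.
SAMPLE = """\
Datanode: bf5e0f92-5012-4975-8018-c39bf50ef592 (/default-rack/172.18.0.4/ozone_datanode_4.ozone_default/0 pipelines)
Operational State: DECOMMISSIONING
Related pipelines:
No related pipelines or the node is not in Healthy state.
Datanode: f4cdd5a5-94dd-4036-aca5-637406255b81 (/default-rack/172.18.0.6/ozone_datanode_1.ozone_default/2 pipelines)
Operational State: IN_SERVICE
"""

for node in nodes_in_state(parse_datanode_list(SAMPLE)):
    print(node.uuid, node.ip, node.pipelines)
    # prints "bf5e0f92-5012-4975-8018-c39bf50ef592 172.18.0.4 0"
```

Running this periodically against the live listing would show whether a node stays in DECOMMISSIONING with no pipelines well past the 10 minutes observed here; it only reads the CLI output and makes no assumption about how SCM tracks the decommission internally.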

  was:
Tested once in Docker: if I run {{ozone admin datanode decommission 172.18.0.5}} too early, the Datanode doesn't actually seem to enter the DECOMMISSIONING state, but SCM still registers the action. It has now been over 10 minutes (fresh, empty DN with docker-compose) and {{ozone admin datanode list}} still reports DECOMMISSIONING for the datanode I started decommissioning earlier.

{code}
bash-4.2$ ozone admin datanode list
Datanode: bf5e0f92-5012-4975-8018-c39bf50ef592 (/default-rack/172.18.0.4/ozone_datanode_4.ozone_default/0 pipelines)
Operational State: DECOMMISSIONING
Related pipelines:
No related pipelines or the node is not in Healthy state.
Datanode: f4cdd5a5-94dd-4036-aca5-637406255b81 (/default-rack/172.18.0.6/ozone_datanode_1.ozone_default/2 pipelines)
Operational State: IN_SERVICE
Related pipelines:
3a42112e-2178-423a-8fd2-0ceaf2b70d90/THREE/RATIS/OPEN/Follower
cb7be734-0f39-48f1-b6cd-f3f099f16d20/ONE/RATIS/OPEN/Leader

Datanode: 3fd3011f-b739-4a97-885c-ab201f3bc055 (/default-rack/172.18.0.3/ozone_datanode_2.ozone_default/2 pipelines)
Operational State: IN_SERVICE
Related pipelines:
3a42112e-2178-423a-8fd2-0ceaf2b70d90/THREE/RATIS/OPEN/Leader
e0f3c8bc-9f1a-4732-8249-60532d456f7f/ONE/RATIS/OPEN/Leader

Datanode: 6f55a11f-14d8-4dc1-9a36-67d1dedd6985 (/default-rack/172.18.0.8/ozone_datanode_3.ozone_default/2 pipelines)
Operational State: IN_SERVICE
Related pipelines:
3a42112e-2178-423a-8fd2-0ceaf2b70d90/THREE/RATIS/OPEN/Follower
56a4c99e-74b7-4a0c-97ee-d9f46a507d87/ONE/RATIS/OPEN/Leader
{code}


> SCM can incorrectly mark Datanode as DECOMMISSIONING when Datanode is not fully initialized
> -------------------------------------------------------------------------------------------
>
>                 Key: HDDS-4843
>                 URL: https://issues.apache.org/jira/browse/HDDS-4843
>             Project: Apache Ozone
>          Issue Type: Sub-task
>          Components: SCM
>            Reporter: Siyao Meng
>            Priority: Major
>



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
