Posted to issues@ozone.apache.org by "Bharat Viswanadham (Jira)" <ji...@apache.org> on 2021/11/11 05:17:00 UTC

[jira] [Created] (HDDS-5973) [SCM-HA] Sequence of steps during pipeline close need to be changed

Bharat Viswanadham created HDDS-5973:
----------------------------------------

             Summary: [SCM-HA] Sequence of steps during pipeline close need to be changed
                 Key: HDDS-5973
                 URL: https://issues.apache.org/jira/browse/HDDS-5973
             Project: Apache Ozone
          Issue Type: Bug
            Reporter: Bharat Viswanadham
            Assignee: Aswin Shakil Balasubramanian


Right now, when a datanode becomes stale or dead,
SCM closes the pipeline first and then closes its containers.
This ordering can leave containers in the open state while their pipeline is in the closed state. Adding an open container to a closed pipeline used to crash SCM. In SCM HA, the flush to the DB happens at the snapshot interval. Suppose the pipeline close has been flushed to the DB. After that, there are two possible outcomes:
1. The close-container request is in the Ratis log.
2. SCM stopped before the close-container request entered the Ratis log.

In the first case, the containers will be closed once the logs are replayed. (This was fixed as part of HDDS-5843, on the basis that the logs will be replayed and the SCM state will eventually reach the correct state.)
In the second case, the containers will be left open while the pipeline is in the closed state. (A container might stay in the open state forever, and if it is under-replicated, RM might not replicate it, because the container is not in the closed state.)

The ordering during pipeline close should be changed to:
1. Close containers
2. Close pipelines
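
To illustrate why the order matters, here is a minimal Java sketch of the two orderings. It is a toy model, not the SCM implementation: the class PipelineCloseOrderingSketch and all of its methods are hypothetical names made up for this example, and Ratis log replay and DB snapshots are deliberately left out. It only shows which state is left behind if SCM stops after the first of the two steps.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical, simplified model for illustration only. None of these names
// are real SCM classes, and Ratis log replay and DB snapshots are not modelled.
class PipelineCloseOrderingSketch {
    enum State { OPEN, CLOSED }

    private State pipelineState = State.OPEN;
    // Container states keyed by a made-up container ID.
    private final Map<Long, State> containers = new LinkedHashMap<>();

    PipelineCloseOrderingSketch(long... containerIds) {
        for (long id : containerIds) {
            containers.put(id, State.OPEN);
        }
    }

    private void closeContainers() {
        containers.replaceAll((id, state) -> State.CLOSED);
    }

    private void closePipeline() {
        pipelineState = State.CLOSED;
    }

    // Runs the two close steps in the chosen order, but only the first
    // stepsCompleted of them, to mimic SCM stopping part-way through.
    void run(boolean containersFirst, int stepsCompleted) {
        Runnable first = containersFirst ? this::closeContainers : this::closePipeline;
        Runnable second = containersFirst ? this::closePipeline : this::closeContainers;
        if (stepsCompleted >= 1) {
            first.run();
        }
        if (stepsCompleted >= 2) {
            second.run();
        }
    }

    // The problematic state from the description: closed pipeline, open container.
    boolean hasOpenContainerInClosedPipeline() {
        return pipelineState == State.CLOSED && containers.containsValue(State.OPEN);
    }

    State getPipelineState() {
        return pipelineState;
    }

    public static void main(String[] args) {
        PipelineCloseOrderingSketch current = new PipelineCloseOrderingSketch(1L, 2L);
        current.run(false, 1); // pipeline first, stopped before containers close
        System.out.println("pipeline first: " + current.hasOpenContainerInClosedPipeline());

        PipelineCloseOrderingSketch proposed = new PipelineCloseOrderingSketch(1L, 2L);
        proposed.run(true, 1); // containers first, stopped before pipeline closes
        System.out.println("containers first: " + proposed.hasOpenContainerInClosedPipeline());
    }
}
```

In this model, stopping after one step with the current order leaves open containers in a closed pipeline, while the proposed order leaves closed containers in a pipeline that is still open, so the pipeline close can simply be attempted again.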


