Posted to issues@ozone.apache.org by "Istvan Fajth (Jira)" <ji...@apache.org> on 2019/12/09 16:00:15 UTC

[jira] [Created] (HDDS-2696) Document recovery from RATIS-677

Istvan Fajth created HDDS-2696:
----------------------------------

             Summary: Document recovery from RATIS-677
                 Key: HDDS-2696
                 URL: https://issues.apache.org/jira/browse/HDDS-2696
             Project: Hadoop Distributed Data Store
          Issue Type: Improvement
          Components: Ozone Datanode
            Reporter: Istvan Fajth


RATIS-677 was solved in a way that requires changing a setting so that the RatisServer implementation ignores the corruption. Because of that, and at the moment also because of HDDS-2647, we do not have a clear recovery path from a ratis corruption in the pipeline data.

We should document how to recover from this. I have an idea which involves closing the pipeline in SCM and removing the ratis metadata for the pipeline on the DataNodes, which effectively clears the corrupted pipeline out of the system.

There are two problems I have with finding a recovery path and documenting it:
- I am not sure whether we have strong enough guarantees that the writes happened properly if the ratis metadata could become corrupt, so this needs to be investigated.
- At the moment I cannot validate this approach: when I do the steps (stop the 3 DNs, move out the ratis data for the pipeline, close the pipeline with scmcli, then restart the DNs), the pipeline is not closed properly, and SCM fails as described in HDDS-2695.
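
To make the order of the steps in the second point explicit, here is a minimal dry-run sketch in Python. It only builds the list of steps and runs nothing, since the sequence is not yet validated (see HDDS-2695). Everything concrete in it is an assumption for illustration: the host names, the ratis storage path, the backup path, the idea that the pipeline's ratis data lives in a directory named after the pipeline ID, and the exact scmcli syntax. The stop and start steps are left as placeholders because they depend on how the cluster is managed.

```python
import shlex


def recovery_plan(pipeline_id, datanodes, ratis_dir, backup_dir):
    """Return the ordered steps as (host, command) pairs without running them.

    All arguments are hypothetical placeholders supplied by the caller.
    """
    steps = []

    # 1. Stop every DataNode that is a member of the pipeline.
    for dn in datanodes:
        steps.append((dn, "<stop the DataNode service>"))

    # 2. Move the ratis data of the pipeline out of the way on each DataNode.
    #    Assumption: the data sits in <ratis_dir>/<pipeline_id>.
    src = f"{ratis_dir}/{pipeline_id}"
    for dn in datanodes:
        steps.append((dn, f"mv {shlex.quote(src)} {shlex.quote(backup_dir)}/"))

    # 3. Close the pipeline in SCM. The exact scmcli syntax is an assumption.
    steps.append(("scm", f"ozone scmcli pipeline close {shlex.quote(pipeline_id)}"))

    # 4. Restart the DataNodes.
    for dn in datanodes:
        steps.append((dn, "<start the DataNode service>"))

    return steps


if __name__ == "__main__":
    plan = recovery_plan(
        pipeline_id="00000000-0000-0000-0000-000000000000",  # placeholder ID
        datanodes=["dn1", "dn2", "dn3"],                     # placeholder hosts
        ratis_dir="/data/ratis",                             # placeholder path
        backup_dir="/data/ratis-backup",                     # placeholder path
    )
    for host, command in plan:
        print(f"{host}: {command}")
```

Moving the data aside instead of deleting it keeps the corrupted ratis metadata available for later investigation of the first point.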


