Posted to issues@ozone.apache.org by "Ethan Rose (Jira)" <ji...@apache.org> on 2021/10/20 20:35:09 UTC

[jira] [Updated] (HDDS-2696) Document recovery from RATIS-677

     [ https://issues.apache.org/jira/browse/HDDS-2696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ethan Rose updated HDDS-2696:
-----------------------------
    Target Version/s: 1.3.0  (was: 1.2.0)

I am managing the 1.2.0 release and we currently have more than 600 issues targeted for 1.2.0. I am moving the target field to 1.3.0.

If you are actively working on this Jira and believe it should be targeted for the 1.2.0 release, please reach out to me via Apache email or Slack.

> Document recovery from RATIS-677
> --------------------------------
>
>                 Key: HDDS-2696
>                 URL: https://issues.apache.org/jira/browse/HDDS-2696
>             Project: Apache Ozone
>          Issue Type: Improvement
>          Components: Ozone Datanode
>            Reporter: István Fajth
>            Priority: Critical
>              Labels: Triaged
>
> RATIS-677 was solved in a way that requires changing a setting so that the RatisServer implementation ignores the corruption. At the moment, due to HDDS-2647, we do not have a clear recovery path from a Ratis corruption in the pipeline data.
> We should document how to recover from this. One idea is to close the pipeline in SCM and remove the Ratis metadata for the pipeline on the DataNodes, which effectively clears the corrupted pipeline out of the system.
> There are two problems with finding and documenting a recovery path:
> - I am not sure we have strong enough guarantees that the writes happened properly if the Ratis metadata could become corrupt, so this needs to be investigated.
> - At the moment I cannot validate this approach: when I perform the steps (stop the 3 DNs, move out the Ratis data for the pipeline, close the pipeline with scmcli, then restart the DNs), the pipeline is not closed properly and SCM fails as described in HDDS-2695.
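
The ordering of the steps listed in the description can be sketched as a dry-run plan. This is only an illustration under stated assumptions, not a validated procedure: the description itself reports that the approach currently fails as described in HDDS-2695. The function name build_recovery_plan, the host names, and both directory paths are hypothetical, and the sketch assumes the Ratis data for a pipeline sits in a directory named after the pipeline ID. It runs no commands; it only returns the actions in order.

```python
from typing import List, Tuple


def build_recovery_plan(
    pipeline_id: str,
    datanodes: List[str],
    ratis_dir: str,
    backup_dir: str,
) -> List[Tuple[str, str]]:
    """Return the (host, action) steps in the order given in the description.

    Nothing is executed. All names and paths are hypothetical placeholders.
    """
    if len(datanodes) != 3:
        raise ValueError("expected the 3 DataNodes of the pipeline")

    plan: List[Tuple[str, str]] = []

    # 1. Stop all three DataNodes before touching any Ratis data.
    for dn in datanodes:
        plan.append((dn, "stop the DataNode"))

    # 2. Move the pipeline's Ratis data aside (moved, not deleted).
    #    Assumption: the data lives under <ratis_dir>/<pipeline_id>.
    for dn in datanodes:
        plan.append(
            (dn, f"move {ratis_dir}/{pipeline_id} to {backup_dir}/{pipeline_id}")
        )

    # 3. Close the pipeline in SCM with scmcli.
    plan.append(("scm", f"close pipeline {pipeline_id} with scmcli"))

    # 4. Restart the DataNodes.
    for dn in datanodes:
        plan.append((dn, "start the DataNode"))

    return plan


if __name__ == "__main__":
    steps = build_recovery_plan(
        "pipeline-1", ["dn1", "dn2", "dn3"], "/data/ratis", "/data/ratis-backup"
    )
    for host, action in steps:
        print(f"{host}: {action}")
```

Moving the data aside instead of deleting it keeps the corrupted metadata available for the investigation mentioned in the first bullet.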



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org