Posted to issues@ozone.apache.org by "Istvan Fajth (Jira)" <ji...@apache.org> on 2019/12/09 16:00:15 UTC
[jira] [Created] (HDDS-2696) Document recovery from RATIS-677
Istvan Fajth created HDDS-2696:
----------------------------------
Summary: Document recovery from RATIS-677
Key: HDDS-2696
URL: https://issues.apache.org/jira/browse/HDDS-2696
Project: Hadoop Distributed Data Store
Issue Type: Improvement
Components: Ozone Datanode
Reporter: Istvan Fajth
RATIS-677 was solved in a way that requires a setting to be changed so that the RatisServer implementation ignores the corruption. At the moment, due to HDDS-2647, we do not have a clear recovery path from a Ratis corruption in the pipeline data.
We should document how to recover from this. I have an idea that involves closing the pipeline in SCM and removing the Ratis metadata for the pipeline on the DataNodes, which effectively clears the corrupted pipeline out of the system.
I have two problems with finding a recovery path and documenting it:
- I am not sure we have strong enough guarantees that the writes happened properly if the Ratis metadata could become corrupt, so this needs to be investigated.
- At the moment I cannot validate this approach: when I follow the steps (stop the 3 DNs, move out the Ratis data for the pipeline, close the pipeline with scmcli, then restart the DNs), the pipeline is not closed properly and SCM fails as described in HDDS-2695.
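The "move out ratis data for pipeline" step above could be sketched roughly as follows. This is only a minimal sketch, assuming that each pipeline's Ratis metadata lives in a directory named after the pipeline ID under the DataNode's Ratis storage directory; that layout, the function name and all paths are hypothetical and not confirmed by this issue. As noted above, the overall procedure has not been validated.

```python
import shutil
from pathlib import Path


def move_pipeline_ratis_dir(ratis_root, pipeline_id, backup_root):
    """Move <ratis_root>/<pipeline_id> to <backup_root>/<pipeline_id>.

    Assumption (not confirmed by this issue): the Ratis metadata of a
    pipeline is stored in a directory named after the pipeline ID.
    Intended to be run only while the DataNode is stopped.
    """
    src = Path(ratis_root) / pipeline_id
    if not src.is_dir():
        raise FileNotFoundError(f"no Ratis directory for pipeline: {src}")
    dst = Path(backup_root) / pipeline_id
    if dst.exists():
        raise FileExistsError(f"backup target already exists: {dst}")
    dst.parent.mkdir(parents=True, exist_ok=True)
    # Move rather than delete, so the data can be inspected or restored later.
    shutil.move(str(src), str(dst))
    return dst
```

Moving the directory aside instead of deleting it keeps the possibly corrupted metadata available for later investigation of the write guarantees mentioned in the first point.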