Posted to issues@ozone.apache.org by "Bharat Viswanadham (Jira)" <ji...@apache.org> on 2021/06/24 03:44:00 UTC

[jira] [Comment Edited] (HDDS-5338) Handle Bootstrap when original OM has non-ratis transactions

    [ https://issues.apache.org/jira/browse/HDDS-5338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17368592#comment-17368592 ] 

Bharat Viswanadham edited comment on HDDS-5338 at 6/24/21, 3:43 AM:
--------------------------------------------------------------------

We need to download a checkpoint when converting a non-HA Ratis-based cluster to an HA-enabled cluster, for example when we add 2 more nodes to make it HA. In this case, the old single-node OM is first converted to Ratis-enabled, and then if we add 2 more nodes, only the older OM can become leader, right? So we can download the checkpoint from it. Let me know if I am missing something here.

{quote}Let's say there are 3 existing OMs - om1, om2 and om3. om1 is network partitioned from the other 2 and assumes itself to be the leader. We try to bootstrap a new OM om4 and it contacts om1 first and downloads a checkpoint from it (since om1 replies that it is the leader). But since om1 was network partitioned, it does not have the correct DB snapshot. After this, om4 contacts the OM ring again to do a SetConfiguration. This request now goes to the correct leader OM - om2. om2 assumes that the bootstrapping OM has already got the non-ratis transactions through the DB checkpoint and sends it only the ratis logs. This will lead to inconsistent state in om4.{quote} 

If the cluster has more than 1 node, it is already a Ratis-enabled cluster, so why do we need to download a checkpoint at all in this scenario?

The question then is how to distinguish when to download. We could pass a flag to the bootstrapping node indicating that the cluster is being converted from non-HA to HA, and download the snapshot only in that case. (This is just one way to solve this.)
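
To make the flag idea concrete, here is a minimal sketch of the decision it implies. Everything in it is hypothetical: `BootstrapRequest`, `converting_from_non_ha`, `existing_om_count` and `choose_bootstrap_action` are illustrative names and do not correspond to existing Ozone classes or options. It also assumes the flag only makes sense when exactly one OM exists before the bootstrap.

```python
from dataclasses import dataclass
from enum import Enum


class BootstrapAction(Enum):
    DOWNLOAD_CHECKPOINT = "download checkpoint, then receive Ratis logs"
    RATIS_LOGS_ONLY = "receive Ratis logs only"


@dataclass(frozen=True)
class BootstrapRequest:
    # Hypothetical flag: set by the admin when a single-node (non-HA)
    # cluster is being expanded into an HA cluster.
    converting_from_non_ha: bool
    # Number of OMs already in the ring before this bootstrap (assumption).
    existing_om_count: int


def choose_bootstrap_action(req: BootstrapRequest) -> BootstrapAction:
    """Decide whether the bootstrapping OM should download a checkpoint."""
    if req.existing_om_count < 1:
        raise ValueError("bootstrap needs at least one existing OM")
    # With a single existing OM, that OM is the only possible leader, so a
    # checkpoint downloaded from it is taken from the right node.
    if req.converting_from_non_ha and req.existing_om_count == 1:
        return BootstrapAction.DOWNLOAD_CHECKPOINT
    # More than one OM means the cluster is already Ratis-enabled.
    return BootstrapAction.RATIS_LOGS_ONLY
```

In this sketch, a request with the flag set against a single existing OM yields `DOWNLOAD_CHECKPOINT`, while a bootstrap into a 3-OM ring falls through to `RATIS_LOGS_ONLY` regardless of the flag.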


was (Author: bharatviswa):
We need to download a checkpoint when converting a non-HA Ratis-based cluster to an HA-enabled cluster, for example when we add 2 more nodes to make it HA. In this case, the old single-node OM is first converted to Ratis-enabled, and then if we add 2 more nodes, only the older one can become leader, so we can download the checkpoint from it.

{quote}Let's say there are 3 existing OMs - om1, om2 and om3. om1 is network partitioned from the other 2 and assumes itself to be the leader. We try to bootstrap a new OM om4 and it contacts om1 first and downloads a checkpoint from it (since om1 replies that it is the leader). But since om1 was network partitioned, it does not have the correct DB snapshot. After this, om4 contacts the OM ring again to do a SetConfiguration. This request now goes to the correct leader OM - om2. om2 assumes that the bootstrapping OM has already got the non-ratis transactions through the DB checkpoint and sends it only the ratis logs. This will lead to inconsistent state in om4.{quote} 

If the cluster has more than 1 node, it is already a Ratis-enabled cluster, so why do we need to download a checkpoint at all in this scenario?

> Handle Bootstrap when original OM has non-ratis transactions
> ------------------------------------------------------------
>
>                 Key: HDDS-5338
>                 URL: https://issues.apache.org/jira/browse/HDDS-5338
>             Project: Apache Ozone
>          Issue Type: Sub-task
>    Affects Versions: 1.2.0
>            Reporter: Hanisha Koneru
>            Assignee: Hanisha Koneru
>            Priority: Major
>
> When a non-Ratis OM is converted to a Ratis-enabled OM, there could be transactions in the RocksDB which are not part of the Ratis logs. If the Ratis logs are not purged when a new OM is bootstrapped, it will just get all the Ratis logs from the old OM. The non-Ratis transactions in the RocksDB will not be transferred to the new OM, as Ratis will not know that there are transactions in the DB not present in the logs.
> So when a new OM is bootstrapping, we should check the DB for non-Ratis transactions and, if any are present, the new OM should download the DB from an existing OM before the setConf request is sent out.
> Thanks [~bharat] for identifying this scenario [here|https://github.com/apache/ozone/pull/1494#issuecomment-859329558] .
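
The check proposed in the description above can be sketched as follows. This is a toy model, not Ozone code: transactions are plain integer IDs, the DB and the Ratis log are modelled as sets of those IDs, and `non_ratis_transactions`, `bootstrap_steps` and the step names are all hypothetical.

```python
from typing import Iterable, List, Set


def non_ratis_transactions(db_txns: Iterable[int],
                           ratis_log_txns: Iterable[int]) -> Set[int]:
    """Return transactions that are in the DB but not in the Ratis logs."""
    return set(db_txns) - set(ratis_log_txns)


def bootstrap_steps(db_txns: Iterable[int],
                    ratis_log_txns: Iterable[int]) -> List[str]:
    """Order the bootstrap steps for a new OM (step names are illustrative)."""
    steps: List[str] = []
    # If the DB holds transactions Ratis does not know about, replaying the
    # logs alone would miss them, so the DB is fetched before setConf.
    if non_ratis_transactions(db_txns, ratis_log_txns):
        steps.append("download_db_checkpoint")
    steps.append("send_set_conf")
    steps.append("receive_ratis_logs")
    return steps
```

For a DB that contains transactions 1 to 3 while the Ratis log only covers 3, the sketch puts the checkpoint download ahead of the setConf request; when the log covers everything in the DB, the download step is skipped.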


