Posted to issues@ozone.apache.org by GitBox <gi...@apache.org> on 2021/03/23 08:29:20 UTC

[GitHub] [ozone] GlenGeng opened a new pull request #2072: HDDS-5015. localId is not consistent across SCMs when setup a multi node SCM HA cluster.

GlenGeng opened a new pull request #2072:
URL: https://github.com/apache/ozone/pull/2072


   ## What changes were proposed in this pull request?
   
   We set up a three-node SCM HA cluster for testing.
   Using the `ozone debug ldb` tool, we found that the localId values are not the same across the three SCMs. The reason is that localId is initialized from each machine's own timestamp.
   
   **The root cause here is:**
   When localId is not set in the sequenceId table, SCM initializes it to UniqueId.next(). When three SCMs are set up from scratch, each of them individually sets its localId to its own UniqueId.next(), so the sequenceId diverges from the very beginning.
    
   **Short term solution is:**
   Make the three empty SCMs reach agreement on the localId.
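
   A minimal sketch of the divergence and of the agreement idea, in Java. Everything in it is hypothetical and for illustration only: the class `LocalIdInitSketch`, the in-memory map standing in for the sequenceId table, the `timestampBasedId` helper standing in for UniqueId.next(), and the way the agreed value is handed to each node. It does not reproduce the actual patch.

```java
import java.util.HashMap;
import java.util.Map;

public class LocalIdInitSketch {
    // Hypothetical key under which localId is stored in the sequenceId table.
    static final String LOCAL_ID = "localId";

    // Hypothetical stand-in for a timestamp-based id generator:
    // the node's clock (millis) is placed in the high bits.
    static long timestampBasedId(long nodeClockMillis) {
        return nodeClockMillis << 16;
    }

    // Problematic pattern: an unset localId is seeded from the node's own clock.
    static long initFromOwnClock(Map<String, Long> sequenceIdTable, long nodeClockMillis) {
        return sequenceIdTable.computeIfAbsent(
            LOCAL_ID, k -> timestampBasedId(nodeClockMillis));
    }

    // Agreement pattern: an unset localId is seeded from a value that every
    // node shares. How the nodes obtain that value is left open here.
    static long initFromAgreedValue(Map<String, Long> sequenceIdTable, long agreedInitialId) {
        return sequenceIdTable.computeIfAbsent(LOCAL_ID, k -> agreedInitialId);
    }

    public static void main(String[] args) {
        // Three empty SCMs whose clocks differ by a few milliseconds (made-up values).
        long[] clocks = {1616488160000L, 1616488160007L, 1616488160019L};

        System.out.println("Seeded from own clock:");
        for (long clock : clocks) {
            System.out.println("  localId = " + initFromOwnClock(new HashMap<>(), clock));
        }

        long agreed = timestampBasedId(clocks[0]);
        System.out.println("Seeded from an agreed value:");
        for (int i = 0; i < clocks.length; i++) {
            System.out.println("  localId = " + initFromAgreedValue(new HashMap<>(), agreed));
        }
    }
}
```

   In the first loop each node ends up with a different localId; in the second all three start from the same value. Both helpers leave an already-set localId untouched, which matches the "not set in the sequenceId table" condition described above.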
    
   **Long term solution is:**
   Check HDDS-5016.
   During bootstrap, the new SCM always downloads a checkpoint from the leader SCM and replaces its own scm.db with that of the leader.
    
   **The short term solution is safe:**
   - Upgrade in-memory SCM to bypass-Ratis SCM: not affected.
   - Upgrade in-memory SCM to single-node SCM: not affected.
   - Upgrade in-memory SCM to three-node SCM cluster: not supported yet; waits for the long-term solution.
   - Set up a bypass-Ratis SCM: not affected.
   - Set up a three-node SCM cluster from scratch: fixed by the short-term solution.
   
   ## What is the link to the Apache JIRA
   
   https://issues.apache.org/jira/browse/HDDS-5015
   
   ## How was this patch tested?
   
   CI and testing in a real environment.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] GlenGeng commented on pull request #2072: HDDS-5015. localId is not consistent across SCMs when setup a multi node SCM HA cluster.

Posted by GitBox <gi...@apache.org>.
GlenGeng commented on pull request #2072:
URL: https://github.com/apache/ozone/pull/2072#issuecomment-805462781


   Thanks @bshashikant for the review! I will merge this short-term solution. The long-term solution is tracked in a new JIRA, HDDS-5016.




[GitHub] [ozone] GlenGeng merged pull request #2072: HDDS-5015. localId is not consistent across SCMs when setup a multi node SCM HA cluster.

Posted by GitBox <gi...@apache.org>.
GlenGeng merged pull request #2072:
URL: https://github.com/apache/ozone/pull/2072


   

