You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@ozone.apache.org by GitBox <gi...@apache.org> on 2021/05/17 10:03:24 UTC

[GitHub] [ozone] bharatviswa504 opened a new pull request #2249: HDDS-5233. SCM subsequent init failed when previous scm start/init failed.

bharatviswa504 opened a new pull request #2249:
URL: https://github.com/apache/ozone/pull/2249


   …iled.
   
   ## What changes were proposed in this pull request?
   
   Initialize version file before and then perform Ratis Server initialize. In this way, we shall use the same groupID in failure scenarios.
   
   ## What is the link to the Apache JIRA
   
   https://issues.apache.org/jira/browse/HDDS-5233
   ## How was this patch tested?
   
   Existing CI.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] bshashikant commented on a change in pull request #2249: HDDS-5233. SCM subsequent init failed when previous scm init failed.

Posted by GitBox <gi...@apache.org>.
bshashikant commented on a change in pull request #2249:
URL: https://github.com/apache/ozone/pull/2249#discussion_r634155011



##########
File path: hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/ha/SCMStateMachine.java
##########
@@ -84,20 +81,16 @@
   private DBCheckpoint installingDBCheckpoint = null;
 
   public SCMStateMachine(final StorageContainerManager scm,
-      final SCMRatisServer ratisServer, SCMHADBTransactionBuffer buffer)
-      throws SCMException {
+      SCMHADBTransactionBuffer buffer) {
     this.scm = scm;
     this.handlers = new EnumMap<>(RequestType.class);
     this.transactionBuffer = buffer;
     TransactionInfo latestTrxInfo = this.transactionBuffer.getLatestTrxInfo();
-    if (!latestTrxInfo.isDefault() &&
-        !updateLastAppliedTermIndex(latestTrxInfo.getTerm(),
-            latestTrxInfo.getTransactionIndex())) {
-      throw new SCMException(
-          String.format("Failed to update LastAppliedTermIndex " +
-                  "in StateMachine to term:{} index:{}",
-              latestTrxInfo.getTerm(), latestTrxInfo.getTransactionIndex()
-          ), SCM_NOT_INITIALIZED);
+    if (!latestTrxInfo.isDefault()) {

Review comment:
       @GlenGeng , can you please check this code change.I think its the right approach to get the last applied index from the transaction info table irrespective of what the current stateMachine applied index is.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] bharatviswa504 commented on pull request #2249: HDDS-5233. SCM subsequent init failed when previous scm init failed.

Posted by GitBox <gi...@apache.org>.
bharatviswa504 commented on pull request #2249:
URL: https://github.com/apache/ozone/pull/2249#issuecomment-845635058


   Thank You @bshashikant and @GlenGeng for the review


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] bshashikant commented on a change in pull request #2249: HDDS-5233. SCM subsequent init failed when previous scm init failed.

Posted by GitBox <gi...@apache.org>.
bshashikant commented on a change in pull request #2249:
URL: https://github.com/apache/ozone/pull/2249#discussion_r633485056



##########
File path: hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/server/StorageContainerManager.java
##########
@@ -946,19 +946,32 @@ public static boolean scmInit(OzoneConfiguration conf,
           scmStorageConfig.setClusterId(clusterId);
         }
 
-        if (SCMHAUtils.isSCMHAEnabled(conf)) {
-          SCMRatisServerImpl.initialize(scmStorageConfig.getClusterID(),
-              scmStorageConfig.getScmId(), haDetails.getLocalNodeDetails(),
-              conf);
-          scmStorageConfig.setSCMHAFlag(true);
-        }
 
         if (OzoneSecurityUtil.isSecurityEnabled(conf)) {
           HASecurityUtils.initializeSecurity(scmStorageConfig, conf,
               getScmAddress(haDetails, conf), true);
         }
+
+        // Do not move these code lines. If SCM version file is created,
+        // the subsequent scm init should use the groupID from version file.

Review comment:
       groupId-> clusterId

##########
File path: hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/server/StorageContainerManager.java
##########
@@ -946,19 +946,32 @@ public static boolean scmInit(OzoneConfiguration conf,
           scmStorageConfig.setClusterId(clusterId);
         }
 
-        if (SCMHAUtils.isSCMHAEnabled(conf)) {
-          SCMRatisServerImpl.initialize(scmStorageConfig.getClusterID(),
-              scmStorageConfig.getScmId(), haDetails.getLocalNodeDetails(),
-              conf);
-          scmStorageConfig.setSCMHAFlag(true);
-        }
 
         if (OzoneSecurityUtil.isSecurityEnabled(conf)) {
           HASecurityUtils.initializeSecurity(scmStorageConfig, conf,
               getScmAddress(haDetails, conf), true);
         }
+
+        // Do not move these code lines. If SCM version file is created,
+        // the subsequent scm init should use the groupID from version file.
+        // So, scmStorageConfig#initialize() should happen before ratis server
+        // initialize. In this way,we do not leave ratis storage directory
+        // with multiple raft Groups in failure scenario.
+

Review comment:
       multiple raft Groups -> multiple raft group directories

##########
File path: hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/server/StorageContainerManager.java
##########
@@ -946,19 +946,32 @@ public static boolean scmInit(OzoneConfiguration conf,
           scmStorageConfig.setClusterId(clusterId);
         }
 
-        if (SCMHAUtils.isSCMHAEnabled(conf)) {
-          SCMRatisServerImpl.initialize(scmStorageConfig.getClusterID(),
-              scmStorageConfig.getScmId(), haDetails.getLocalNodeDetails(),
-              conf);
-          scmStorageConfig.setSCMHAFlag(true);
-        }
 
         if (OzoneSecurityUtil.isSecurityEnabled(conf)) {
           HASecurityUtils.initializeSecurity(scmStorageConfig, conf,
               getScmAddress(haDetails, conf), true);
         }
+
+        // Do not move these code lines. If SCM version file is created,
+        // the subsequent scm init should use the groupID from version file.

Review comment:
       Do not move these code lines -> ensure scmRatisserver#initialize() is called post scm storage config initialization.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] GlenGeng commented on a change in pull request #2249: HDDS-5233. SCM subsequent init failed when previous scm init failed.

Posted by GitBox <gi...@apache.org>.
GlenGeng commented on a change in pull request #2249:
URL: https://github.com/apache/ozone/pull/2249#discussion_r634875892



##########
File path: hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/ha/SCMStateMachine.java
##########
@@ -84,20 +81,16 @@
   private DBCheckpoint installingDBCheckpoint = null;
 
   public SCMStateMachine(final StorageContainerManager scm,
-      final SCMRatisServer ratisServer, SCMHADBTransactionBuffer buffer)
-      throws SCMException {
+      SCMHADBTransactionBuffer buffer) {
     this.scm = scm;
     this.handlers = new EnumMap<>(RequestType.class);
     this.transactionBuffer = buffer;
     TransactionInfo latestTrxInfo = this.transactionBuffer.getLatestTrxInfo();
-    if (!latestTrxInfo.isDefault() &&
-        !updateLastAppliedTermIndex(latestTrxInfo.getTerm(),
-            latestTrxInfo.getTransactionIndex())) {
-      throw new SCMException(
-          String.format("Failed to update LastAppliedTermIndex " +
-                  "in StateMachine to term:{} index:{}",
-              latestTrxInfo.getTerm(), latestTrxInfo.getTransactionIndex()
-          ), SCM_NOT_INITIALIZED);
+    if (!latestTrxInfo.isDefault()) {

Review comment:
       Overall I am fine with the change.
   Correctly me if I am wrong, the two are the same, except not checking the return code of `updateLastAppliedTermIndex()` ?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] bharatviswa504 commented on a change in pull request #2249: HDDS-5233. SCM subsequent init failed when previous scm init failed.

Posted by GitBox <gi...@apache.org>.
bharatviswa504 commented on a change in pull request #2249:
URL: https://github.com/apache/ozone/pull/2249#discussion_r634973783



##########
File path: hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/ha/SCMStateMachine.java
##########
@@ -84,20 +81,16 @@
   private DBCheckpoint installingDBCheckpoint = null;
 
   public SCMStateMachine(final StorageContainerManager scm,
-      final SCMRatisServer ratisServer, SCMHADBTransactionBuffer buffer)
-      throws SCMException {
+      SCMHADBTransactionBuffer buffer) {
     this.scm = scm;
     this.handlers = new EnumMap<>(RequestType.class);
     this.transactionBuffer = buffer;
     TransactionInfo latestTrxInfo = this.transactionBuffer.getLatestTrxInfo();
-    if (!latestTrxInfo.isDefault() &&
-        !updateLastAppliedTermIndex(latestTrxInfo.getTerm(),
-            latestTrxInfo.getTransactionIndex())) {
-      throw new SCMException(
-          String.format("Failed to update LastAppliedTermIndex " +
-                  "in StateMachine to term:{} index:{}",
-              latestTrxInfo.getTerm(), latestTrxInfo.getTransactionIndex()
-          ), SCM_NOT_INITIALIZED);
+    if (!latestTrxInfo.isDefault()) {

Review comment:
       Yes, previously updateLastAppliedTermIndex returns a false case we used to throw exception, which is not required from my understanding. 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] bharatviswa504 commented on pull request #2249: HDDS-5233. SCM subsequent init failed when previous scm init failed.

Posted by GitBox <gi...@apache.org>.
bharatviswa504 commented on pull request #2249:
URL: https://github.com/apache/ozone/pull/2249#issuecomment-842953408


   > @bharatviswa504 , can you also fix the SCM stateMachine registry code as well to ensure SCMStateMachine instance is not shared across multiple raftServerImpl instances.
   
   Updated code to fix this.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] GlenGeng commented on a change in pull request #2249: HDDS-5233. SCM subsequent init failed when previous scm init failed.

Posted by GitBox <gi...@apache.org>.
GlenGeng commented on a change in pull request #2249:
URL: https://github.com/apache/ozone/pull/2249#discussion_r634984100



##########
File path: hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/ha/SCMStateMachine.java
##########
@@ -84,20 +81,16 @@
   private DBCheckpoint installingDBCheckpoint = null;
 
   public SCMStateMachine(final StorageContainerManager scm,
-      final SCMRatisServer ratisServer, SCMHADBTransactionBuffer buffer)
-      throws SCMException {
+      SCMHADBTransactionBuffer buffer) {
     this.scm = scm;
     this.handlers = new EnumMap<>(RequestType.class);
     this.transactionBuffer = buffer;
     TransactionInfo latestTrxInfo = this.transactionBuffer.getLatestTrxInfo();
-    if (!latestTrxInfo.isDefault() &&
-        !updateLastAppliedTermIndex(latestTrxInfo.getTerm(),
-            latestTrxInfo.getTransactionIndex())) {
-      throw new SCMException(
-          String.format("Failed to update LastAppliedTermIndex " +
-                  "in StateMachine to term:{} index:{}",
-              latestTrxInfo.getTerm(), latestTrxInfo.getTransactionIndex()
-          ), SCM_NOT_INITIALIZED);
+    if (!latestTrxInfo.isDefault()) {

Review comment:
       agree. It should never fail during the startup of StateMachine.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] bshashikant commented on a change in pull request #2249: HDDS-5233. SCM subsequent init failed when previous scm init failed.

Posted by GitBox <gi...@apache.org>.
bshashikant commented on a change in pull request #2249:
URL: https://github.com/apache/ozone/pull/2249#discussion_r633485056



##########
File path: hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/server/StorageContainerManager.java
##########
@@ -946,19 +946,32 @@ public static boolean scmInit(OzoneConfiguration conf,
           scmStorageConfig.setClusterId(clusterId);
         }
 
-        if (SCMHAUtils.isSCMHAEnabled(conf)) {
-          SCMRatisServerImpl.initialize(scmStorageConfig.getClusterID(),
-              scmStorageConfig.getScmId(), haDetails.getLocalNodeDetails(),
-              conf);
-          scmStorageConfig.setSCMHAFlag(true);
-        }
 
         if (OzoneSecurityUtil.isSecurityEnabled(conf)) {
           HASecurityUtils.initializeSecurity(scmStorageConfig, conf,
               getScmAddress(haDetails, conf), true);
         }
+
+        // Do not move these code lines. If SCM version file is created,
+        // the subsequent scm init should use the groupID from version file.

Review comment:
       groupId-> clusterId

##########
File path: hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/server/StorageContainerManager.java
##########
@@ -946,19 +946,32 @@ public static boolean scmInit(OzoneConfiguration conf,
           scmStorageConfig.setClusterId(clusterId);
         }
 
-        if (SCMHAUtils.isSCMHAEnabled(conf)) {
-          SCMRatisServerImpl.initialize(scmStorageConfig.getClusterID(),
-              scmStorageConfig.getScmId(), haDetails.getLocalNodeDetails(),
-              conf);
-          scmStorageConfig.setSCMHAFlag(true);
-        }
 
         if (OzoneSecurityUtil.isSecurityEnabled(conf)) {
           HASecurityUtils.initializeSecurity(scmStorageConfig, conf,
               getScmAddress(haDetails, conf), true);
         }
+
+        // Do not move these code lines. If SCM version file is created,
+        // the subsequent scm init should use the groupID from version file.
+        // So, scmStorageConfig#initialize() should happen before ratis server
+        // initialize. In this way,we do not leave ratis storage directory
+        // with multiple raft Groups in failure scenario.
+

Review comment:
       multiple raft Groups -> multiple raft group directories

##########
File path: hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/server/StorageContainerManager.java
##########
@@ -946,19 +946,32 @@ public static boolean scmInit(OzoneConfiguration conf,
           scmStorageConfig.setClusterId(clusterId);
         }
 
-        if (SCMHAUtils.isSCMHAEnabled(conf)) {
-          SCMRatisServerImpl.initialize(scmStorageConfig.getClusterID(),
-              scmStorageConfig.getScmId(), haDetails.getLocalNodeDetails(),
-              conf);
-          scmStorageConfig.setSCMHAFlag(true);
-        }
 
         if (OzoneSecurityUtil.isSecurityEnabled(conf)) {
           HASecurityUtils.initializeSecurity(scmStorageConfig, conf,
               getScmAddress(haDetails, conf), true);
         }
+
+        // Do not move these code lines. If SCM version file is created,
+        // the subsequent scm init should use the groupID from version file.

Review comment:
       Do not move these code lines -> ensure scmRatisserver#initialize() is called post scm storage config initialization.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] bharatviswa504 merged pull request #2249: HDDS-5233. SCM subsequent init failed when previous scm init failed.

Posted by GitBox <gi...@apache.org>.
bharatviswa504 merged pull request #2249:
URL: https://github.com/apache/ozone/pull/2249


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org