You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@ozone.apache.org by GitBox <gi...@apache.org> on 2021/05/17 20:34:40 UTC

[GitHub] [ozone] errose28 opened a new pull request #2257: HDDS-5226. Do not fail SCM HA pre-finalize validation if SCM HA was already being used.

errose28 opened a new pull request #2257:
URL: https://github.com/apache/ozone/pull/2257


   ## What changes were proposed in this pull request?
   
   SCM HA has been merged in to the master branch before the upgrade framework. With the current implementation of SCM HA pre-finalize validation (which prevents the new feature from being used until the cluster is finalized) users who follow the Apache releases and those who incrementally pull from master will have different experiences when they get the upgrade framework:
   
   For users following official Ozone releases, the upgrade framework and SCM HA will land in the same release. SCM HA should not be allowed in pre-finalize so that these users can downgrade using the upgrade framework. This is currently how the validation action works.
   
   For users who pull from master and are already using SCM HA, the current pre-finalize validation will fail to start the SCM until they turn off SCM HA. The cluster must be finalized, and then SCM HA can be re-enabled. This may be surprising and inconvenient.
   
   To support both use cases, the SCM HA pre-finalize validation action should not fail if SCM HA was already being used before the upgrade. To check for this, we will check if there are ratis log entries when the action runs. SCM validation actions have been changed to run before the ratis server starts, so if there is a valid committed log index when the action runs, it must be from when SCM HA was used before an upgrade.
   
   ## What is the link to the Apache JIRA
   
   HDDS-5226
   
   ## How was this patch tested?
   
   Due to more complex behavior, the existing unit test for the old action was replaced with an integration test. Since this feature will not be changing frequently, the test is ignored to save CI time, but can be run locally to verify this action if it needs to be modified.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] errose28 commented on a change in pull request #2257: HDDS-5226. Do not fail SCM HA pre-finalize validation if SCM HA was already being used.

Posted by GitBox <gi...@apache.org>.
errose28 commented on a change in pull request #2257:
URL: https://github.com/apache/ozone/pull/2257#discussion_r634860109



##########
File path: hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/server/StorageContainerManager.java
##########
@@ -1158,6 +1158,8 @@ public String getDatanodeRpcPort() {
    */
   @Override
   public void start() throws IOException {
+    upgradeFinalizer.runPrefinalizeStateActions(scmStorageConfig, this);

Review comment:
       Took a closer look and init actually cannot happen in a pre-finalized cluster. SCM must already be inited for the cluster to be pre-finalized, and running init again will cause an error in `Storage#setClusterId`. So we shouldn't have to worry about ratis server being started in init for pre-finalized clusters. In new clusters it is not a problem since there is nowhere to downgrade to.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] bharatviswa504 commented on a change in pull request #2257: HDDS-5226. Do not fail SCM HA pre-finalize validation if SCM HA was already being used.

Posted by GitBox <gi...@apache.org>.
bharatviswa504 commented on a change in pull request #2257:
URL: https://github.com/apache/ozone/pull/2257#discussion_r635098004



##########
File path: hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/server/StorageContainerManager.java
##########
@@ -1158,6 +1158,8 @@ public String getDatanodeRpcPort() {
    */
   @Override
   public void start() throws IOException {
+    upgradeFinalizer.runPrefinalizeStateActions(scmStorageConfig, this);

Review comment:
       Yes, but if init happens again after we upgrade bits with ha enable flag (after upgrade), then this will not catch the issue right? As on an upgraded cluster, init can happen, check else block of scmInit.
   Like in some deployments where init is run with every restart.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] bharatviswa504 commented on a change in pull request #2257: HDDS-5226. Do not fail SCM HA pre-finalize validation if SCM HA was already being used.

Posted by GitBox <gi...@apache.org>.
bharatviswa504 commented on a change in pull request #2257:
URL: https://github.com/apache/ozone/pull/2257#discussion_r638015520



##########
File path: hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/server/StorageContainerManager.java
##########
@@ -1158,6 +1158,8 @@ public String getDatanodeRpcPort() {
    */
   @Override
   public void start() throws IOException {
+    upgradeFinalizer.runPrefinalizeStateActions(scmStorageConfig, this);

Review comment:
       Just a question why do we need to delay this check until the start? Why can't it be done in the StorageManager constructor for fail fast?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] avijayanhwx merged pull request #2257: HDDS-5226. Do not fail SCM HA pre-finalize validation if SCM HA was already being used.

Posted by GitBox <gi...@apache.org>.
avijayanhwx merged pull request #2257:
URL: https://github.com/apache/ozone/pull/2257


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] errose28 commented on a change in pull request #2257: HDDS-5226. Do not fail SCM HA pre-finalize validation if SCM HA was already being used.

Posted by GitBox <gi...@apache.org>.
errose28 commented on a change in pull request #2257:
URL: https://github.com/apache/ozone/pull/2257#discussion_r636211575



##########
File path: hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/server/StorageContainerManager.java
##########
@@ -1158,6 +1158,8 @@ public String getDatanodeRpcPort() {
    */
   @Override
   public void start() throws IOException {
+    upgradeFinalizer.runPrefinalizeStateActions(scmStorageConfig, this);

Review comment:
       Thanks for the offline discussion @bharatviswa504. I now see that we can run init again after the upgrade, even though it is not required, and must run the check there as well to prevent Ratis setup steps from happening. The PR has been updated accordingly.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] bharatviswa504 commented on a change in pull request #2257: HDDS-5226. Do not fail SCM HA pre-finalize validation if SCM HA was already being used.

Posted by GitBox <gi...@apache.org>.
bharatviswa504 commented on a change in pull request #2257:
URL: https://github.com/apache/ozone/pull/2257#discussion_r635098004



##########
File path: hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/server/StorageContainerManager.java
##########
@@ -1158,6 +1158,8 @@ public String getDatanodeRpcPort() {
    */
   @Override
   public void start() throws IOException {
+    upgradeFinalizer.runPrefinalizeStateActions(scmStorageConfig, this);

Review comment:
       Yes, but if init happens again after we upgrade bits with ha enable flag on, then this will not catch the issue right? As on an upgraded cluster, init can happen, check else block of scmInit.
   Like in some deployments where init is run with every restart.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] errose28 commented on a change in pull request #2257: HDDS-5226. Do not fail SCM HA pre-finalize validation if SCM HA was already being used.

Posted by GitBox <gi...@apache.org>.
errose28 commented on a change in pull request #2257:
URL: https://github.com/apache/ozone/pull/2257#discussion_r636945091



##########
File path: hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/server/StorageContainerManager.java
##########
@@ -962,6 +963,11 @@ public static boolean scmInit(OzoneConfiguration conf,
         return false;
       }
     } else {
+      // If SCM HA was not being used before pre-finalize, and is being used
+      // when the cluster is pre-finalized for the SCM HA feature, init
+      // should fail.
+      ScmHAUnfinalizedStateValidationAction.checkScmHA(conf, scmStorageConfig);
+
       clusterId = scmStorageConfig.getClusterID();
       final boolean isSCMHAEnabled = scmStorageConfig.isSCMHAEnabled();
       if (SCMHAUtils.isSCMHAEnabled(conf) && !isSCMHAEnabled) {

Review comment:
       Good catch, thanks @guihecheng . The problem here is that the upgrade framework will only run upgrade actions in pre-finalized cluster, but because I kind of had to side-step the upgrade framework's action runner here to execute this statically, it is actually being run in finalized clusters too. This can be seen in the latest integration test failure: https://github.com/apache/ozone/pull/2257/checks?check_run_id=2631515596. I will add a pre-finalization check to this call, which will differentiate it from the if-else check here.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] errose28 commented on pull request #2257: HDDS-5226. Do not fail SCM HA pre-finalize validation if SCM HA was already being used.

Posted by GitBox <gi...@apache.org>.
errose28 commented on pull request #2257:
URL: https://github.com/apache/ozone/pull/2257#issuecomment-843379025


   Thanks for the review and suggestions @bharatviswa504 I will incorporate them and push the changes.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] bshashikant commented on a change in pull request #2257: HDDS-5226. Do not fail SCM HA pre-finalize validation if SCM HA was already being used.

Posted by GitBox <gi...@apache.org>.
bshashikant commented on a change in pull request #2257:
URL: https://github.com/apache/ozone/pull/2257#discussion_r635094006



##########
File path: hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/server/upgrade/ScmHAUnfinalizedStateValidationAction.java
##########
@@ -23,23 +23,25 @@
 import static org.apache.hadoop.ozone.upgrade.UpgradeActionHdds.Component.SCM;
 
 import org.apache.hadoop.hdds.scm.ScmConfigKeys;
+import org.apache.hadoop.hdds.scm.ha.SCMHAUtils;
 import org.apache.hadoop.hdds.scm.server.StorageContainerManager;
 import org.apache.hadoop.hdds.upgrade.HDDSUpgradeAction;
 import org.apache.hadoop.ozone.upgrade.UpgradeActionHdds;
 import org.apache.hadoop.ozone.upgrade.UpgradeException;
 
+/**
+ * Checks that SCM HA cannot be used in a pre-finalized cluster, unless it
+ * was already being used before this action was run.
+ */
 @UpgradeActionHdds(feature = SCM_HA, component = SCM,
     type = VALIDATE_IN_PREFINALIZE)
 public class ScmHAUnfinalizedStateValidationAction
     implements HDDSUpgradeAction<StorageContainerManager> {
 
   @Override
   public void execute(StorageContainerManager scm) throws Exception {
-    boolean isHAEnabled =
-        scm.getConfiguration().getBoolean(ScmConfigKeys.OZONE_SCM_HA_ENABLE_KEY,
-        ScmConfigKeys.OZONE_SCM_HA_ENABLE_DEFAULT);
-
-    if (isHAEnabled) {
+    if (SCMHAUtils.isSCMHAEnabled(scm.getConfiguration()) &&
+        !scm.getScmStorageConfig().isSCMHAEnabled()) {

Review comment:
       why do we need both the  checks one from configuration and one from version file?
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] bharatviswa504 commented on a change in pull request #2257: HDDS-5226. Do not fail SCM HA pre-finalize validation if SCM HA was already being used.

Posted by GitBox <gi...@apache.org>.
bharatviswa504 commented on a change in pull request #2257:
URL: https://github.com/apache/ozone/pull/2257#discussion_r634023911



##########
File path: hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/server/upgrade/ScmHAUnfinalizedStateValidationAction.java
##########
@@ -37,13 +45,22 @@
   public void execute(StorageContainerManager scm) throws Exception {
     boolean isHAEnabled =
         scm.getConfiguration().getBoolean(ScmConfigKeys.OZONE_SCM_HA_ENABLE_KEY,
-        ScmConfigKeys.OZONE_SCM_HA_ENABLE_DEFAULT);
+            ScmConfigKeys.OZONE_SCM_HA_ENABLE_DEFAULT);
 
     if (isHAEnabled) {
-      throw new UpgradeException(String.format("Configuration %s cannot be " +
-          "used until SCM upgrade has been finalized",
-          ScmConfigKeys.OZONE_SCM_HA_ENABLE_KEY),
-          UpgradeException.ResultCodes.PREFINALIZE_ACTION_VALIDATION_FAILED);
+      long lastIndex = scm.getScmHAManager()

Review comment:
       I believe we don't need this, we can check SCM StorageConfig SCM_HA Property.
   For Ha enabled cluster, this property is set.
   

##########
File path: hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/hdds/upgrade/TestScmHAUnfinalizedStateValidationAction.java
##########
@@ -0,0 +1,104 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with this
+ * work for additional information regarding copyright ownership.  The ASF
+ * licenses this file to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
+ * WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
+ * License for the specific language governing permissions and limitations under
+ * the License.
+ *
+ */
+
+package org.apache.hadoop.hdds.upgrade;
+
+import static org.apache.hadoop.hdds.upgrade.HDDSLayoutFeature.INITIAL_VERSION;
+import org.apache.hadoop.hdds.conf.OzoneConfiguration;
+import org.apache.hadoop.hdds.scm.server.StorageContainerManager;
+import org.apache.hadoop.hdds.scm.server.upgrade.ScmHAUnfinalizedStateValidationAction;
+import org.apache.hadoop.ozone.MiniOzoneCluster;
+import org.apache.hadoop.ozone.MiniOzoneClusterImpl;
+import org.apache.hadoop.ozone.MiniOzoneHAClusterImpl;
+import org.apache.hadoop.ozone.upgrade.UpgradeException;
+import org.apache.hadoop.test.LambdaTestUtils;
+import org.junit.Ignore;
+import org.junit.Test;
+
+/**
+ * Tests that the SCM HA pre-finalize validation action is only triggered in
+ * pre-finalize startup if SCM HA was not already being used in the cluster,
+ * but has been turned on after.
+ *
+ * Starting a new SCM HA cluster finalized should not trigger the action. This
+ * is tested by all other tests that use SCM HA from the latest version of the
+ * code.
+ *
+ * Starting a new cluster finalized without SCM HA enabled should not trigger
+ * the action. This is tested by all other tests that run non-HA clusters.
+ */
+// Tests are ignored to speed up CI runs. Run manually if changes are made
+// relating to the SCM HA validation action.
+@Ignore
+public class TestScmHAUnfinalizedStateValidationAction {
+  private static final OzoneConfiguration CONF = new OzoneConfiguration();
+
+  @Test
+  public void testHAEnabledAlreadyUsedFinalized() throws Exception {
+    // Verification should pass.
+    MiniOzoneCluster cluster = new MiniOzoneHAClusterImpl.Builder(CONF)
+        .setNumDatanodes(1)
+        .setNumOfStorageContainerManagers(3)
+        .setNumOfOzoneManagers(1)
+        .setSCMServiceId("id1")
+        .setOMServiceId("id2")
+        .build();
+
+    // Manually trigger the upgrade action after setup, to see if it detects
+    // that HA is being used based on disk structures.
+    // This avoids saving disk state between mini ozone cluster restart.
+    ScmHAUnfinalizedStateValidationAction action =
+        new ScmHAUnfinalizedStateValidationAction();
+    for (StorageContainerManager scm:
+        ((MiniOzoneHAClusterImpl) cluster).getStorageContainerManagers()) {
+      action.execute(scm);
+    }
+
+    cluster.shutdown();
+  }
+
+  @Test
+  public void testHAEnabledNotAlreadyUsedPreFinalized() throws Exception {
+    // Verification should fail.
+    MiniOzoneCluster.Builder builder = new MiniOzoneHAClusterImpl.Builder(CONF)
+        .setNumDatanodes(1)
+        .setNumOfStorageContainerManagers(3)
+        .setNumOfOzoneManagers(1)
+        .setSCMServiceId("id1")
+        .setOMServiceId("id2")
+        .setScmLayoutVersion(INITIAL_VERSION.layoutVersion())

Review comment:
       Question:
   So init will successfully happen but startup will fail?

##########
File path: hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/server/StorageContainerManager.java
##########
@@ -1158,6 +1158,8 @@ public String getDatanodeRpcPort() {
    */
   @Override
   public void start() throws IOException {
+    upgradeFinalizer.runPrefinalizeStateActions(scmStorageConfig, this);

Review comment:
       Question:
   So init will successfully happen but the startup will fail?
   
   
   Because if init happens every time after the upgrade, ratis sever might have started and already initialized ratis server, and wait for leader election.
   
   Do we want to run these checks in init also to avoid ratis server initialization.
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] bharatviswa504 commented on a change in pull request #2257: HDDS-5226. Do not fail SCM HA pre-finalize validation if SCM HA was already being used.

Posted by GitBox <gi...@apache.org>.
bharatviswa504 commented on a change in pull request #2257:
URL: https://github.com/apache/ozone/pull/2257#discussion_r635105199



##########
File path: hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/server/StorageContainerManager.java
##########
@@ -1158,6 +1158,8 @@ public String getDatanodeRpcPort() {
    */
   @Override
   public void start() throws IOException {
+    upgradeFinalizer.runPrefinalizeStateActions(scmStorageConfig, this);

Review comment:
       >>Took a closer look and init actually cannot happen in a pre-finalized cluster.
   What is meant by this?
   Let's say freshly installed cluster, can we perform init or not?

##########
File path: hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/server/StorageContainerManager.java
##########
@@ -1158,6 +1158,8 @@ public String getDatanodeRpcPort() {
    */
   @Override
   public void start() throws IOException {
+    upgradeFinalizer.runPrefinalizeStateActions(scmStorageConfig, this);

Review comment:
       >>Took a closer look and init actually cannot happen in a pre-finalized cluster.
   
   What is meant by this?
   
   Let's say freshly installed cluster, can we perform init or not?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] errose28 commented on a change in pull request #2257: HDDS-5226. Do not fail SCM HA pre-finalize validation if SCM HA was already being used.

Posted by GitBox <gi...@apache.org>.
errose28 commented on a change in pull request #2257:
URL: https://github.com/apache/ozone/pull/2257#discussion_r636945091



##########
File path: hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/server/StorageContainerManager.java
##########
@@ -962,6 +963,11 @@ public static boolean scmInit(OzoneConfiguration conf,
         return false;
       }
     } else {
+      // If SCM HA was not being used before pre-finalize, and is being used
+      // when the cluster is pre-finalized for the SCM HA feature, init
+      // should fail.
+      ScmHAUnfinalizedStateValidationAction.checkScmHA(conf, scmStorageConfig);
+
       clusterId = scmStorageConfig.getClusterID();
       final boolean isSCMHAEnabled = scmStorageConfig.isSCMHAEnabled();
       if (SCMHAUtils.isSCMHAEnabled(conf) && !isSCMHAEnabled) {

Review comment:
       Good catch, thanks @guihecheng . The problem here is that the upgrade framework will only run upgrade actions in pre-finalized cluster, but but because I kind of had to side-step the upgrade framework's action runner here to execute this statically, it is actually being run in finalized clusters too. This can be seen in the latest integration test failure: https://github.com/apache/ozone/pull/2257/checks?check_run_id=2631515596. I will add a pre-finalization check to this call, which will differentiate it from the if-else check here.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] errose28 commented on a change in pull request #2257: HDDS-5226. Do not fail SCM HA pre-finalize validation if SCM HA was already being used.

Posted by GitBox <gi...@apache.org>.
errose28 commented on a change in pull request #2257:
URL: https://github.com/apache/ozone/pull/2257#discussion_r634860493



##########
File path: hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/hdds/upgrade/TestScmHAUnfinalizedStateValidationAction.java
##########
@@ -0,0 +1,104 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with this
+ * work for additional information regarding copyright ownership.  The ASF
+ * licenses this file to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
+ * WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
+ * License for the specific language governing permissions and limitations under
+ * the License.
+ *
+ */
+
+package org.apache.hadoop.hdds.upgrade;
+
+import static org.apache.hadoop.hdds.upgrade.HDDSLayoutFeature.INITIAL_VERSION;
+import org.apache.hadoop.hdds.conf.OzoneConfiguration;
+import org.apache.hadoop.hdds.scm.server.StorageContainerManager;
+import org.apache.hadoop.hdds.scm.server.upgrade.ScmHAUnfinalizedStateValidationAction;
+import org.apache.hadoop.ozone.MiniOzoneCluster;
+import org.apache.hadoop.ozone.MiniOzoneClusterImpl;
+import org.apache.hadoop.ozone.MiniOzoneHAClusterImpl;
+import org.apache.hadoop.ozone.upgrade.UpgradeException;
+import org.apache.hadoop.test.LambdaTestUtils;
+import org.junit.Ignore;
+import org.junit.Test;
+
+/**
+ * Tests that the SCM HA pre-finalize validation action is only triggered in
+ * pre-finalize startup if SCM HA was not already being used in the cluster,
+ * but has been turned on after.
+ *
+ * Starting a new SCM HA cluster finalized should not trigger the action. This
+ * is tested by all other tests that use SCM HA from the latest version of the
+ * code.
+ *
+ * Starting a new cluster finalized without SCM HA enabled should not trigger
+ * the action. This is tested by all other tests that run non-HA clusters.
+ */
+// Tests are ignored to speed up CI runs. Run manually if changes are made
+// relating to the SCM HA validation action.
+@Ignore
+public class TestScmHAUnfinalizedStateValidationAction {
+  private static final OzoneConfiguration CONF = new OzoneConfiguration();
+
+  @Test
+  public void testHAEnabledAlreadyUsedFinalized() throws Exception {
+    // Verification should pass.
+    MiniOzoneCluster cluster = new MiniOzoneHAClusterImpl.Builder(CONF)
+        .setNumDatanodes(1)
+        .setNumOfStorageContainerManagers(3)
+        .setNumOfOzoneManagers(1)
+        .setSCMServiceId("id1")
+        .setOMServiceId("id2")
+        .build();
+
+    // Manually trigger the upgrade action after setup, to see if it detects
+    // that HA is being used based on disk structures.
+    // This avoids saving disk state between mini ozone cluster restart.
+    ScmHAUnfinalizedStateValidationAction action =
+        new ScmHAUnfinalizedStateValidationAction();
+    for (StorageContainerManager scm:
+        ((MiniOzoneHAClusterImpl) cluster).getStorageContainerManagers()) {
+      action.execute(scm);
+    }
+
+    cluster.shutdown();
+  }
+
+  @Test
+  public void testHAEnabledNotAlreadyUsedPreFinalized() throws Exception {
+    // Verification should fail.
+    MiniOzoneCluster.Builder builder = new MiniOzoneHAClusterImpl.Builder(CONF)
+        .setNumDatanodes(1)
+        .setNumOfStorageContainerManagers(3)
+        .setNumOfOzoneManagers(1)
+        .setSCMServiceId("id1")
+        .setOMServiceId("id2")
+        .setScmLayoutVersion(INITIAL_VERSION.layoutVersion())

Review comment:
       Actually I think we are ok for SCM init. See above and let me know if you think that checks out.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] guihecheng commented on a change in pull request #2257: HDDS-5226. Do not fail SCM HA pre-finalize validation if SCM HA was already being used.

Posted by GitBox <gi...@apache.org>.
guihecheng commented on a change in pull request #2257:
URL: https://github.com/apache/ozone/pull/2257#discussion_r636606128



##########
File path: hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/server/StorageContainerManager.java
##########
@@ -962,6 +963,11 @@ public static boolean scmInit(OzoneConfiguration conf,
         return false;
       }
     } else {
+      // If SCM HA was not being used before pre-finalize, and is being used
+      // when the cluster is pre-finalized for the SCM HA feature, init
+      // should fail.
+      ScmHAUnfinalizedStateValidationAction.checkScmHA(conf, scmStorageConfig);
+
       clusterId = scmStorageConfig.getClusterID();
       final boolean isSCMHAEnabled = scmStorageConfig.isSCMHAEnabled();
       if (SCMHAUtils.isSCMHAEnabled(conf) && !isSCMHAEnabled) {

Review comment:
       Hi @errose28 , I noticed that this check is just the same as `checkScmHA()`, so this block of code is somewhat 'dead' after `checkScmHA()` ?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] errose28 commented on a change in pull request #2257: HDDS-5226. Do not fail SCM HA pre-finalize validation if SCM HA was already being used.

Posted by GitBox <gi...@apache.org>.
errose28 commented on a change in pull request #2257:
URL: https://github.com/apache/ozone/pull/2257#discussion_r636221509



##########
File path: hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/server/upgrade/ScmHAUnfinalizedStateValidationAction.java
##########
@@ -23,23 +23,25 @@
 import static org.apache.hadoop.ozone.upgrade.UpgradeActionHdds.Component.SCM;
 
 import org.apache.hadoop.hdds.scm.ScmConfigKeys;
+import org.apache.hadoop.hdds.scm.ha.SCMHAUtils;
 import org.apache.hadoop.hdds.scm.server.StorageContainerManager;
 import org.apache.hadoop.hdds.upgrade.HDDSUpgradeAction;
 import org.apache.hadoop.ozone.upgrade.UpgradeActionHdds;
 import org.apache.hadoop.ozone.upgrade.UpgradeException;
 
+/**
+ * Checks that SCM HA cannot be used in a pre-finalized cluster, unless it
+ * was already being used before this action was run.
+ */
 @UpgradeActionHdds(feature = SCM_HA, component = SCM,
     type = VALIDATE_IN_PREFINALIZE)
 public class ScmHAUnfinalizedStateValidationAction
     implements HDDSUpgradeAction<StorageContainerManager> {
 
   @Override
   public void execute(StorageContainerManager scm) throws Exception {
-    boolean isHAEnabled =
-        scm.getConfiguration().getBoolean(ScmConfigKeys.OZONE_SCM_HA_ENABLE_KEY,
-        ScmConfigKeys.OZONE_SCM_HA_ENABLE_DEFAULT);
-
-    if (isHAEnabled) {
+    if (SCMHAUtils.isSCMHAEnabled(scm.getConfiguration()) &&
+        !scm.getScmStorageConfig().isSCMHAEnabled()) {

Review comment:
       The version file check indicates whether SCM HA was already in use before the upgrade framework was deployed to this cluster. The config check indicates whether SCM HA is currently enabled. The action is run when the cluster is pre-finalized for SCM HA (users should still be able to downgrade at this point). So if we are pre-finalized for SCM HA, HA has been enabled in the config, and HA was not being used already, we should fail startup so that SCM HA is not used until the cluster is finalized.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] errose28 commented on a change in pull request #2257: HDDS-5226. Do not fail SCM HA pre-finalize validation if SCM HA was already being used.

Posted by GitBox <gi...@apache.org>.
errose28 commented on a change in pull request #2257:
URL: https://github.com/apache/ozone/pull/2257#discussion_r638057581



##########
File path: hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/server/StorageContainerManager.java
##########
@@ -1158,6 +1158,8 @@ public String getDatanodeRpcPort() {
    */
   @Override
   public void start() throws IOException {
+    upgradeFinalizer.runPrefinalizeStateActions(scmStorageConfig, this);

Review comment:
       This specific check could probably be moved to the constructor, but in general, the upgrade framework's action execution gives a fully constructed service to each action to inspect for conditions which may violate pre-finalize state. See `UpgradeFinalizer#runPrefinalizeStateActions` and its implementations. With the current upgrade action implementation we had to work around this a bit to run this action statically in init, but circumventing the upgrade framework and manually checking for pre-finalized state in every action is not recommended. In the future we may introduce a new type of pre-finalize validation action that can be run statically if these sort of init checks become common.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] errose28 commented on a change in pull request #2257: HDDS-5226. Do not fail SCM HA pre-finalize validation if SCM HA was already being used.

Posted by GitBox <gi...@apache.org>.
errose28 commented on a change in pull request #2257:
URL: https://github.com/apache/ozone/pull/2257#discussion_r634601649



##########
File path: hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/server/StorageContainerManager.java
##########
@@ -1158,6 +1158,8 @@ public String getDatanodeRpcPort() {
    */
   @Override
   public void start() throws IOException {
+    upgradeFinalizer.runPrefinalizeStateActions(scmStorageConfig, this);

Review comment:
       In the current implementation init will succeed but start will fail. Init does not need to be run in the new version before the new bits are started, but it shouldn't hurt either. I did not realize we were also starting a ratis server in SCM init, so we should probably run the upgrade actions there too, yes. This shouldn't affect new SCMs since they will be finalized and the actions will not run.

##########
File path: hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/server/upgrade/ScmHAUnfinalizedStateValidationAction.java
##########
@@ -37,13 +45,22 @@
   public void execute(StorageContainerManager scm) throws Exception {
     boolean isHAEnabled =
         scm.getConfiguration().getBoolean(ScmConfigKeys.OZONE_SCM_HA_ENABLE_KEY,
-        ScmConfigKeys.OZONE_SCM_HA_ENABLE_DEFAULT);
+            ScmConfigKeys.OZONE_SCM_HA_ENABLE_DEFAULT);
 
     if (isHAEnabled) {
-      throw new UpgradeException(String.format("Configuration %s cannot be " +
-          "used until SCM upgrade has been finalized",
-          ScmConfigKeys.OZONE_SCM_HA_ENABLE_KEY),
-          UpgradeException.ResultCodes.PREFINALIZE_ACTION_VALIDATION_FAILED);
+      long lastIndex = scm.getScmHAManager()

Review comment:
       Oh I did not realize we were already saving a marker that SCM HA was in use to the disk. That is much better I will use that.

##########
File path: hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/hdds/upgrade/TestScmHAUnfinalizedStateValidationAction.java
##########
@@ -0,0 +1,104 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with this
+ * work for additional information regarding copyright ownership.  The ASF
+ * licenses this file to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
+ * WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
+ * License for the specific language governing permissions and limitations under
+ * the License.
+ *
+ */
+
+package org.apache.hadoop.hdds.upgrade;
+
+import static org.apache.hadoop.hdds.upgrade.HDDSLayoutFeature.INITIAL_VERSION;
+import org.apache.hadoop.hdds.conf.OzoneConfiguration;
+import org.apache.hadoop.hdds.scm.server.StorageContainerManager;
+import org.apache.hadoop.hdds.scm.server.upgrade.ScmHAUnfinalizedStateValidationAction;
+import org.apache.hadoop.ozone.MiniOzoneCluster;
+import org.apache.hadoop.ozone.MiniOzoneClusterImpl;
+import org.apache.hadoop.ozone.MiniOzoneHAClusterImpl;
+import org.apache.hadoop.ozone.upgrade.UpgradeException;
+import org.apache.hadoop.test.LambdaTestUtils;
+import org.junit.Ignore;
+import org.junit.Test;
+
+/**
+ * Tests that the SCM HA pre-finalize validation action is only triggered in
+ * pre-finalize startup if SCM HA was not already being used in the cluster,
+ * but has been turned on after.
+ *
+ * Starting a new SCM HA cluster finalized should not trigger the action. This
+ * is tested by all other tests that use SCM HA from the latest version of the
+ * code.
+ *
+ * Starting a new cluster finalized without SCM HA enabled should not trigger
+ * the action. This is tested by all other tests that run non-HA clusters.
+ */
+// Tests are ignored to speed up CI runs. Run manually if changes are made
+// relating to the SCM HA validation action.
+@Ignore
+public class TestScmHAUnfinalizedStateValidationAction {
+  private static final OzoneConfiguration CONF = new OzoneConfiguration();
+
+  @Test
+  public void testHAEnabledAlreadyUsedFinalized() throws Exception {
+    // Verification should pass.
+    MiniOzoneCluster cluster = new MiniOzoneHAClusterImpl.Builder(CONF)
+        .setNumDatanodes(1)
+        .setNumOfStorageContainerManagers(3)
+        .setNumOfOzoneManagers(1)
+        .setSCMServiceId("id1")
+        .setOMServiceId("id2")
+        .build();
+
+    // Manually trigger the upgrade action after setup, to see if it detects
+    // that HA is being used based on disk structures.
+    // This avoids saving disk state between mini ozone cluster restart.
+    ScmHAUnfinalizedStateValidationAction action =
+        new ScmHAUnfinalizedStateValidationAction();
+    for (StorageContainerManager scm:
+        ((MiniOzoneHAClusterImpl) cluster).getStorageContainerManagers()) {
+      action.execute(scm);
+    }
+
+    cluster.shutdown();
+  }
+
+  @Test
+  public void testHAEnabledNotAlreadyUsedPreFinalized() throws Exception {
+    // Verification should fail.
+    MiniOzoneCluster.Builder builder = new MiniOzoneHAClusterImpl.Builder(CONF)
+        .setNumDatanodes(1)
+        .setNumOfStorageContainerManagers(3)
+        .setNumOfOzoneManagers(1)
+        .setSCMServiceId("id1")
+        .setOMServiceId("id2")
+        .setScmLayoutVersion(INITIAL_VERSION.layoutVersion())

Review comment:
       With the current implementation yes, but after your comment above I will modify it so that init will also fail if it is run when the cluster is pre-finalized for SCM HA.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] errose28 commented on a change in pull request #2257: HDDS-5226. Do not fail SCM HA pre-finalize validation if SCM HA was already being used.

Posted by GitBox <gi...@apache.org>.
errose28 commented on a change in pull request #2257:
URL: https://github.com/apache/ozone/pull/2257#discussion_r636945091



##########
File path: hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/server/StorageContainerManager.java
##########
@@ -962,6 +963,11 @@ public static boolean scmInit(OzoneConfiguration conf,
         return false;
       }
     } else {
+      // If SCM HA was not being used before pre-finalize, and is being used
+      // when the cluster is pre-finalized for the SCM HA feature, init
+      // should fail.
+      ScmHAUnfinalizedStateValidationAction.checkScmHA(conf, scmStorageConfig);
+
       clusterId = scmStorageConfig.getClusterID();
       final boolean isSCMHAEnabled = scmStorageConfig.isSCMHAEnabled();
       if (SCMHAUtils.isSCMHAEnabled(conf) && !isSCMHAEnabled) {

Review comment:
       If SCM HA is used pre-finalized and it was not already being used, this action will fail with an exception and the code after the `checkScmHA` call will not run because the whole SCM will exit. However, if SCM HA is used pre-finalized when it was being used before the upgrade framework, or the cluster is not pre-finalized, the action will not throw an exception, and the code below the call will run. So it is not totally dead. This is the same behavior added to the SCM start method. Raising the exception on failure is preferable to an if-else check because the `UpgradeException` with `PREFINALIZE_ACTION_VALIDATION_FAILED` result code and message allows us to give more information to the user, rather than returning `false` from init.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org