You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@ozone.apache.org by GitBox <gi...@apache.org> on 2021/08/03 22:50:24 UTC

[GitHub] [ozone] hanishakoneru opened a new pull request #2491: HDDS-5534. Verify config is updated on all OMs before proceeding with Bootstrap

hanishakoneru opened a new pull request #2491:
URL: https://github.com/apache/ozone/pull/2491


   ## What changes were proposed in this pull request?
   
   When an OM is bootstrapped, it sends a SetConfiguration ratis request to leader OM and this request is propagated to all existing OMs in the ring. Existing OMs add the new OM to their peer list by reloading the config files and getting the new node information from the reloaded config files. But if the config file (ozone-site.xml) has not been updated on any OM, then that OM would crash when processing the SetConfiguration request. 
   
   To avoid this scenario, before a bootstrap request is sent to the leader OM, the bootstrapping OM must verify that all the existing OMs have its information in their ozone-site.xml. To achieve this, we have added a new OMMetadata protocol. This protocol would be called during OM decommissioning also to verify that ozone-site.xmls are updated.
   
   It could happen that an OM is down or not responding to a bootstrapping OMs metadata information call. In this scenario, to proceed with bootstrap, a "force" option is provided. During force bootstrap, the check for config update on existing OMs will be skipped.
   
   ## What is the link to the Apache JIRA
   
   https://issues.apache.org/jira/browse/HDDS-5534
   
   ## How was this patch tested?
   
   Will add unit tests
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] bharatviswa504 commented on a change in pull request #2491: HDDS-5534. Verify config is updated on all OMs before proceeding with Bootstrap

Posted by GitBox <gi...@apache.org>.
bharatviswa504 commented on a change in pull request #2491:
URL: https://github.com/apache/ozone/pull/2491#discussion_r749704225



##########
File path: hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/TestOzoneConfigurationFields.java
##########
@@ -100,7 +100,9 @@ private void addPropertiesNotInXml() {
         ReconServerConfigKeys.RECON_OM_SNAPSHOT_TASK_INTERVAL_DELAY,
         ReconServerConfigKeys.RECON_OM_SNAPSHOT_TASK_FLUSH_PARAM,
         OMConfigKeys.OZONE_OM_RATIS_SNAPSHOT_AUTO_TRIGGER_THRESHOLD_KEY,
-        OMConfigKeys.OZONE_OM_HA_PREFIX
+        OMConfigKeys.OZONE_OM_HA_PREFIX,
+        OMConfigKeys.OZONE_OM_ADMIN_PROTOCOL_MAX_RETRIES_KEY,

Review comment:
       Because atleast internally I have seen a bug for admin commands, where using defaults it is taking longer, the users needs some settings to control retry behavior. (This is for SCM, and in SCM admin commands are part of normal client/block protocol, as in OM we have separated it out that is good. They can tune these settings according to CU needs)




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] hanishakoneru commented on pull request #2491: HDDS-5534. Verify config is updated on all OMs before proceeding with Bootstrap

Posted by GitBox <gi...@apache.org>.
hanishakoneru commented on pull request #2491:
URL: https://github.com/apache/ozone/pull/2491#issuecomment-971874418


   Thanks @bharatviswa504. I have addressed the comments.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] bharatviswa504 edited a comment on pull request #2491: HDDS-5534. Verify config is updated on all OMs before proceeding with Bootstrap

Posted by GitBox <gi...@apache.org>.
bharatviswa504 edited a comment on pull request #2491:
URL: https://github.com/apache/ozone/pull/2491#issuecomment-906236680


   Thank You @hanishakoneru for this improvement.
   Left few comments, over all idea seems good to me.
   
   Need some tests to test positive and negative scenarios of this config updation


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] hanishakoneru commented on a change in pull request #2491: HDDS-5534. Verify config is updated on all OMs before proceeding with Bootstrap

Posted by GitBox <gi...@apache.org>.
hanishakoneru commented on a change in pull request #2491:
URL: https://github.com/apache/ozone/pull/2491#discussion_r719212939



##########
File path: hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/OzoneManager.java
##########
@@ -1373,6 +1411,64 @@ public void restart() throws IOException {
     omState = State.RUNNING;
   }
 
+  private void checkConfigBeforeBootstrap() throws IOException {
+    List<OMNodeDetails> omsWihtoutNewConfig = new ArrayList<>();
+    for (Map.Entry<String, OMNodeDetails> entry : peerNodesMap.entrySet()) {
+      String remoteNodeId = entry.getKey();
+      OMNodeDetails remoteNodeDetails = entry.getValue();
+      try (OMMetadataProtocolClientSideImpl omMetadataProtocolClient =
+               new OMMetadataProtocolClientSideImpl(configuration,
+                   getRemoteUser(), remoteNodeId,
+                   remoteNodeDetails.getRpcAddress())) {
+
+        OMMetadata remoteOMMetadata = omMetadataProtocolClient.getOMMetadata();
+        boolean exists = checkOMexistsInRemoteOMConfig(remoteOMMetadata);
+        if (!exists) {
+          LOG.error("Remote OM " + remoteNodeId + ":" +
+              remoteNodeDetails.getHostAddress() + " does not have the " +
+              "bootstrapping OM(" + getOMNodeId() + ") information on " +
+              "reloading configs.");
+          omsWihtoutNewConfig.add(remoteNodeDetails);
+        }
+      } catch (IOException ioe) {
+        LOG.error("Remote OM config check before bootstrap failed on OM {}",
+            remoteNodeId, ioe);
+        omsWihtoutNewConfig.add(remoteNodeDetails);
+      }
+    }
+    if (!omsWihtoutNewConfig.isEmpty()) {
+      StringBuilder errorMsgBuilder = new StringBuilder();
+      errorMsgBuilder.append("OM(s) [")
+          .append(omsWihtoutNewConfig.get(0).getOMPrintInfo());
+      for (int i = 1; i < omsWihtoutNewConfig.size(); i++) {
+        errorMsgBuilder.append(",")
+            .append(omsWihtoutNewConfig.get(i).getOMPrintInfo());
+      }
+      errorMsgBuilder.append("] do not have the bootstrapping OM information." +
+          " Update their ozone-site.xml with new node details before " +
+          "proceeding.");
+      shutdown(errorMsgBuilder.toString());
+    }
+  }
+
+  /**
+   * Check whether current OM information exists in the remote OM's reloaded
+   * configs.
+   */
+  private boolean checkOMexistsInRemoteOMConfig(OMMetadata remoteOMMetadata) {
+    List<OMNodeDetails> omNodesInNewConf = remoteOMMetadata
+        .getOmNodesInNewConf();
+    for (OMNodeDetails omNodeInRemoteOM : omNodesInNewConf) {

Review comment:
       I tested on a cluster and verified that bootstrapping a node again would lead it to crash. And the exception comes from Ratis. Instead, we should keep this check here to avoid that.
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] hanishakoneru commented on pull request #2491: HDDS-5534. Verify config is updated on all OMs before proceeding with Bootstrap

Posted by GitBox <gi...@apache.org>.
hanishakoneru commented on pull request #2491:
URL: https://github.com/apache/ozone/pull/2491#issuecomment-904179249


   Thank you @bharatviswa504 for reviewing the PR. I have addressed some of your comments and replied to the rest.
   
   > One question what advantage of this validation compared to failing with exception, will we be in some inconsistent state? I just want to understand reason behind going this route. Because if any of the node is down, we cannot make use of this check i believe.
   
   If the config has not been updated on an OM and the SetConfiguration request is executed, then it could lead to that OM crashing out. If we perform this check before, we can avoid a healthy OM going down because of not updating its config with the new OM info. A healthy running OM should not crash when trying to add another OM.
   
   If a node is down, this check would at the very least warn the user to double check the configs before proceeding with force bootstapping.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] hanishakoneru commented on pull request #2491: HDDS-5534. Verify config is updated on all OMs before proceeding with Bootstrap

Posted by GitBox <gi...@apache.org>.
hanishakoneru commented on pull request #2491:
URL: https://github.com/apache/ozone/pull/2491#issuecomment-952059694


   @bharatviswa504, can you please take a look when you get a chance. Thanks.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] bharatviswa504 commented on a change in pull request #2491: HDDS-5534. Verify config is updated on all OMs before proceeding with Bootstrap

Posted by GitBox <gi...@apache.org>.
bharatviswa504 commented on a change in pull request #2491:
URL: https://github.com/apache/ozone/pull/2491#discussion_r749703157



##########
File path: hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/TestOzoneConfigurationFields.java
##########
@@ -100,7 +100,9 @@ private void addPropertiesNotInXml() {
         ReconServerConfigKeys.RECON_OM_SNAPSHOT_TASK_INTERVAL_DELAY,
         ReconServerConfigKeys.RECON_OM_SNAPSHOT_TASK_FLUSH_PARAM,
         OMConfigKeys.OZONE_OM_RATIS_SNAPSHOT_AUTO_TRIGGER_THRESHOLD_KEY,
-        OMConfigKeys.OZONE_OM_HA_PREFIX
+        OMConfigKeys.OZONE_OM_HA_PREFIX,
+        OMConfigKeys.OZONE_OM_ADMIN_PROTOCOL_MAX_RETRIES_KEY,

Review comment:
       I think it would be good to add in ozone-default.xml.
   So that users knows about these configs, otherwise now they need to know only by reading code.
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] hanishakoneru commented on a change in pull request #2491: HDDS-5534. Verify config is updated on all OMs before proceeding with Bootstrap

Posted by GitBox <gi...@apache.org>.
hanishakoneru commented on a change in pull request #2491:
URL: https://github.com/apache/ozone/pull/2491#discussion_r719275813



##########
File path: hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/OzoneManager.java
##########
@@ -1373,6 +1411,64 @@ public void restart() throws IOException {
     omState = State.RUNNING;
   }
 
+  private void checkConfigBeforeBootstrap() throws IOException {
+    List<OMNodeDetails> omsWihtoutNewConfig = new ArrayList<>();
+    for (Map.Entry<String, OMNodeDetails> entry : peerNodesMap.entrySet()) {
+      String remoteNodeId = entry.getKey();
+      OMNodeDetails remoteNodeDetails = entry.getValue();
+      try (OMMetadataProtocolClientSideImpl omMetadataProtocolClient =
+               new OMMetadataProtocolClientSideImpl(configuration,
+                   getRemoteUser(), remoteNodeId,
+                   remoteNodeDetails.getRpcAddress())) {
+
+        OMMetadata remoteOMMetadata = omMetadataProtocolClient.getOMMetadata();
+        boolean exists = checkOMexistsInRemoteOMConfig(remoteOMMetadata);
+        if (!exists) {
+          LOG.error("Remote OM " + remoteNodeId + ":" +
+              remoteNodeDetails.getHostAddress() + " does not have the " +
+              "bootstrapping OM(" + getOMNodeId() + ") information on " +
+              "reloading configs.");
+          omsWihtoutNewConfig.add(remoteNodeDetails);
+        }
+      } catch (IOException ioe) {
+        LOG.error("Remote OM config check before bootstrap failed on OM {}",
+            remoteNodeId, ioe);
+        omsWihtoutNewConfig.add(remoteNodeDetails);
+      }
+    }
+    if (!omsWihtoutNewConfig.isEmpty()) {
+      StringBuilder errorMsgBuilder = new StringBuilder();
+      errorMsgBuilder.append("OM(s) [")
+          .append(omsWihtoutNewConfig.get(0).getOMPrintInfo());
+      for (int i = 1; i < omsWihtoutNewConfig.size(); i++) {
+        errorMsgBuilder.append(",")
+            .append(omsWihtoutNewConfig.get(i).getOMPrintInfo());
+      }
+      errorMsgBuilder.append("] do not have the bootstrapping OM information." +
+          " Update their ozone-site.xml with new node details before " +
+          "proceeding.");
+      shutdown(errorMsgBuilder.toString());
+    }
+  }
+
+  /**
+   * Check whether current OM information exists in the remote OM's reloaded
+   * configs.
+   */
+  private boolean checkOMexistsInRemoteOMConfig(OMMetadata remoteOMMetadata) {
+    List<OMNodeDetails> omNodesInNewConf = remoteOMMetadata

Review comment:
       Done.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] bharatviswa504 commented on a change in pull request #2491: HDDS-5534. Verify config is updated on all OMs before proceeding with Bootstrap

Posted by GitBox <gi...@apache.org>.
bharatviswa504 commented on a change in pull request #2491:
URL: https://github.com/apache/ozone/pull/2491#discussion_r747129541



##########
File path: hadoop-ozone/interface-client/src/main/proto/OMMetadataProtocol.proto
##########
@@ -0,0 +1,65 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+/**
+ * These .proto interfaces are private and unstable.
+ * Please see http://wiki.apache.org/hadoop/Compatibility
+ * for what changes are allowed for a *unstable* .proto interface.
+ */
+
+syntax = "proto2";
+option java_package = "org.apache.hadoop.ozone.protocol.proto";
+option java_outer_classname = "OzoneManagerMetadataProtocolProtos";
+option java_generic_services = true;
+option java_generate_equals_and_hash = true;
+package hadoop.ozone;
+
+/**
+This file contains the metadata protocol for Ozone Manager(s). This involves
+getting the meta information about an individual OM.
+*/
+
+message OMConfigurationRequest {
+}
+
+message OMConfigurationResponse {
+    required bool success = 1;
+    optional string errorMsg = 2;
+    // OM nodes present in OM's memory
+    repeated OMNodeInfo nodesInMemory = 3;
+    // OM nodes reloaded from new config on disk
+    repeated OMNodeInfo nodesInNewConf = 4;
+
+}
+
+message OMNodeInfo {
+    required string nodeID = 1;
+    required string hostAddress = 2;
+    required uint32 rpcPort = 3;
+    required uint32 ratisPort = 4;

Review comment:
       @hanishakoneru As discussed offline, we can do this a new Jira and also comparing remote OM config with local config to check if the new OM has all the older OM config. Pls open a new Jira for this.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] hanishakoneru edited a comment on pull request #2491: HDDS-5534. Verify config is updated on all OMs before proceeding with Bootstrap

Posted by GitBox <gi...@apache.org>.
hanishakoneru edited a comment on pull request #2491:
URL: https://github.com/apache/ozone/pull/2491#issuecomment-967285312


   Thanks @bharatviswa504 for the review. 
   I have opened HDDS-5978 to add more checks to the bootstrap process.
   
   > As this verify configuration is InterOM only right, not external client interaction. And we can add to OmClientProtocol that API right, do we need a new protocol for admin operations. And in OM server we can check whether admin has issued the command or not correct, Let me know if i am missing something here.
   
   It would be better to keep the admin protocols separate from client protocols. RemoveOM operation is a critical operation and I don't think it should be exposed to clients at all. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] bharatviswa504 commented on a change in pull request #2491: HDDS-5534. Verify config is updated on all OMs before proceeding with Bootstrap

Posted by GitBox <gi...@apache.org>.
bharatviswa504 commented on a change in pull request #2491:
URL: https://github.com/apache/ozone/pull/2491#discussion_r696393594



##########
File path: hadoop-ozone/interface-client/src/main/proto/OMMetadataProtocol.proto
##########
@@ -0,0 +1,65 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+/**
+ * These .proto interfaces are private and unstable.
+ * Please see http://wiki.apache.org/hadoop/Compatibility
+ * for what changes are allowed for a *unstable* .proto interface.
+ */
+
+syntax = "proto2";
+option java_package = "org.apache.hadoop.ozone.protocol.proto";
+option java_outer_classname = "OzoneManagerMetadataProtocolProtos";
+option java_generic_services = true;
+option java_generate_equals_and_hash = true;
+package hadoop.ozone;
+
+/**
+This file contains the metadata protocol for Ozone Manager(s). This involves
+getting the meta information about an individual OM.
+*/
+
+message OMMetadataInfoRequest {
+}
+
+message OMMetadataInfoResponse {
+    required bool success = 1;
+    optional string errorMsg = 2;
+    // OM nodes present in OM's memory
+    repeated OMNodeInfo nodesInMemory = 3;
+    // OM nodes reloaded from new config on disk
+    repeated OMNodeInfo nodesInNewConf = 4;
+
+}
+
+message OMNodeInfo {
+    required string nodeID = 1;
+    required string hostAddress = 2;

Review comment:
       Just a question dont we need to worry about other configs like http/https addr?
   I see when 
   
   ```
       String rpcAddrKey = ConfUtils.addKeySuffixes(OZONE_OM_ADDRESS_KEY,
           omServiceId, omNodeId);
       String rpcAddrStr = OmUtils.getOmRpcAddress(conf, rpcAddrKey);
       if (rpcAddrStr == null || rpcAddrStr.isEmpty()) {
         return null;
       }
   
         if (newOMNodeDetails == null) {
           // If new node information is not present in the newly loaded
           // configuration also, throw an exception
           throw new OzoneIllegalArgumentException("There is no OM configuration "
               + "for node ID " + newOMNodeId + " in ozone-site.xml.");
         }
   ```
   so if rpcAddr is null, we throw IllegalArgumentException. I believe that is causing crash here. With current way we check if rpc address is set for new om node id on existing OM's, but if other configs are not set like http/https addr we fail during download checkpoint right?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] hanishakoneru commented on a change in pull request #2491: HDDS-5534. Verify config is updated on all OMs before proceeding with Bootstrap

Posted by GitBox <gi...@apache.org>.
hanishakoneru commented on a change in pull request #2491:
URL: https://github.com/apache/ozone/pull/2491#discussion_r694344092



##########
File path: hadoop-ozone/common/src/main/java/org/apache/hadoop/ozone/om/protocol/OMMetadataProtocol.java
##########
@@ -0,0 +1,36 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ * <p>
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * <p>
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.ozone.om.protocol;
+
+import java.io.Closeable;
+import java.io.IOException;
+import org.apache.hadoop.ozone.om.OMConfigKeys;
+import org.apache.hadoop.security.KerberosInfo;
+
+/**
+ * Protocol for retrieving OM metadata information.
+ */
+@KerberosInfo(
+    serverPrincipal = OMConfigKeys.OZONE_OM_KERBEROS_PRINCIPAL_KEY)
+public interface OMMetadataProtocol extends Closeable {

Review comment:
       In future, if there is a need, this protocol can be reused for other meta information about the OM.
   
   How about renaming the RPC request instead from `getOMMetadataInfo` to `getOMConfiguration`?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] hanishakoneru commented on a change in pull request #2491: HDDS-5534. Verify config is updated on all OMs before proceeding with Bootstrap

Posted by GitBox <gi...@apache.org>.
hanishakoneru commented on a change in pull request #2491:
URL: https://github.com/apache/ozone/pull/2491#discussion_r694341747



##########
File path: hadoop-ozone/common/src/main/java/org/apache/hadoop/ozone/om/protocol/OMMetadata.java
##########
@@ -0,0 +1,77 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with this
+ * work for additional information regarding copyright ownership.  The ASF
+ * licenses this file to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ * <p>
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * <p>
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,WITHOUT
+ * WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
+ * License for the specific language governing permissions and limitations under
+ * the License.
+ */
+package org.apache.hadoop.ozone.om.protocol;
+
+import java.util.ArrayList;
+import java.util.List;
+import org.apache.hadoop.ozone.om.helpers.OMNodeDetails;
+
+/**
+ * Class storing the OM metadata such as the node details in memory and node
+ * details when config is reloaded from disk.
+ * Note that this class is used as a structure to transfer the OM node
+ * information through the {@link OMMetadataProtocol} and not for storing the
+ * metadata information in OzoneManager itself.
+ */
+public final class OMMetadata {

Review comment:
       We have already defined OMNodeDetails and OMNodeInfo. What do you think of OMConfiguration (similar to RaftConfiguration)?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] bharatviswa504 commented on a change in pull request #2491: HDDS-5534. Verify config is updated on all OMs before proceeding with Bootstrap

Posted by GitBox <gi...@apache.org>.
bharatviswa504 commented on a change in pull request #2491:
URL: https://github.com/apache/ozone/pull/2491#discussion_r696393594



##########
File path: hadoop-ozone/interface-client/src/main/proto/OMMetadataProtocol.proto
##########
@@ -0,0 +1,65 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+/**
+ * These .proto interfaces are private and unstable.
+ * Please see http://wiki.apache.org/hadoop/Compatibility
+ * for what changes are allowed for a *unstable* .proto interface.
+ */
+
+syntax = "proto2";
+option java_package = "org.apache.hadoop.ozone.protocol.proto";
+option java_outer_classname = "OzoneManagerMetadataProtocolProtos";
+option java_generic_services = true;
+option java_generate_equals_and_hash = true;
+package hadoop.ozone;
+
+/**
+This file contains the metadata protocol for Ozone Manager(s). This involves
+getting the meta information about an individual OM.
+*/
+
+message OMMetadataInfoRequest {
+}
+
+message OMMetadataInfoResponse {
+    required bool success = 1;
+    optional string errorMsg = 2;
+    // OM nodes present in OM's memory
+    repeated OMNodeInfo nodesInMemory = 3;
+    // OM nodes reloaded from new config on disk
+    repeated OMNodeInfo nodesInNewConf = 4;
+
+}
+
+message OMNodeInfo {
+    required string nodeID = 1;
+    required string hostAddress = 2;

Review comment:
       Just a question dont we need to worry about other configs like http/https addr?
   I see when 
   
   ```
       String rpcAddrKey = ConfUtils.addKeySuffixes(OZONE_OM_ADDRESS_KEY,
           omServiceId, omNodeId);
       String rpcAddrStr = OmUtils.getOmRpcAddress(conf, rpcAddrKey);
       if (rpcAddrStr == null || rpcAddrStr.isEmpty()) {
         return null;
       }
   
         if (newOMNodeDetails == null) {
           // If new node information is not present in the newly loaded
           // configuration also, throw an exception
           throw new OzoneIllegalArgumentException("There is no OM configuration "
               + "for node ID " + newOMNodeId + " in ozone-site.xml.");
         }
   ```
   so if rpcAddr is null, we throw IllegalArgumentException. I believe that is causing crash here. With current way we check if rpc address is set for new om node id on existing OM's, but if other configs are not set like http/https addr we fail during download checkpoint right?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] bharatviswa504 commented on a change in pull request #2491: HDDS-5534. Verify config is updated on all OMs before proceeding with Bootstrap

Posted by GitBox <gi...@apache.org>.
bharatviswa504 commented on a change in pull request #2491:
URL: https://github.com/apache/ozone/pull/2491#discussion_r748670775



##########
File path: hadoop-ozone/common/src/main/java/org/apache/hadoop/ozone/om/protocolPB/OMAdminProtocolClientSideImpl.java
##########
@@ -35,46 +35,45 @@
 import org.apache.hadoop.ozone.om.OMConfigKeys;
 import org.apache.hadoop.ozone.om.helpers.OMNodeDetails;
 import org.apache.hadoop.ozone.om.protocol.OMConfiguration;
-import org.apache.hadoop.ozone.om.protocol.OMMetadataProtocol;
-import org.apache.hadoop.ozone.protocol.proto.OzoneManagerMetadataProtocolProtos.OMConfigurationRequest;
-import org.apache.hadoop.ozone.protocol.proto.OzoneManagerMetadataProtocolProtos.OMConfigurationResponse;
-import org.apache.hadoop.ozone.protocol.proto.OzoneManagerMetadataProtocolProtos.OMNodeInfo;
+import org.apache.hadoop.ozone.om.protocol.OMAdminProtocol;
+import org.apache.hadoop.ozone.protocol.proto.OzoneManagerAdminProtocolProtos.OMConfigurationRequest;
+import org.apache.hadoop.ozone.protocol.proto.OzoneManagerAdminProtocolProtos.OMConfigurationResponse;
+import org.apache.hadoop.ozone.protocol.proto.OzoneManagerAdminProtocolProtos.OMNodeInfo;
 import org.apache.hadoop.security.UserGroupInformation;
 import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;
 
 /**
  * Protocol implementation for getting OM metadata information.

Review comment:
       Minor: JavaDoc needs a change.

##########
File path: hadoop-ozone/interface-client/src/main/proto/OMAdminProtocol.proto
##########
@@ -57,7 +57,7 @@ message OMNodeInfo {
 /**
  The service for getting OM metadata.

Review comment:
       Minor: Can we change doc to have something like service for OM admin operations??

##########
File path: hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/TestOzoneConfigurationFields.java
##########
@@ -100,7 +100,9 @@ private void addPropertiesNotInXml() {
         ReconServerConfigKeys.RECON_OM_SNAPSHOT_TASK_INTERVAL_DELAY,
         ReconServerConfigKeys.RECON_OM_SNAPSHOT_TASK_FLUSH_PARAM,
         OMConfigKeys.OZONE_OM_RATIS_SNAPSHOT_AUTO_TRIGGER_THRESHOLD_KEY,
-        OMConfigKeys.OZONE_OM_HA_PREFIX
+        OMConfigKeys.OZONE_OM_HA_PREFIX,
+        OMConfigKeys.OZONE_OM_ADMIN_PROTOCOL_MAX_RETRIES_KEY,

Review comment:
       Question: Any reason for not adding these properties to ozone-default.xml?
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] bharatviswa504 commented on pull request #2491: HDDS-5534. Verify config is updated on all OMs before proceeding with Bootstrap

Posted by GitBox <gi...@apache.org>.
bharatviswa504 commented on pull request #2491:
URL: https://github.com/apache/ozone/pull/2491#issuecomment-967291551


   > > As this verify configuration is InterOM only right, not external client interaction. And we can add to OmClientProtocol that API right, do we need a new protocol for admin operations. And in OM server we can check whether admin has issued the command or not correct, Let me know if i am missing something here.
   > 
   > It would be better to keep the admin protocols separate from client protocols. RemoveOM operation is a critical operation and I don't think it should be exposed to clients at all.
   
   But the OmMetadatProtocol, is used between inter OM's if i understand correctly, the admins/clients will not interact with this.
   What is the advantage of the seperating admin protocol, as when it comes to implementation from client this will be exposed from admin command like ozone admin, and we check at server if user is admin or not. 
   Do we see any benifits of having a seperate protocol for admin operations? And also as said this protocol which is added is again used between OM's, so I donot understand of linking between admin here.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] bharatviswa504 commented on a change in pull request #2491: HDDS-5534. Verify config is updated on all OMs before proceeding with Bootstrap

Posted by GitBox <gi...@apache.org>.
bharatviswa504 commented on a change in pull request #2491:
URL: https://github.com/apache/ozone/pull/2491#discussion_r741506925



##########
File path: hadoop-ozone/common/src/main/java/org/apache/hadoop/ozone/om/protocolPB/OMMetadataProtocolClientSideImpl.java
##########
@@ -0,0 +1,136 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ * <p>
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * <p>
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.ozone.om.protocolPB;
+
+import com.google.protobuf.RpcController;
+import com.google.protobuf.ServiceException;
+import java.io.IOException;
+import java.net.InetSocketAddress;
+import java.util.concurrent.TimeUnit;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.hdds.conf.ConfigurationSource;
+import org.apache.hadoop.hdds.conf.OzoneConfiguration;
+import org.apache.hadoop.hdds.utils.LegacyHadoopConfigurationSource;
+import org.apache.hadoop.io.retry.RetryPolicies;
+import org.apache.hadoop.io.retry.RetryPolicy;
+import org.apache.hadoop.io.retry.RetryProxy;
+import org.apache.hadoop.ipc.ProtobufRpcEngine;
+import org.apache.hadoop.ipc.RPC;
+import org.apache.hadoop.net.NetUtils;
+import org.apache.hadoop.ozone.OmUtils;
+import org.apache.hadoop.ozone.om.OMConfigKeys;
+import org.apache.hadoop.ozone.om.helpers.OMNodeDetails;
+import org.apache.hadoop.ozone.om.protocol.OMConfiguration;
+import org.apache.hadoop.ozone.om.protocol.OMMetadataProtocol;
+import org.apache.hadoop.ozone.protocol.proto.OzoneManagerMetadataProtocolProtos.OMConfigurationRequest;
+import org.apache.hadoop.ozone.protocol.proto.OzoneManagerMetadataProtocolProtos.OMConfigurationResponse;
+import org.apache.hadoop.ozone.protocol.proto.OzoneManagerMetadataProtocolProtos.OMNodeInfo;
+import org.apache.hadoop.security.UserGroupInformation;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * Protocol implementation for getting OM metadata information.
+ */
+public class OMMetadataProtocolClientSideImpl implements
+    OMMetadataProtocol {
+
+  /**
+   * RpcController is not used and hence is set to null.
+   */
+  private static final RpcController NULL_RPC_CONTROLLER = null;
+
+  private static final Logger LOG =
+      LoggerFactory.getLogger(OMMetadataProtocolClientSideImpl.class);
+
+  private final String omNodeID;
+  private final OMMetadataProtocolPB rpcProxy;
+
+  public OMMetadataProtocolClientSideImpl(ConfigurationSource conf,
+      UserGroupInformation ugi, String omNodeId, InetSocketAddress omAddress)
+      throws IOException {
+
+    RPC.setProtocolEngine(OzoneConfiguration.of(conf),
+        OMMetadataProtocolPB.class, ProtobufRpcEngine.class);
+
+    this.omNodeID = omNodeId;
+
+    int maxRetries = conf.getInt(
+        OMConfigKeys.OZONE_OM_METADATA_PROTOCOL_MAX_RETRIES_KEY,
+        OMConfigKeys.OZONE_OM_METADATA_PROTOCOL_MAX_RETRIES_DEFAULT);
+    long waitBetweenRetries = conf.getLong(
+        OMConfigKeys.OZONE_OM_METADATA_PROTOCOL_WAIT_BETWEEN_RETRIES_KEY,
+        OMConfigKeys.OZONE_OM_METADATA_PROTOCOL_WAIT_BETWEEN_RETRIES_DEFAULT);
+
+    // OM metadata is requested from a specific OM and hence there is no need
+    // of any failover provider.
+    RetryPolicy connectionRetryPolicy = RetryPolicies
+        .retryUpToMaximumCountWithFixedSleep(maxRetries, waitBetweenRetries,
+            TimeUnit.MILLISECONDS);
+    Configuration hadoopConf = LegacyHadoopConfigurationSource
+        .asHadoopConfiguration(conf);
+
+    OMMetadataProtocolPB proxy = RPC.getProtocolProxy(
+        OMMetadataProtocolPB.class,
+        RPC.getProtocolVersion(OMMetadataProtocolPB.class), omAddress, ugi,
+        hadoopConf, NetUtils.getDefaultSocketFactory(hadoopConf),
+        (int) OmUtils.getOMClientRpcTimeOut(conf), connectionRetryPolicy)
+        .getProxy();
+
+    RetryPolicy retryPolicy = RetryPolicies.retryUpToMaximumCountWithFixedSleep(
+        10, 1000, TimeUnit.MILLISECONDS);
+
+    this.rpcProxy = (OMMetadataProtocolPB) RetryProxy.create(
+        OMMetadataProtocolPB.class, proxy, retryPolicy);
+  }
+
+  @Override
+  public OMConfiguration getOMConfiguration() throws IOException {
+    try {
+      OMConfigurationResponse getConfigResponse = rpcProxy.getOMConfiguration(
+          NULL_RPC_CONTROLLER, OMConfigurationRequest.newBuilder().build());
+
+      OMConfiguration.Builder omMedatataBuilder = new OMConfiguration.Builder();
+      if (getConfigResponse.getSuccess()) {
+        if (getConfigResponse.getNodesInMemoryCount() > 0) {
+          for (OMNodeInfo omNodeInfo :
+              getConfigResponse.getNodesInMemoryList()) {
+            omMedatataBuilder.addToNodesInMemory(
+                OMNodeDetails.getFromProtobuf(omNodeInfo));
+          }
+        }
+        if (getConfigResponse.getNodesInNewConfCount() > 0) {
+          for (OMNodeInfo omNodeInfo :
+              getConfigResponse.getNodesInNewConfList()) {
+            omMedatataBuilder.addToNodesInNewConf(
+                OMNodeDetails.getFromProtobuf(omNodeInfo));
+          }
+        }
+      }
+      return omMedatataBuilder.build();
+    } catch (ServiceException e) {
+      LOG.error("Failed to retrieve configuration of OM {}", omNodeID, e);

Review comment:
       Can we add address also here, which will help during debugging issues.

##########
File path: hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/OzoneManager.java
##########
@@ -1531,6 +1650,18 @@ public boolean doesPeerExist(String omNodeId) {
     return false;
   }
 
+  public List<OMNodeDetails> getAllOMNodesInMemory() {

Review comment:
       Missing JavaDoc

##########
File path: hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/OzoneManager.java
##########
@@ -1531,6 +1650,18 @@ public boolean doesPeerExist(String omNodeId) {
     return false;
   }
 
+  public List<OMNodeDetails> getAllOMNodesInMemory() {
+    List<OMNodeDetails> peerNodes = getPeerNodes();
+    // Add current node also to list
+    peerNodes.add(omNodeDetails);
+    return peerNodes;
+  }
+
+  public List<OMNodeDetails> getAllOMNodesInNewConf() {

Review comment:
       Missing JavaDoc

##########
File path: hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/om/TestOzoneManagerBootstrap.java
##########
@@ -205,12 +191,146 @@ public void testLeaderChangeToNewOM() throws Exception {
         "other OMs are down", newOMNodeIds.contains(omLeader.getOMNodeId()));
 
     // Perform some read and write operations with new OM leader
-    objectStore = OzoneClientFactory.getRpcClient(OM_SERVICE_ID, conf)
-        .getObjectStore();
+    objectStore = OzoneClientFactory.getRpcClient(OM_SERVICE_ID,
+        cluster.getConf()).getObjectStore();
+
     OzoneVolume volume = objectStore.getVolume(VOLUME_NAME);
     OzoneBucket bucket = volume.getBucket(BUCKET_NAME);
     String key = createKey(bucket);
 
     Assert.assertNotNull(bucket.getKey(key));
   }
+
+  /**
+   * Tests the following scenarios:
+   * 1. Bootstrap without updating config on any existing OM -> fail
+   * 2. Force bootstrap without upating config on any OM -> fail
+   */
+  @Test
+  public void testBootstrapWithoutConfigUpdate() throws Exception {
+    // Setup 1 node cluster
+    setupCluster(1);
+    cluster.setupExitManagerForTesting();
+    OzoneManager existingOM = cluster.getOzoneManager(0);
+    String existingOMNodeId = existingOM.getOMNodeId();
+
+    GenericTestUtils.LogCapturer omLog =
+        GenericTestUtils.LogCapturer.captureLogs(OzoneManager.LOG);
+    GenericTestUtils.LogCapturer miniOzoneClusterLog =
+        GenericTestUtils.LogCapturer.captureLogs(MiniOzoneHAClusterImpl.LOG);
+
+    /***************************************************************************
+     * 1. Bootstrap without updating config on any existing OM -> fail
+     **************************************************************************/
+
+    // Bootstrap a new node without updating the configs on existing OMs.
+    // This should result in the bootstrap failing.
+    String newNodeId = "omNode-bootstrap-1";
+    try {
+      cluster.bootstrapOzoneManager(newNodeId, false, false);
+      Assert.fail("Bootstrap should have failed as configs are not updated on" +
+          " all OMs.");
+    } catch (Exception e) {
+      Assert.assertEquals(OmUtils.getOMAddressListPrintString(
+          Lists.newArrayList(existingOM.getNodeDetails())) + " do not have or" +
+          " have incorrect information of the bootstrapping OM. Update their " +
+          "ozone-site.xml before proceeding.", e.getMessage());
+      Assert.assertTrue(omLog.getOutput().contains("Remote OM config check " +
+          "failed on OM " + existingOMNodeId));
+      Assert.assertTrue(miniOzoneClusterLog.getOutput().contains(newNodeId +
+          " - System Exit"));
+    }
+
+    /***************************************************************************
+     * 2. Force bootstrap without upating config on any OM -> fail

Review comment:
       Minor NIT: upating -> updating

##########
File path: hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/OzoneManager.java
##########
@@ -1648,7 +1779,7 @@ public long getRatisSnapshotIndex() throws IOException {
    * Stop service.
    */
   public void stop() {
-    LOG.info("Stopping Ozone Manager");
+    LOG.info("{}: Stopping Ozone Manager", getOMNodeId());

Review comment:
       Can we address also?

##########
File path: hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/protocolPB/OMMetadataProtocolServerSideImpl.java
##########
@@ -0,0 +1,66 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with this
+ * work for additional information regarding copyright ownership.  The ASF
+ * licenses this file to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ * <p>
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * <p>
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,WITHOUT
+ * WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
+ * License for the specific language governing permissions and limitations under
+ * the License.
+ */
+package org.apache.hadoop.ozone.protocolPB;
+
+import com.google.protobuf.RpcController;
+import com.google.protobuf.ServiceException;
+import java.util.ArrayList;
+import java.util.List;
+import org.apache.hadoop.ozone.om.OzoneManager;
+import org.apache.hadoop.ozone.om.helpers.OMNodeDetails;
+import org.apache.hadoop.ozone.om.protocolPB.OMMetadataProtocolPB;
+import org.apache.hadoop.ozone.protocol.proto.OzoneManagerMetadataProtocolProtos.OMConfigurationRequest;
+import org.apache.hadoop.ozone.protocol.proto.OzoneManagerMetadataProtocolProtos.OMConfigurationResponse;
+import org.apache.hadoop.ozone.protocol.proto.OzoneManagerMetadataProtocolProtos.OMNodeInfo;
+
+/**
+ * This class is the server-side translator that forwards requests received on
+ * {@link OMMetadataProtocolPB} to the OMConfiguration server implementation.

Review comment:
       OMConfiguration -> OMMetadataProtocolServer??

##########
File path: hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/OzoneManager.java
##########
@@ -1423,6 +1466,66 @@ public void restart() throws IOException {
     omState = State.RUNNING;
   }
 
+  private void checkConfigBeforeBootstrap() throws IOException {
+    List<OMNodeDetails> omsWihtoutNewConfig = new ArrayList<>();
+    for (Map.Entry<String, OMNodeDetails> entry : peerNodesMap.entrySet()) {
+      String remoteNodeId = entry.getKey();
+      OMNodeDetails remoteNodeDetails = entry.getValue();
+      try (OMMetadataProtocolClientSideImpl omMetadataProtocolClient =
+               new OMMetadataProtocolClientSideImpl(configuration,
+                   getRemoteUser(), remoteNodeId,
+                   remoteNodeDetails.getRpcAddress())) {
+
+        OMConfiguration remoteOMConfiguration =
+            omMetadataProtocolClient.getOMConfiguration();
+        checkRemoteOMConfig(remoteNodeId, remoteOMConfiguration);
+      } catch (IOException ioe) {
+        LOG.error("Remote OM config check failed on OM {}", remoteNodeId, ioe);
+        omsWihtoutNewConfig.add(remoteNodeDetails);
+      }
+    }
+    if (!omsWihtoutNewConfig.isEmpty()) {
+      String errorMsg = OmUtils.getOMAddressListPrintString(omsWihtoutNewConfig)
+          + " do not have or have incorrect information of the bootstrapping " +
+          "OM. Update their ozone-site.xml before proceeding.";
+      exitManager.exitSystem(1, errorMsg, LOG);
+    }
+  }
+
+  /**
+   * Verify that the remote OM configuration is updated for the bootstrapping
+   * OM.
+   */
+  private void checkRemoteOMConfig(String remoteNodeId,
+      OMConfiguration remoteOMConfig) throws IOException {
+    if (remoteOMConfig == null) {
+      throw new IOException("Remote OM " + remoteNodeId + " configuration " +
+          "returned null");
+    }
+
+    if (remoteOMConfig.getCurrentPeerList().contains(this.getOMNodeId())) {
+      throw new IOException("Remote OM " + remoteNodeId + " already contains " +
+          "bootstrapping OM(" + getOMNodeId() + ") in it's in memory peer " +

Review comment:
       Can we add error message little detail, what it means in it's memory peer list means here?

##########
File path: hadoop-ozone/common/src/main/java/org/apache/hadoop/ozone/om/protocol/OMConfiguration.java
##########
@@ -0,0 +1,84 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with this
+ * work for additional information regarding copyright ownership.  The ASF
+ * licenses this file to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ * <p>
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * <p>
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,WITHOUT
+ * WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
+ * License for the specific language governing permissions and limitations under
+ * the License.
+ */
+package org.apache.hadoop.ozone.om.protocol;
+
+import java.util.ArrayList;
+import java.util.List;
+import java.util.Map;
+import java.util.stream.Collectors;
+import org.apache.hadoop.hdds.NodeDetails;
+import org.apache.hadoop.ozone.om.helpers.OMNodeDetails;
+
+/**
+ * Class storing the OM configuration information such as the node details in
+ * memory and node details when config is reloaded from disk.
+ * Note that this class is used as a structure to transfer the OM configuration
+ * information through the {@link OMMetadataProtocol} and not for storing the
+ * configuration information in OzoneManager itself.
+ */
+public final class OMConfiguration {
+
+  // OM nodes present in OM's memory
+  private List<OMNodeDetails> omNodesInMemory = new ArrayList<>();
+  // OM nodes reloaded from new config on disk
+  private List<OMNodeDetails> omNodesInNewConf = new ArrayList<>();
+
+  private OMConfiguration(List<OMNodeDetails> inMemoryNodeList,
+      List<OMNodeDetails> onDiskNodeList) {
+    this.omNodesInMemory.addAll(inMemoryNodeList);
+    this.omNodesInNewConf.addAll(onDiskNodeList);
+  }
+
+  /**
+   * OMConfiguration Builder class.
+   */
+  public static class Builder {
+    private List<OMNodeDetails> omNodesInMemory;
+    private List<OMNodeDetails> omNodesInNewConf;
+
+    public Builder() {
+      this.omNodesInMemory = new ArrayList<>();
+      this.omNodesInNewConf = new ArrayList<>();
+    }
+
+    public Builder addToNodesInMemory(OMNodeDetails nodeDetails) {
+      this.omNodesInMemory.add(nodeDetails);
+      return this;
+    }
+
+    public Builder addToNodesInNewConf(OMNodeDetails nodeDetails) {
+      this.omNodesInNewConf.add(nodeDetails);
+      return this;
+    }
+
+    public OMConfiguration build() {
+      return new OMConfiguration(omNodesInMemory, omNodesInNewConf);
+    }
+  }
+
+  public List<String> getCurrentPeerList() {

Review comment:
       Missing JavaDocs

##########
File path: hadoop-ozone/interface-client/src/main/proto/OMMetadataProtocol.proto
##########
@@ -0,0 +1,65 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+/**
+ * These .proto interfaces are private and unstable.
+ * Please see http://wiki.apache.org/hadoop/Compatibility
+ * for what changes are allowed for a *unstable* .proto interface.
+ */
+
+syntax = "proto2";
+option java_package = "org.apache.hadoop.ozone.protocol.proto";
+option java_outer_classname = "OzoneManagerMetadataProtocolProtos";
+option java_generic_services = true;
+option java_generate_equals_and_hash = true;
+package hadoop.ozone;
+
+/**
+This file contains the metadata protocol for Ozone Manager(s). This involves
+getting the meta information about an individual OM.
+*/
+
+message OMConfigurationRequest {
+}
+
+message OMConfigurationResponse {
+    required bool success = 1;
+    optional string errorMsg = 2;
+    // OM nodes present in OM's memory
+    repeated OMNodeInfo nodesInMemory = 3;
+    // OM nodes reloaded from new config on disk
+    repeated OMNodeInfo nodesInNewConf = 4;
+
+}
+
+message OMNodeInfo {
+    required string nodeID = 1;
+    required string hostAddress = 2;
+    required uint32 rpcPort = 3;
+    required uint32 ratisPort = 4;

Review comment:
       Any reason for adding ratisport, I see it is not checked?
   And also we donot check about httpaddress? 
   
   Is the intention here is to check if already started OM has this new OM address, and OM Ratis port only. Because if http address is not there during bootstrap we will fail correct?

##########
File path: hadoop-ozone/common/src/main/java/org/apache/hadoop/ozone/OmUtils.java
##########
@@ -697,4 +701,78 @@ public static String normalizeKey(String keyName,
     }
     return omHosts;
   }
+
+  /**
+   * Get a list of all OM details (address and ports) from the specified config.
+   */
+  public static List<OMNodeDetails> getAllOMAddresses(OzoneConfiguration conf,
+      String omServiceId, String currentOMNodeId) {
+
+    List<OMNodeDetails> omNodesList = new ArrayList<>();
+    Collection<String> omNodeIds = OmUtils.getOMNodeIds(conf, omServiceId);
+
+    String rpcAddrStr, hostAddr, httpAddr, httpsAddr;
+    int rpcPort, ratisPort;
+    if (omNodeIds.size() == 0) {
+      //Check if it is non-HA cluster
+      rpcAddrStr = OmUtils.getOmRpcAddress(conf, OZONE_OM_ADDRESS_KEY);
+      if (rpcAddrStr == null || rpcAddrStr.isEmpty()) {
+        return omNodesList;
+      }
+      hostAddr = HddsUtils.getHostName(rpcAddrStr).orElse(null);
+      rpcPort = HddsUtils.getHostPort(rpcAddrStr).orElse(0);
+      ratisPort = conf.getInt(OZONE_OM_RATIS_PORT_KEY,
+          OZONE_OM_RATIS_PORT_DEFAULT);
+      httpAddr = OmUtils.getHttpAddressForOMPeerNode(conf,
+          null, null, hostAddr);
+      httpsAddr = OmUtils.getHttpsAddressForOMPeerNode(conf,
+          null, null, hostAddr);
+
+      omNodesList.add(new OMNodeDetails.Builder()
+          .setOMNodeId(currentOMNodeId)
+          .setHostAddress(hostAddr)
+          .setRpcPort(rpcPort)
+          .setRatisPort(ratisPort)
+          .setHttpAddress(httpAddr)
+          .setHttpsAddress(httpsAddr)
+          .build());
+      return omNodesList;
+    }
+
+    for (String nodeId : omNodeIds) {
+      try {
+        OMNodeDetails omNodeDetails = OMNodeDetails.getOMNodeDetailsFromConf(
+            conf, omServiceId, nodeId);
+        omNodesList.add(omNodeDetails);
+      } catch (IOException e) {
+        String omRpcAddressStr = OMNodeDetails.getOMNodeAddressFromConf(conf,
+            omServiceId, nodeId);
+        LOG.error("OM {} is present in config file but it's address {} could " +
+            "not be resolved. Hence, OM {} is not added to list of peer nodes.",
+            nodeId, omRpcAddressStr, nodeId);
+      }
+    }
+
+    return omNodesList;
+  }
+
+  public static String getOMAddressListPrintString(List<OMNodeDetails> omList) {

Review comment:
       Missing JavaDoc

##########
File path: hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/OzoneManager.java
##########
@@ -1453,7 +1559,13 @@ public void updatePeerList(List<String> omNodeIds) {
       // Check if the OM NodeID is already present in the peer list or its
       // the local NodeID.
       if (!peerNodesMap.containsKey(omNodeId) && !isCurrentNode(omNodeId)) {
-        addOMNodeToPeers(omNodeId);
+        try {
+          addOMNodeToPeers(omNodeId);
+        } catch (IOException e) {
+          LOG.error("Fatal Error: Shutting down the system as otherwise it " +

Review comment:
       Question to understand: Why OM's will diverge? And also why are we shutting down OM here?
   
   From PR description, i thought if config is not updated on existing OM's we are shutting down, but shutdown if new config does not have that node id is being done as part of this Jira.

##########
File path: hadoop-ozone/common/src/main/java/org/apache/hadoop/ozone/om/protocol/OMMetadataProtocol.java
##########
@@ -0,0 +1,36 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ * <p>
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * <p>
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.ozone.om.protocol;
+
+import java.io.Closeable;
+import java.io.IOException;
+import org.apache.hadoop.ozone.om.OMConfigKeys;
+import org.apache.hadoop.security.KerberosInfo;
+
+/**
+ * Protocol for retrieving OM metadata information.
+ */
+@KerberosInfo(
+    serverPrincipal = OMConfigKeys.OZONE_OM_KERBEROS_PRINCIPAL_KEY)
+public interface OMMetadataProtocol extends Closeable {

Review comment:
       Just a question, why cant we add this to OMInterServiceProtocol?
   Any reason to add it as a fresh OMMetadataProtocol?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] bharatviswa504 commented on pull request #2491: HDDS-5534. Verify config is updated on all OMs before proceeding with Bootstrap

Posted by GitBox <gi...@apache.org>.
bharatviswa504 commented on pull request #2491:
URL: https://github.com/apache/ozone/pull/2491#issuecomment-965903078


   > Thanks @bharatviswa504 for the review. I have addressed your comments in the new patch or answered inline.
   > 
   > > Just a question, why cant we add this to OMInterServiceProtocol?
   > > Any reason to add it as a fresh OMMetadataProtocol?
   > 
   > Added a new protocol so that it can be used for removeOM admin command also. RemoveOM would have to be an admin command and not from other OMs. So we need a new protocol for admins to perform operations on the OM.
   
   As this verify configuration is InterOM only right, not external client interaction. And we can add to OmClientProtocol that API right, do we need a new protocol for admin operations. And in OM server we can check whether admin has issued the command or not correct, Let me know if i am missing something here.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] bharatviswa504 commented on pull request #2491: HDDS-5534. Verify config is updated on all OMs before proceeding with Bootstrap

Posted by GitBox <gi...@apache.org>.
bharatviswa504 commented on pull request #2491:
URL: https://github.com/apache/ozone/pull/2491#issuecomment-965904278


   Thank You @hanishakoneru for the update.
   Overall LGTM, one question on the new protocol thing, commented inline.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] bharatviswa504 commented on a change in pull request #2491: HDDS-5534. Verify config is updated on all OMs before proceeding with Bootstrap

Posted by GitBox <gi...@apache.org>.
bharatviswa504 commented on a change in pull request #2491:
URL: https://github.com/apache/ozone/pull/2491#discussion_r696344774



##########
File path: hadoop-ozone/common/src/main/java/org/apache/hadoop/ozone/om/protocol/OMMetadataProtocol.java
##########
@@ -0,0 +1,36 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ * <p>
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * <p>
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.ozone.om.protocol;
+
+import java.io.Closeable;
+import java.io.IOException;
+import org.apache.hadoop.ozone.om.OMConfigKeys;
+import org.apache.hadoop.security.KerberosInfo;
+
+/**
+ * Protocol for retrieving OM metadata information.
+ */
+@KerberosInfo(
+    serverPrincipal = OMConfigKeys.OZONE_OM_KERBEROS_PRINCIPAL_KEY)
+public interface OMMetadataProtocol extends Closeable {

Review comment:
       Makes sense. +1 for getOMConfiguration




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] bharatviswa504 edited a comment on pull request #2491: HDDS-5534. Verify config is updated on all OMs before proceeding with Bootstrap

Posted by GitBox <gi...@apache.org>.
bharatviswa504 edited a comment on pull request #2491:
URL: https://github.com/apache/ozone/pull/2491#issuecomment-906236680


   Thank You @hanishakoneru for this improvement.
   Left few comments, over all idea seems good to me.
   
   Need some tests to test positive and negative scenarios of this config updation. Can you add some tests.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] bharatviswa504 commented on pull request #2491: HDDS-5534. Verify config is updated on all OMs before proceeding with Bootstrap

Posted by GitBox <gi...@apache.org>.
bharatviswa504 commented on pull request #2491:
URL: https://github.com/apache/ozone/pull/2491#issuecomment-906236680


   Thank You @hanishakoneru for this improvement.
   Left few comments, over all idea seems good to me.
   
   Need some tests to test positive and negative scenarios


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] bharatviswa504 commented on a change in pull request #2491: HDDS-5534. Verify config is updated on all OMs before proceeding with Bootstrap

Posted by GitBox <gi...@apache.org>.
bharatviswa504 commented on a change in pull request #2491:
URL: https://github.com/apache/ozone/pull/2491#discussion_r696393594



##########
File path: hadoop-ozone/interface-client/src/main/proto/OMMetadataProtocol.proto
##########
@@ -0,0 +1,65 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+/**
+ * These .proto interfaces are private and unstable.
+ * Please see http://wiki.apache.org/hadoop/Compatibility
+ * for what changes are allowed for a *unstable* .proto interface.
+ */
+
+syntax = "proto2";
+option java_package = "org.apache.hadoop.ozone.protocol.proto";
+option java_outer_classname = "OzoneManagerMetadataProtocolProtos";
+option java_generic_services = true;
+option java_generate_equals_and_hash = true;
+package hadoop.ozone;
+
+/**
+This file contains the metadata protocol for Ozone Manager(s). This involves
+getting the meta information about an individual OM.
+*/
+
+message OMMetadataInfoRequest {
+}
+
+message OMMetadataInfoResponse {
+    required bool success = 1;
+    optional string errorMsg = 2;
+    // OM nodes present in OM's memory
+    repeated OMNodeInfo nodesInMemory = 3;
+    // OM nodes reloaded from new config on disk
+    repeated OMNodeInfo nodesInNewConf = 4;
+
+}
+
+message OMNodeInfo {
+    required string nodeID = 1;
+    required string hostAddress = 2;

Review comment:
       Just a question dont we need to worry about other configs like http/https addr?
   I see when 
   
   ```
       String rpcAddrKey = ConfUtils.addKeySuffixes(OZONE_OM_ADDRESS_KEY,
           omServiceId, omNodeId);
       String rpcAddrStr = OmUtils.getOmRpcAddress(conf, rpcAddrKey);
       if (rpcAddrStr == null || rpcAddrStr.isEmpty()) {
         return null;
       }
   
         if (newOMNodeDetails == null) {
           // If new node information is not present in the newly loaded
           // configuration also, throw an exception
           throw new OzoneIllegalArgumentException("There is no OM configuration "
               + "for node ID " + newOMNodeId + " in ozone-site.xml.");
         }
   ```
   so if rpcAddr is null, we throw IllegalArgumentException. I believe that is causing crash here. With current way we check if rpc address is set for new om node id on existing OM's




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] hanishakoneru commented on pull request #2491: HDDS-5534. Verify config is updated on all OMs before proceeding with Bootstrap

Posted by GitBox <gi...@apache.org>.
hanishakoneru commented on pull request #2491:
URL: https://github.com/apache/ozone/pull/2491#issuecomment-963665745


   Thanks @bharatviswa504 for the review. I have addressed your comments in the new patch or answered inline. 
   
   > Just a question, why cant we add this to OMInterServiceProtocol?
   Any reason to add it as a fresh OMMetadataProtocol?
   Added a new protocol so that it can be used for removeOM admin command also. RemoveOM would have to be an admin command and not from other OMs. So we need a new protocol for admins to perform operations on the OM.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] hanishakoneru commented on a change in pull request #2491: HDDS-5534. Verify config is updated on all OMs before proceeding with Bootstrap

Posted by GitBox <gi...@apache.org>.
hanishakoneru commented on a change in pull request #2491:
URL: https://github.com/apache/ozone/pull/2491#discussion_r719071042



##########
File path: hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/OzoneManager.java
##########
@@ -1373,6 +1411,64 @@ public void restart() throws IOException {
     omState = State.RUNNING;
   }
 
+  private void checkConfigBeforeBootstrap() throws IOException {
+    List<OMNodeDetails> omsWihtoutNewConfig = new ArrayList<>();
+    for (Map.Entry<String, OMNodeDetails> entry : peerNodesMap.entrySet()) {
+      String remoteNodeId = entry.getKey();
+      OMNodeDetails remoteNodeDetails = entry.getValue();
+      try (OMMetadataProtocolClientSideImpl omMetadataProtocolClient =
+               new OMMetadataProtocolClientSideImpl(configuration,
+                   getRemoteUser(), remoteNodeId,
+                   remoteNodeDetails.getRpcAddress())) {
+
+        OMMetadata remoteOMMetadata = omMetadataProtocolClient.getOMMetadata();
+        boolean exists = checkOMexistsInRemoteOMConfig(remoteOMMetadata);
+        if (!exists) {
+          LOG.error("Remote OM " + remoteNodeId + ":" +
+              remoteNodeDetails.getHostAddress() + " does not have the " +
+              "bootstrapping OM(" + getOMNodeId() + ") information on " +
+              "reloading configs.");
+          omsWihtoutNewConfig.add(remoteNodeDetails);
+        }
+      } catch (IOException ioe) {
+        LOG.error("Remote OM config check before bootstrap failed on OM {}",
+            remoteNodeId, ioe);
+        omsWihtoutNewConfig.add(remoteNodeDetails);
+      }
+    }
+    if (!omsWihtoutNewConfig.isEmpty()) {
+      StringBuilder errorMsgBuilder = new StringBuilder();
+      errorMsgBuilder.append("OM(s) [")
+          .append(omsWihtoutNewConfig.get(0).getOMPrintInfo());
+      for (int i = 1; i < omsWihtoutNewConfig.size(); i++) {
+        errorMsgBuilder.append(",")
+            .append(omsWihtoutNewConfig.get(i).getOMPrintInfo());
+      }
+      errorMsgBuilder.append("] do not have the bootstrapping OM information." +
+          " Update their ozone-site.xml with new node details before " +
+          "proceeding.");
+      shutdown(errorMsgBuilder.toString());
+    }
+  }
+
+  /**
+   * Check whether current OM information exists in the remote OM's reloaded
+   * configs.
+   */
+  private boolean checkOMexistsInRemoteOMConfig(OMMetadata remoteOMMetadata) {
+    List<OMNodeDetails> omNodesInNewConf = remoteOMMetadata
+        .getOmNodesInNewConf();
+    for (OMNodeDetails omNodeInRemoteOM : omNodesInNewConf) {

Review comment:
       There are 2 extra checks which I wanted to add but forgot.
   1. New OM is not already present as part of the Ratis ring
   2. A nodeId is not being reused.
   
   Also, we can use this method when removing an OM to verify that the OMs are in sync.
   But I see your point. Let me test what the implications would be for the 2 scenarios mentioned above. If its harmless and results in the bootstrapping node to shutdown, we should be fine. Otherwise, I will add those checks.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] hanishakoneru merged pull request #2491: HDDS-5534. Verify config is updated on all OMs before proceeding with Bootstrap

Posted by GitBox <gi...@apache.org>.
hanishakoneru merged pull request #2491:
URL: https://github.com/apache/ozone/pull/2491


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] hanishakoneru commented on a change in pull request #2491: HDDS-5534. Verify config is updated on all OMs before proceeding with Bootstrap

Posted by GitBox <gi...@apache.org>.
hanishakoneru commented on a change in pull request #2491:
URL: https://github.com/apache/ozone/pull/2491#discussion_r745163855



##########
File path: hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/OzoneManager.java
##########
@@ -1453,7 +1559,13 @@ public void updatePeerList(List<String> omNodeIds) {
       // Check if the OM NodeID is already present in the peer list or its
       // the local NodeID.
       if (!peerNodesMap.containsKey(omNodeId) && !isCurrentNode(omNodeId)) {
-        addOMNodeToPeers(omNodeId);
+        try {
+          addOMNodeToPeers(omNodeId);
+        } catch (IOException e) {
+          LOG.error("Fatal Error: Shutting down the system as otherwise it " +

Review comment:
       This function is called from the StateMachine. Any error in SM could be fatal as it diverges the OM states. Let's say  om1 and om2 update their configs about new nodes om4 and om5, but om3 does not as addOMNodeToPeers fails. This will lead to om3 being in a different config without having knowledge about om4 and om5. Now let's say om1 and om2 go down and om4 is the new leader. om3 cannot contact om4 to download a checkpoint if required as it does not know the addresses of om4.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] bharatviswa504 commented on pull request #2491: HDDS-5534. Verify config is updated on all OMs before proceeding with Bootstrap

Posted by GitBox <gi...@apache.org>.
bharatviswa504 commented on pull request #2491:
URL: https://github.com/apache/ozone/pull/2491#issuecomment-967292982


   From security perspective one I can think of is, we can add service level ACL's so that it can be rejected by Rpc Layer it self.
   But not sure this is that critical, as if we have checks for admin methods in server, the same can be achieved.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] bharatviswa504 commented on pull request #2491: HDDS-5534. Verify config is updated on all OMs before proceeding with Bootstrap

Posted by GitBox <gi...@apache.org>.
bharatviswa504 commented on pull request #2491:
URL: https://github.com/apache/ozone/pull/2491#issuecomment-967293910


   >Thanks @bharatviswa504 for the review.
   >I have opened HDDS-5978 to add more checks to the bootstrap process.
   
   I have moved this Jira under HDDS-4330.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] hanishakoneru commented on pull request #2491: HDDS-5534. Verify config is updated on all OMs before proceeding with Bootstrap

Posted by GitBox <gi...@apache.org>.
hanishakoneru commented on pull request #2491:
URL: https://github.com/apache/ozone/pull/2491#issuecomment-938653013


   @bharatviswa504 I have added some unit tests and made improvements in the existing bootstrap tests as well. Please take a look when you get a chance. Thanks.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] bharatviswa504 commented on a change in pull request #2491: HDDS-5534. Verify config is updated on all OMs before proceeding with Bootstrap

Posted by GitBox <gi...@apache.org>.
bharatviswa504 commented on a change in pull request #2491:
URL: https://github.com/apache/ozone/pull/2491#discussion_r696399379



##########
File path: hadoop-ozone/interface-client/src/main/proto/OMMetadataProtocol.proto
##########
@@ -0,0 +1,65 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+/**
+ * These .proto interfaces are private and unstable.
+ * Please see http://wiki.apache.org/hadoop/Compatibility
+ * for what changes are allowed for a *unstable* .proto interface.
+ */
+
+syntax = "proto2";
+option java_package = "org.apache.hadoop.ozone.protocol.proto";
+option java_outer_classname = "OzoneManagerMetadataProtocolProtos";
+option java_generic_services = true;
+option java_generate_equals_and_hash = true;
+package hadoop.ozone;
+
+/**
+This file contains the metadata protocol for Ozone Manager(s). This involves
+getting the meta information about an individual OM.
+*/
+
+message OMMetadataInfoRequest {
+}
+
+message OMMetadataInfoResponse {
+    required bool success = 1;
+    optional string errorMsg = 2;
+    // OM nodes present in OM's memory
+    repeated OMNodeInfo nodesInMemory = 3;
+    // OM nodes reloaded from new config on disk
+    repeated OMNodeInfo nodesInNewConf = 4;
+
+}
+
+message OMNodeInfo {
+    required string nodeID = 1;
+    required string hostAddress = 2;

Review comment:
       Even in normal code path, i see we donot fail for http Addr being not set with prevalidation. Actually if we fall back for http host based on RPC hostname we should be good here I believe. 
   
   (not related to this PR BTW, just found out when reviewing this PR)




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] hanishakoneru commented on pull request #2491: HDDS-5534. Verify config is updated on all OMs before proceeding with Bootstrap

Posted by GitBox <gi...@apache.org>.
hanishakoneru commented on pull request #2491:
URL: https://github.com/apache/ozone/pull/2491#issuecomment-902906495


   @bharatviswa504 can you please take a look when you can. Thanks.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] hanishakoneru commented on pull request #2491: HDDS-5534. Verify config is updated on all OMs before proceeding with Bootstrap

Posted by GitBox <gi...@apache.org>.
hanishakoneru commented on pull request #2491:
URL: https://github.com/apache/ozone/pull/2491#issuecomment-931199816


   Thank you @bharatviswa504 for reviewing this and for the comments. I have addressed them and made some improvements as discussed above in the latest patch. I am working on unit tests and will post an update soon. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] bharatviswa504 commented on pull request #2491: HDDS-5534. Verify config is updated on all OMs before proceeding with Bootstrap

Posted by GitBox <gi...@apache.org>.
bharatviswa504 commented on pull request #2491:
URL: https://github.com/apache/ozone/pull/2491#issuecomment-958448280


   Thank You @hanishakoneru for the work.
   Overall approach LGTM, few comments/questions


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] hanishakoneru commented on a change in pull request #2491: HDDS-5534. Verify config is updated on all OMs before proceeding with Bootstrap

Posted by GitBox <gi...@apache.org>.
hanishakoneru commented on a change in pull request #2491:
URL: https://github.com/apache/ozone/pull/2491#discussion_r745165279



##########
File path: hadoop-ozone/common/src/main/java/org/apache/hadoop/ozone/OmUtils.java
##########
@@ -697,4 +701,78 @@ public static String normalizeKey(String keyName,
     }
     return omHosts;
   }
+
+  /**
+   * Get a list of all OM details (address and ports) from the specified config.
+   */
+  public static List<OMNodeDetails> getAllOMAddresses(OzoneConfiguration conf,
+      String omServiceId, String currentOMNodeId) {
+
+    List<OMNodeDetails> omNodesList = new ArrayList<>();
+    Collection<String> omNodeIds = OmUtils.getOMNodeIds(conf, omServiceId);
+
+    String rpcAddrStr, hostAddr, httpAddr, httpsAddr;
+    int rpcPort, ratisPort;
+    if (omNodeIds.size() == 0) {
+      //Check if it is non-HA cluster
+      rpcAddrStr = OmUtils.getOmRpcAddress(conf, OZONE_OM_ADDRESS_KEY);
+      if (rpcAddrStr == null || rpcAddrStr.isEmpty()) {
+        return omNodesList;
+      }
+      hostAddr = HddsUtils.getHostName(rpcAddrStr).orElse(null);
+      rpcPort = HddsUtils.getHostPort(rpcAddrStr).orElse(0);
+      ratisPort = conf.getInt(OZONE_OM_RATIS_PORT_KEY,
+          OZONE_OM_RATIS_PORT_DEFAULT);
+      httpAddr = OmUtils.getHttpAddressForOMPeerNode(conf,
+          null, null, hostAddr);
+      httpsAddr = OmUtils.getHttpsAddressForOMPeerNode(conf,
+          null, null, hostAddr);
+
+      omNodesList.add(new OMNodeDetails.Builder()
+          .setOMNodeId(currentOMNodeId)
+          .setHostAddress(hostAddr)
+          .setRpcPort(rpcPort)
+          .setRatisPort(ratisPort)
+          .setHttpAddress(httpAddr)
+          .setHttpsAddress(httpsAddr)
+          .build());
+      return omNodesList;
+    }
+
+    for (String nodeId : omNodeIds) {
+      try {
+        OMNodeDetails omNodeDetails = OMNodeDetails.getOMNodeDetailsFromConf(
+            conf, omServiceId, nodeId);
+        omNodesList.add(omNodeDetails);
+      } catch (IOException e) {
+        String omRpcAddressStr = OMNodeDetails.getOMNodeAddressFromConf(conf,
+            omServiceId, nodeId);
+        LOG.error("OM {} is present in config file but it's address {} could " +
+            "not be resolved. Hence, OM {} is not added to list of peer nodes.",
+            nodeId, omRpcAddressStr, nodeId);
+      }
+    }
+
+    return omNodesList;
+  }
+
+  public static String getOMAddressListPrintString(List<OMNodeDetails> omList) {

Review comment:
       Done.

##########
File path: hadoop-ozone/common/src/main/java/org/apache/hadoop/ozone/om/protocolPB/OMMetadataProtocolClientSideImpl.java
##########
@@ -0,0 +1,136 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ * <p>
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * <p>
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.ozone.om.protocolPB;
+
+import com.google.protobuf.RpcController;
+import com.google.protobuf.ServiceException;
+import java.io.IOException;
+import java.net.InetSocketAddress;
+import java.util.concurrent.TimeUnit;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.hdds.conf.ConfigurationSource;
+import org.apache.hadoop.hdds.conf.OzoneConfiguration;
+import org.apache.hadoop.hdds.utils.LegacyHadoopConfigurationSource;
+import org.apache.hadoop.io.retry.RetryPolicies;
+import org.apache.hadoop.io.retry.RetryPolicy;
+import org.apache.hadoop.io.retry.RetryProxy;
+import org.apache.hadoop.ipc.ProtobufRpcEngine;
+import org.apache.hadoop.ipc.RPC;
+import org.apache.hadoop.net.NetUtils;
+import org.apache.hadoop.ozone.OmUtils;
+import org.apache.hadoop.ozone.om.OMConfigKeys;
+import org.apache.hadoop.ozone.om.helpers.OMNodeDetails;
+import org.apache.hadoop.ozone.om.protocol.OMConfiguration;
+import org.apache.hadoop.ozone.om.protocol.OMMetadataProtocol;
+import org.apache.hadoop.ozone.protocol.proto.OzoneManagerMetadataProtocolProtos.OMConfigurationRequest;
+import org.apache.hadoop.ozone.protocol.proto.OzoneManagerMetadataProtocolProtos.OMConfigurationResponse;
+import org.apache.hadoop.ozone.protocol.proto.OzoneManagerMetadataProtocolProtos.OMNodeInfo;
+import org.apache.hadoop.security.UserGroupInformation;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * Protocol implementation for getting OM metadata information.
+ */
+public class OMMetadataProtocolClientSideImpl implements
+    OMMetadataProtocol {
+
+  /**
+   * RpcController is not used and hence is set to null.
+   */
+  private static final RpcController NULL_RPC_CONTROLLER = null;
+
+  private static final Logger LOG =
+      LoggerFactory.getLogger(OMMetadataProtocolClientSideImpl.class);
+
+  private final String omNodeID;
+  private final OMMetadataProtocolPB rpcProxy;
+
+  public OMMetadataProtocolClientSideImpl(ConfigurationSource conf,
+      UserGroupInformation ugi, String omNodeId, InetSocketAddress omAddress)
+      throws IOException {
+
+    RPC.setProtocolEngine(OzoneConfiguration.of(conf),
+        OMMetadataProtocolPB.class, ProtobufRpcEngine.class);
+
+    this.omNodeID = omNodeId;
+
+    int maxRetries = conf.getInt(
+        OMConfigKeys.OZONE_OM_METADATA_PROTOCOL_MAX_RETRIES_KEY,
+        OMConfigKeys.OZONE_OM_METADATA_PROTOCOL_MAX_RETRIES_DEFAULT);
+    long waitBetweenRetries = conf.getLong(
+        OMConfigKeys.OZONE_OM_METADATA_PROTOCOL_WAIT_BETWEEN_RETRIES_KEY,
+        OMConfigKeys.OZONE_OM_METADATA_PROTOCOL_WAIT_BETWEEN_RETRIES_DEFAULT);
+
+    // OM metadata is requested from a specific OM and hence there is no need
+    // of any failover provider.
+    RetryPolicy connectionRetryPolicy = RetryPolicies
+        .retryUpToMaximumCountWithFixedSleep(maxRetries, waitBetweenRetries,
+            TimeUnit.MILLISECONDS);
+    Configuration hadoopConf = LegacyHadoopConfigurationSource
+        .asHadoopConfiguration(conf);
+
+    OMMetadataProtocolPB proxy = RPC.getProtocolProxy(
+        OMMetadataProtocolPB.class,
+        RPC.getProtocolVersion(OMMetadataProtocolPB.class), omAddress, ugi,
+        hadoopConf, NetUtils.getDefaultSocketFactory(hadoopConf),
+        (int) OmUtils.getOMClientRpcTimeOut(conf), connectionRetryPolicy)
+        .getProxy();
+
+    RetryPolicy retryPolicy = RetryPolicies.retryUpToMaximumCountWithFixedSleep(
+        10, 1000, TimeUnit.MILLISECONDS);
+
+    this.rpcProxy = (OMMetadataProtocolPB) RetryProxy.create(
+        OMMetadataProtocolPB.class, proxy, retryPolicy);
+  }
+
+  @Override
+  public OMConfiguration getOMConfiguration() throws IOException {
+    try {
+      OMConfigurationResponse getConfigResponse = rpcProxy.getOMConfiguration(
+          NULL_RPC_CONTROLLER, OMConfigurationRequest.newBuilder().build());
+
+      OMConfiguration.Builder omMedatataBuilder = new OMConfiguration.Builder();
+      if (getConfigResponse.getSuccess()) {
+        if (getConfigResponse.getNodesInMemoryCount() > 0) {
+          for (OMNodeInfo omNodeInfo :
+              getConfigResponse.getNodesInMemoryList()) {
+            omMedatataBuilder.addToNodesInMemory(
+                OMNodeDetails.getFromProtobuf(omNodeInfo));
+          }
+        }
+        if (getConfigResponse.getNodesInNewConfCount() > 0) {
+          for (OMNodeInfo omNodeInfo :
+              getConfigResponse.getNodesInNewConfList()) {
+            omMedatataBuilder.addToNodesInNewConf(
+                OMNodeDetails.getFromProtobuf(omNodeInfo));
+          }
+        }
+      }
+      return omMedatataBuilder.build();
+    } catch (ServiceException e) {
+      LOG.error("Failed to retrieve configuration of OM {}", omNodeID, e);

Review comment:
       Done.

##########
File path: hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/protocolPB/OMMetadataProtocolServerSideImpl.java
##########
@@ -0,0 +1,66 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with this
+ * work for additional information regarding copyright ownership.  The ASF
+ * licenses this file to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ * <p>
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * <p>
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,WITHOUT
+ * WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
+ * License for the specific language governing permissions and limitations under
+ * the License.
+ */
+package org.apache.hadoop.ozone.protocolPB;
+
+import com.google.protobuf.RpcController;
+import com.google.protobuf.ServiceException;
+import java.util.ArrayList;
+import java.util.List;
+import org.apache.hadoop.ozone.om.OzoneManager;
+import org.apache.hadoop.ozone.om.helpers.OMNodeDetails;
+import org.apache.hadoop.ozone.om.protocolPB.OMMetadataProtocolPB;
+import org.apache.hadoop.ozone.protocol.proto.OzoneManagerMetadataProtocolProtos.OMConfigurationRequest;
+import org.apache.hadoop.ozone.protocol.proto.OzoneManagerMetadataProtocolProtos.OMConfigurationResponse;
+import org.apache.hadoop.ozone.protocol.proto.OzoneManagerMetadataProtocolProtos.OMNodeInfo;
+
+/**
+ * This class is the server-side translator that forwards requests received on
+ * {@link OMMetadataProtocolPB} to the OMConfiguration server implementation.

Review comment:
       Done.

##########
File path: hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/OzoneManager.java
##########
@@ -1648,7 +1779,7 @@ public long getRatisSnapshotIndex() throws IOException {
    * Stop service.
    */
   public void stop() {
-    LOG.info("Stopping Ozone Manager");
+    LOG.info("{}: Stopping Ozone Manager", getOMNodeId());

Review comment:
       Done.

##########
File path: hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/OzoneManager.java
##########
@@ -1531,6 +1650,18 @@ public boolean doesPeerExist(String omNodeId) {
     return false;
   }
 
+  public List<OMNodeDetails> getAllOMNodesInMemory() {

Review comment:
       Done.

##########
File path: hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/OzoneManager.java
##########
@@ -1531,6 +1650,18 @@ public boolean doesPeerExist(String omNodeId) {
     return false;
   }
 
+  public List<OMNodeDetails> getAllOMNodesInMemory() {
+    List<OMNodeDetails> peerNodes = getPeerNodes();
+    // Add current node also to list
+    peerNodes.add(omNodeDetails);
+    return peerNodes;
+  }
+
+  public List<OMNodeDetails> getAllOMNodesInNewConf() {

Review comment:
       Done.

##########
File path: hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/OzoneManager.java
##########
@@ -1423,6 +1466,66 @@ public void restart() throws IOException {
     omState = State.RUNNING;
   }
 
+  private void checkConfigBeforeBootstrap() throws IOException {
+    List<OMNodeDetails> omsWihtoutNewConfig = new ArrayList<>();
+    for (Map.Entry<String, OMNodeDetails> entry : peerNodesMap.entrySet()) {
+      String remoteNodeId = entry.getKey();
+      OMNodeDetails remoteNodeDetails = entry.getValue();
+      try (OMMetadataProtocolClientSideImpl omMetadataProtocolClient =
+               new OMMetadataProtocolClientSideImpl(configuration,
+                   getRemoteUser(), remoteNodeId,
+                   remoteNodeDetails.getRpcAddress())) {
+
+        OMConfiguration remoteOMConfiguration =
+            omMetadataProtocolClient.getOMConfiguration();
+        checkRemoteOMConfig(remoteNodeId, remoteOMConfiguration);
+      } catch (IOException ioe) {
+        LOG.error("Remote OM config check failed on OM {}", remoteNodeId, ioe);
+        omsWihtoutNewConfig.add(remoteNodeDetails);
+      }
+    }
+    if (!omsWihtoutNewConfig.isEmpty()) {
+      String errorMsg = OmUtils.getOMAddressListPrintString(omsWihtoutNewConfig)
+          + " do not have or have incorrect information of the bootstrapping " +
+          "OM. Update their ozone-site.xml before proceeding.";
+      exitManager.exitSystem(1, errorMsg, LOG);
+    }
+  }
+
+  /**
+   * Verify that the remote OM configuration is updated for the bootstrapping
+   * OM.
+   */
+  private void checkRemoteOMConfig(String remoteNodeId,
+      OMConfiguration remoteOMConfig) throws IOException {
+    if (remoteOMConfig == null) {
+      throw new IOException("Remote OM " + remoteNodeId + " configuration " +
+          "returned null");
+    }
+
+    if (remoteOMConfig.getCurrentPeerList().contains(this.getOMNodeId())) {
+      throw new IOException("Remote OM " + remoteNodeId + " already contains " +
+          "bootstrapping OM(" + getOMNodeId() + ") in it's in memory peer " +

Review comment:
       Done.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] hanishakoneru edited a comment on pull request #2491: HDDS-5534. Verify config is updated on all OMs before proceeding with Bootstrap

Posted by GitBox <gi...@apache.org>.
hanishakoneru edited a comment on pull request #2491:
URL: https://github.com/apache/ozone/pull/2491#issuecomment-963665745


   Thanks @bharatviswa504 for the review. I have addressed your comments in the new patch or answered inline. 
   
   > Just a question, why cant we add this to OMInterServiceProtocol?
   Any reason to add it as a fresh OMMetadataProtocol?
   
   Added a new protocol so that it can be used for removeOM admin command also. RemoveOM would have to be an admin command and not from other OMs. So we need a new protocol for admins to perform operations on the OM.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] bharatviswa504 commented on a change in pull request #2491: HDDS-5534. Verify config is updated on all OMs before proceeding with Bootstrap

Posted by GitBox <gi...@apache.org>.
bharatviswa504 commented on a change in pull request #2491:
URL: https://github.com/apache/ozone/pull/2491#discussion_r696421176



##########
File path: hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/OzoneManager.java
##########
@@ -1373,6 +1411,64 @@ public void restart() throws IOException {
     omState = State.RUNNING;
   }
 
+  private void checkConfigBeforeBootstrap() throws IOException {
+    List<OMNodeDetails> omsWihtoutNewConfig = new ArrayList<>();
+    for (Map.Entry<String, OMNodeDetails> entry : peerNodesMap.entrySet()) {
+      String remoteNodeId = entry.getKey();
+      OMNodeDetails remoteNodeDetails = entry.getValue();
+      try (OMMetadataProtocolClientSideImpl omMetadataProtocolClient =
+               new OMMetadataProtocolClientSideImpl(configuration,
+                   getRemoteUser(), remoteNodeId,
+                   remoteNodeDetails.getRpcAddress())) {
+
+        OMMetadata remoteOMMetadata = omMetadataProtocolClient.getOMMetadata();
+        boolean exists = checkOMexistsInRemoteOMConfig(remoteOMMetadata);
+        if (!exists) {
+          LOG.error("Remote OM " + remoteNodeId + ":" +
+              remoteNodeDetails.getHostAddress() + " does not have the " +
+              "bootstrapping OM(" + getOMNodeId() + ") information on " +
+              "reloading configs.");
+          omsWihtoutNewConfig.add(remoteNodeDetails);
+        }
+      } catch (IOException ioe) {
+        LOG.error("Remote OM config check before bootstrap failed on OM {}",
+            remoteNodeId, ioe);
+        omsWihtoutNewConfig.add(remoteNodeDetails);
+      }
+    }
+    if (!omsWihtoutNewConfig.isEmpty()) {
+      StringBuilder errorMsgBuilder = new StringBuilder();
+      errorMsgBuilder.append("OM(s) [")
+          .append(omsWihtoutNewConfig.get(0).getOMPrintInfo());
+      for (int i = 1; i < omsWihtoutNewConfig.size(); i++) {
+        errorMsgBuilder.append(",")
+            .append(omsWihtoutNewConfig.get(i).getOMPrintInfo());
+      }
+      errorMsgBuilder.append("] do not have the bootstrapping OM information." +
+          " Update their ozone-site.xml with new node details before " +
+          "proceeding.");
+      shutdown(errorMsgBuilder.toString());
+    }
+  }
+
+  /**
+   * Check whether current OM information exists in the remote OM's reloaded
+   * configs.
+   */
+  private boolean checkOMexistsInRemoteOMConfig(OMMetadata remoteOMMetadata) {
+    List<OMNodeDetails> omNodesInNewConf = remoteOMMetadata
+        .getOmNodesInNewConf();
+    for (OMNodeDetails omNodeInRemoteOM : omNodesInNewConf) {

Review comment:
       If we need only reloaded conf, then why do we need to pass inMemory conf also?
   
   As we trust that on new node we believe it has old OM confs, as we use that and contact old OMs. 

##########
File path: hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/protocolPB/OMMetadataProtocolServerSideImpl.java
##########
@@ -0,0 +1,66 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with this
+ * work for additional information regarding copyright ownership.  The ASF
+ * licenses this file to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ * <p>
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * <p>
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,WITHOUT
+ * WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
+ * License for the specific language governing permissions and limitations under
+ * the License.
+ */
+package org.apache.hadoop.ozone.protocolPB;
+
+import com.google.protobuf.RpcController;
+import com.google.protobuf.ServiceException;
+import java.util.ArrayList;
+import java.util.List;
+import org.apache.hadoop.ozone.om.OzoneManager;
+import org.apache.hadoop.ozone.om.helpers.OMNodeDetails;
+import org.apache.hadoop.ozone.om.protocolPB.OMMetadataProtocolPB;
+import org.apache.hadoop.ozone.protocol.proto.OzoneManagerMetadataProtocolProtos.OMMetadataInfoRequest;
+import org.apache.hadoop.ozone.protocol.proto.OzoneManagerMetadataProtocolProtos.OMMetadataInfoResponse;
+import org.apache.hadoop.ozone.protocol.proto.OzoneManagerMetadataProtocolProtos.OMNodeInfo;
+
+/**
+ * This class is the server-side translator that forwards requests received on
+ * {@link OMMetadataProtocolPB} to the OMMetadata server implementation.
+ */
+public class OMMetadataProtocolServerSideImpl implements OMMetadataProtocolPB {
+
+  private final OzoneManager ozoneManager;
+
+  public OMMetadataProtocolServerSideImpl(OzoneManager om) {
+    this.ozoneManager = om;
+  }
+
+  @Override
+  public OMMetadataInfoResponse getOMMetadataInfo(RpcController controller,
+      OMMetadataInfoRequest request) throws ServiceException {
+
+    List<OMNodeDetails> oldOMNodesList = ozoneManager.getAllOMNodesInMemory();
+    List<OMNodeDetails> newOMNodesList = ozoneManager.getAllOMNodesInNewConf();
+
+    List<OMNodeInfo> omNodesInMemory = new ArrayList<>(oldOMNodesList.size());
+    for (OMNodeDetails omNodeDetails : oldOMNodesList) {
+      omNodesInMemory.add(omNodeDetails.getProtobuf());
+    }
+
+    List<OMNodeInfo> omNodesInNewConf =
+        new ArrayList<>(newOMNodesList.size());
+    for (OMNodeDetails omNodeDetails : newOMNodesList) {
+      omNodesInNewConf.add(omNodeDetails.getProtobuf());

Review comment:
       Do we need to have not null check here?
   Lets say node id is present, but rpc addr for specific nodeid not found we return null.
   And when we add null to proto, we get NPE

##########
File path: hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/OzoneManager.java
##########
@@ -1373,6 +1411,64 @@ public void restart() throws IOException {
     omState = State.RUNNING;
   }
 
+  private void checkConfigBeforeBootstrap() throws IOException {
+    List<OMNodeDetails> omsWihtoutNewConfig = new ArrayList<>();
+    for (Map.Entry<String, OMNodeDetails> entry : peerNodesMap.entrySet()) {
+      String remoteNodeId = entry.getKey();
+      OMNodeDetails remoteNodeDetails = entry.getValue();
+      try (OMMetadataProtocolClientSideImpl omMetadataProtocolClient =
+               new OMMetadataProtocolClientSideImpl(configuration,
+                   getRemoteUser(), remoteNodeId,
+                   remoteNodeDetails.getRpcAddress())) {
+
+        OMMetadata remoteOMMetadata = omMetadataProtocolClient.getOMMetadata();
+        boolean exists = checkOMexistsInRemoteOMConfig(remoteOMMetadata);
+        if (!exists) {
+          LOG.error("Remote OM " + remoteNodeId + ":" +
+              remoteNodeDetails.getHostAddress() + " does not have the " +
+              "bootstrapping OM(" + getOMNodeId() + ") information on " +
+              "reloading configs.");
+          omsWihtoutNewConfig.add(remoteNodeDetails);
+        }
+      } catch (IOException ioe) {
+        LOG.error("Remote OM config check before bootstrap failed on OM {}",
+            remoteNodeId, ioe);
+        omsWihtoutNewConfig.add(remoteNodeDetails);
+      }
+    }
+    if (!omsWihtoutNewConfig.isEmpty()) {
+      StringBuilder errorMsgBuilder = new StringBuilder();
+      errorMsgBuilder.append("OM(s) [")
+          .append(omsWihtoutNewConfig.get(0).getOMPrintInfo());
+      for (int i = 1; i < omsWihtoutNewConfig.size(); i++) {
+        errorMsgBuilder.append(",")
+            .append(omsWihtoutNewConfig.get(i).getOMPrintInfo());
+      }
+      errorMsgBuilder.append("] do not have the bootstrapping OM information." +
+          " Update their ozone-site.xml with new node details before " +
+          "proceeding.");
+      shutdown(errorMsgBuilder.toString());
+    }
+  }
+
+  /**
+   * Check whether current OM information exists in the remote OM's reloaded
+   * configs.
+   */
+  private boolean checkOMexistsInRemoteOMConfig(OMMetadata remoteOMMetadata) {
+    List<OMNodeDetails> omNodesInNewConf = remoteOMMetadata

Review comment:
       not null check missing here, as in exception case. we return null
   ```
    catch (ServiceException e) {
         LOG.error("Failed to retrieve metadata information of OM {}", omNodeID,
             e);
       }
       return null;
   ```

##########
File path: hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/OzoneManager.java
##########
@@ -1373,6 +1411,64 @@ public void restart() throws IOException {
     omState = State.RUNNING;
   }
 
+  private void checkConfigBeforeBootstrap() throws IOException {
+    List<OMNodeDetails> omsWihtoutNewConfig = new ArrayList<>();
+    for (Map.Entry<String, OMNodeDetails> entry : peerNodesMap.entrySet()) {
+      String remoteNodeId = entry.getKey();
+      OMNodeDetails remoteNodeDetails = entry.getValue();
+      try (OMMetadataProtocolClientSideImpl omMetadataProtocolClient =
+               new OMMetadataProtocolClientSideImpl(configuration,
+                   getRemoteUser(), remoteNodeId,
+                   remoteNodeDetails.getRpcAddress())) {
+
+        OMMetadata remoteOMMetadata = omMetadataProtocolClient.getOMMetadata();
+        boolean exists = checkOMexistsInRemoteOMConfig(remoteOMMetadata);
+        if (!exists) {
+          LOG.error("Remote OM " + remoteNodeId + ":" +
+              remoteNodeDetails.getHostAddress() + " does not have the " +
+              "bootstrapping OM(" + getOMNodeId() + ") information on " +
+              "reloading configs.");
+          omsWihtoutNewConfig.add(remoteNodeDetails);
+        }
+      } catch (IOException ioe) {
+        LOG.error("Remote OM config check before bootstrap failed on OM {}",
+            remoteNodeId, ioe);
+        omsWihtoutNewConfig.add(remoteNodeDetails);
+      }
+    }
+    if (!omsWihtoutNewConfig.isEmpty()) {
+      StringBuilder errorMsgBuilder = new StringBuilder();
+      errorMsgBuilder.append("OM(s) [")
+          .append(omsWihtoutNewConfig.get(0).getOMPrintInfo());
+      for (int i = 1; i < omsWihtoutNewConfig.size(); i++) {
+        errorMsgBuilder.append(",")
+            .append(omsWihtoutNewConfig.get(i).getOMPrintInfo());
+      }
+      errorMsgBuilder.append("] do not have the bootstrapping OM information." +
+          " Update their ozone-site.xml with new node details before " +
+          "proceeding.");
+      shutdown(errorMsgBuilder.toString());
+    }
+  }
+
+  /**
+   * Check whether current OM information exists in the remote OM's reloaded
+   * configs.
+   */
+  private boolean checkOMexistsInRemoteOMConfig(OMMetadata remoteOMMetadata) {
+    List<OMNodeDetails> omNodesInNewConf = remoteOMMetadata
+        .getOmNodesInNewConf();
+    for (OMNodeDetails omNodeInRemoteOM : omNodesInNewConf) {

Review comment:
       And also to make it simple just passing new OM host node id, and returning that from existing OM's can also be done




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] hanishakoneru commented on a change in pull request #2491: HDDS-5534. Verify config is updated on all OMs before proceeding with Bootstrap

Posted by GitBox <gi...@apache.org>.
hanishakoneru commented on a change in pull request #2491:
URL: https://github.com/apache/ozone/pull/2491#discussion_r745167605



##########
File path: hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/OzoneManager.java
##########
@@ -1453,7 +1559,13 @@ public void updatePeerList(List<String> omNodeIds) {
       // Check if the OM NodeID is already present in the peer list or its
       // the local NodeID.
       if (!peerNodesMap.containsKey(omNodeId) && !isCurrentNode(omNodeId)) {
-        addOMNodeToPeers(omNodeId);
+        try {
+          addOMNodeToPeers(omNodeId);
+        } catch (IOException e) {
+          LOG.error("Fatal Error: Shutting down the system as otherwise it " +

Review comment:
       This function is called from OM SM. Any failure here could be fatal as OM states would diverge.
   For example, let's say OM1 and OM2 successfully update their configs with newly bootstrapped nodes OM4 and OM5. OM3 fails to do so. Let's say later OM1 and OM2 shutdown with OM4 as the new leader. OM3 would not be able to download checkpoint from OM4 as it is not aware of OM4 addresses.

##########
File path: hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/om/TestOzoneManagerBootstrap.java
##########
@@ -205,12 +191,146 @@ public void testLeaderChangeToNewOM() throws Exception {
         "other OMs are down", newOMNodeIds.contains(omLeader.getOMNodeId()));
 
     // Perform some read and write operations with new OM leader
-    objectStore = OzoneClientFactory.getRpcClient(OM_SERVICE_ID, conf)
-        .getObjectStore();
+    objectStore = OzoneClientFactory.getRpcClient(OM_SERVICE_ID,
+        cluster.getConf()).getObjectStore();
+
     OzoneVolume volume = objectStore.getVolume(VOLUME_NAME);
     OzoneBucket bucket = volume.getBucket(BUCKET_NAME);
     String key = createKey(bucket);
 
     Assert.assertNotNull(bucket.getKey(key));
   }
+
+  /**
+   * Tests the following scenarios:
+   * 1. Bootstrap without updating config on any existing OM -> fail
+   * 2. Force bootstrap without upating config on any OM -> fail
+   */
+  @Test
+  public void testBootstrapWithoutConfigUpdate() throws Exception {
+    // Setup 1 node cluster
+    setupCluster(1);
+    cluster.setupExitManagerForTesting();
+    OzoneManager existingOM = cluster.getOzoneManager(0);
+    String existingOMNodeId = existingOM.getOMNodeId();
+
+    GenericTestUtils.LogCapturer omLog =
+        GenericTestUtils.LogCapturer.captureLogs(OzoneManager.LOG);
+    GenericTestUtils.LogCapturer miniOzoneClusterLog =
+        GenericTestUtils.LogCapturer.captureLogs(MiniOzoneHAClusterImpl.LOG);
+
+    /***************************************************************************
+     * 1. Bootstrap without updating config on any existing OM -> fail
+     **************************************************************************/
+
+    // Bootstrap a new node without updating the configs on existing OMs.
+    // This should result in the bootstrap failing.
+    String newNodeId = "omNode-bootstrap-1";
+    try {
+      cluster.bootstrapOzoneManager(newNodeId, false, false);
+      Assert.fail("Bootstrap should have failed as configs are not updated on" +
+          " all OMs.");
+    } catch (Exception e) {
+      Assert.assertEquals(OmUtils.getOMAddressListPrintString(
+          Lists.newArrayList(existingOM.getNodeDetails())) + " do not have or" +
+          " have incorrect information of the bootstrapping OM. Update their " +
+          "ozone-site.xml before proceeding.", e.getMessage());
+      Assert.assertTrue(omLog.getOutput().contains("Remote OM config check " +
+          "failed on OM " + existingOMNodeId));
+      Assert.assertTrue(miniOzoneClusterLog.getOutput().contains(newNodeId +
+          " - System Exit"));
+    }
+
+    /***************************************************************************
+     * 2. Force bootstrap without upating config on any OM -> fail

Review comment:
       Done.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] hanishakoneru commented on a change in pull request #2491: HDDS-5534. Verify config is updated on all OMs before proceeding with Bootstrap

Posted by GitBox <gi...@apache.org>.
hanishakoneru commented on a change in pull request #2491:
URL: https://github.com/apache/ozone/pull/2491#discussion_r745165171



##########
File path: hadoop-ozone/common/src/main/java/org/apache/hadoop/ozone/OmUtils.java
##########
@@ -697,4 +701,78 @@ public static String normalizeKey(String keyName,
     }
     return omHosts;
   }
+
+  /**
+   * Get a list of all OM details (address and ports) from the specified config.
+   */
+  public static List<OMNodeDetails> getAllOMAddresses(OzoneConfiguration conf,
+      String omServiceId, String currentOMNodeId) {
+
+    List<OMNodeDetails> omNodesList = new ArrayList<>();
+    Collection<String> omNodeIds = OmUtils.getOMNodeIds(conf, omServiceId);
+
+    String rpcAddrStr, hostAddr, httpAddr, httpsAddr;
+    int rpcPort, ratisPort;
+    if (omNodeIds.size() == 0) {
+      //Check if it is non-HA cluster
+      rpcAddrStr = OmUtils.getOmRpcAddress(conf, OZONE_OM_ADDRESS_KEY);
+      if (rpcAddrStr == null || rpcAddrStr.isEmpty()) {
+        return omNodesList;
+      }
+      hostAddr = HddsUtils.getHostName(rpcAddrStr).orElse(null);
+      rpcPort = HddsUtils.getHostPort(rpcAddrStr).orElse(0);
+      ratisPort = conf.getInt(OZONE_OM_RATIS_PORT_KEY,
+          OZONE_OM_RATIS_PORT_DEFAULT);
+      httpAddr = OmUtils.getHttpAddressForOMPeerNode(conf,
+          null, null, hostAddr);
+      httpsAddr = OmUtils.getHttpsAddressForOMPeerNode(conf,
+          null, null, hostAddr);
+
+      omNodesList.add(new OMNodeDetails.Builder()
+          .setOMNodeId(currentOMNodeId)
+          .setHostAddress(hostAddr)
+          .setRpcPort(rpcPort)
+          .setRatisPort(ratisPort)
+          .setHttpAddress(httpAddr)
+          .setHttpsAddress(httpsAddr)
+          .build());
+      return omNodesList;
+    }
+
+    for (String nodeId : omNodeIds) {
+      try {
+        OMNodeDetails omNodeDetails = OMNodeDetails.getOMNodeDetailsFromConf(
+            conf, omServiceId, nodeId);
+        omNodesList.add(omNodeDetails);
+      } catch (IOException e) {
+        String omRpcAddressStr = OMNodeDetails.getOMNodeAddressFromConf(conf,
+            omServiceId, nodeId);
+        LOG.error("OM {} is present in config file but it's address {} could " +
+            "not be resolved. Hence, OM {} is not added to list of peer nodes.",
+            nodeId, omRpcAddressStr, nodeId);
+      }
+    }
+
+    return omNodesList;
+  }
+
+  public static String getOMAddressListPrintString(List<OMNodeDetails> omList) {

Review comment:
       Done.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] hanishakoneru commented on pull request #2491: HDDS-5534. Verify config is updated on all OMs before proceeding with Bootstrap

Posted by GitBox <gi...@apache.org>.
hanishakoneru commented on pull request #2491:
URL: https://github.com/apache/ozone/pull/2491#issuecomment-967712573


   @bharatviswa504 thanks for the offline discussion. I have updated to Metadata protocol name to Admin Protocol to better represent the purpose.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] hanishakoneru commented on a change in pull request #2491: HDDS-5534. Verify config is updated on all OMs before proceeding with Bootstrap

Posted by GitBox <gi...@apache.org>.
hanishakoneru commented on a change in pull request #2491:
URL: https://github.com/apache/ozone/pull/2491#discussion_r749654899



##########
File path: hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/TestOzoneConfigurationFields.java
##########
@@ -100,7 +100,9 @@ private void addPropertiesNotInXml() {
         ReconServerConfigKeys.RECON_OM_SNAPSHOT_TASK_INTERVAL_DELAY,
         ReconServerConfigKeys.RECON_OM_SNAPSHOT_TASK_FLUSH_PARAM,
         OMConfigKeys.OZONE_OM_RATIS_SNAPSHOT_AUTO_TRIGGER_THRESHOLD_KEY,
-        OMConfigKeys.OZONE_OM_HA_PREFIX
+        OMConfigKeys.OZONE_OM_HA_PREFIX,
+        OMConfigKeys.OZONE_OM_ADMIN_PROTOCOL_MAX_RETRIES_KEY,

Review comment:
       As these are just used for admin operations, I don't think tuning these would make much of a difference. It would still be configurable but just not exposed via ozone-default.xml.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] hanishakoneru commented on a change in pull request #2491: HDDS-5534. Verify config is updated on all OMs before proceeding with Bootstrap

Posted by GitBox <gi...@apache.org>.
hanishakoneru commented on a change in pull request #2491:
URL: https://github.com/apache/ozone/pull/2491#discussion_r719125131



##########
File path: hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/protocolPB/OMMetadataProtocolServerSideImpl.java
##########
@@ -0,0 +1,66 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with this
+ * work for additional information regarding copyright ownership.  The ASF
+ * licenses this file to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ * <p>
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * <p>
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,WITHOUT
+ * WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
+ * License for the specific language governing permissions and limitations under
+ * the License.
+ */
+package org.apache.hadoop.ozone.protocolPB;
+
+import com.google.protobuf.RpcController;
+import com.google.protobuf.ServiceException;
+import java.util.ArrayList;
+import java.util.List;
+import org.apache.hadoop.ozone.om.OzoneManager;
+import org.apache.hadoop.ozone.om.helpers.OMNodeDetails;
+import org.apache.hadoop.ozone.om.protocolPB.OMMetadataProtocolPB;
+import org.apache.hadoop.ozone.protocol.proto.OzoneManagerMetadataProtocolProtos.OMMetadataInfoRequest;
+import org.apache.hadoop.ozone.protocol.proto.OzoneManagerMetadataProtocolProtos.OMMetadataInfoResponse;
+import org.apache.hadoop.ozone.protocol.proto.OzoneManagerMetadataProtocolProtos.OMNodeInfo;
+
+/**
+ * This class is the server-side translator that forwards requests received on
+ * {@link OMMetadataProtocolPB} to the OMMetadata server implementation.
+ */
+public class OMMetadataProtocolServerSideImpl implements OMMetadataProtocolPB {
+
+  private final OzoneManager ozoneManager;
+
+  public OMMetadataProtocolServerSideImpl(OzoneManager om) {
+    this.ozoneManager = om;
+  }
+
+  @Override
+  public OMMetadataInfoResponse getOMMetadataInfo(RpcController controller,
+      OMMetadataInfoRequest request) throws ServiceException {
+
+    List<OMNodeDetails> oldOMNodesList = ozoneManager.getAllOMNodesInMemory();
+    List<OMNodeDetails> newOMNodesList = ozoneManager.getAllOMNodesInNewConf();
+
+    List<OMNodeInfo> omNodesInMemory = new ArrayList<>(oldOMNodesList.size());
+    for (OMNodeDetails omNodeDetails : oldOMNodesList) {
+      omNodesInMemory.add(omNodeDetails.getProtobuf());
+    }
+
+    List<OMNodeInfo> omNodesInNewConf =
+        new ArrayList<>(newOMNodesList.size());
+    for (OMNodeDetails omNodeDetails : newOMNodesList) {
+      omNodesInNewConf.add(omNodeDetails.getProtobuf());

Review comment:
       Neither getAllOMNodesInMemory() nor getAllOMNodesInNewConf() return null. They return an empty arraylist when there are no elements. So it should be ok as we will not be adding a null value to proto.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] hanishakoneru commented on a change in pull request #2491: HDDS-5534. Verify config is updated on all OMs before proceeding with Bootstrap

Posted by GitBox <gi...@apache.org>.
hanishakoneru commented on a change in pull request #2491:
URL: https://github.com/apache/ozone/pull/2491#discussion_r719071042



##########
File path: hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/OzoneManager.java
##########
@@ -1373,6 +1411,64 @@ public void restart() throws IOException {
     omState = State.RUNNING;
   }
 
+  private void checkConfigBeforeBootstrap() throws IOException {
+    List<OMNodeDetails> omsWihtoutNewConfig = new ArrayList<>();
+    for (Map.Entry<String, OMNodeDetails> entry : peerNodesMap.entrySet()) {
+      String remoteNodeId = entry.getKey();
+      OMNodeDetails remoteNodeDetails = entry.getValue();
+      try (OMMetadataProtocolClientSideImpl omMetadataProtocolClient =
+               new OMMetadataProtocolClientSideImpl(configuration,
+                   getRemoteUser(), remoteNodeId,
+                   remoteNodeDetails.getRpcAddress())) {
+
+        OMMetadata remoteOMMetadata = omMetadataProtocolClient.getOMMetadata();
+        boolean exists = checkOMexistsInRemoteOMConfig(remoteOMMetadata);
+        if (!exists) {
+          LOG.error("Remote OM " + remoteNodeId + ":" +
+              remoteNodeDetails.getHostAddress() + " does not have the " +
+              "bootstrapping OM(" + getOMNodeId() + ") information on " +
+              "reloading configs.");
+          omsWihtoutNewConfig.add(remoteNodeDetails);
+        }
+      } catch (IOException ioe) {
+        LOG.error("Remote OM config check before bootstrap failed on OM {}",
+            remoteNodeId, ioe);
+        omsWihtoutNewConfig.add(remoteNodeDetails);
+      }
+    }
+    if (!omsWihtoutNewConfig.isEmpty()) {
+      StringBuilder errorMsgBuilder = new StringBuilder();
+      errorMsgBuilder.append("OM(s) [")
+          .append(omsWihtoutNewConfig.get(0).getOMPrintInfo());
+      for (int i = 1; i < omsWihtoutNewConfig.size(); i++) {
+        errorMsgBuilder.append(",")
+            .append(omsWihtoutNewConfig.get(i).getOMPrintInfo());
+      }
+      errorMsgBuilder.append("] do not have the bootstrapping OM information." +
+          " Update their ozone-site.xml with new node details before " +
+          "proceeding.");
+      shutdown(errorMsgBuilder.toString());
+    }
+  }
+
+  /**
+   * Check whether current OM information exists in the remote OM's reloaded
+   * configs.
+   */
+  private boolean checkOMexistsInRemoteOMConfig(OMMetadata remoteOMMetadata) {
+    List<OMNodeDetails> omNodesInNewConf = remoteOMMetadata
+        .getOmNodesInNewConf();
+    for (OMNodeDetails omNodeInRemoteOM : omNodesInNewConf) {

Review comment:
       There are 2 extra checks which I wanted to add but forgot.
   1. New OM is not already present as part of the Ratis ring
   2. A nodeId is not being reused.
   Also, we can use this method when removing an OM to verify that the OMs are in sync.
   But I see your point. Let me test what the implications would be for the 2 scenarios mentioned above. If its harmless and results in the bootstrapping node to shutdown, we should be fine. Otherwise, I will add those checks.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] bharatviswa504 commented on a change in pull request #2491: HDDS-5534. Verify config is updated on all OMs before proceeding with Bootstrap

Posted by GitBox <gi...@apache.org>.
bharatviswa504 commented on a change in pull request #2491:
URL: https://github.com/apache/ozone/pull/2491#discussion_r741506925



##########
File path: hadoop-ozone/common/src/main/java/org/apache/hadoop/ozone/om/protocolPB/OMMetadataProtocolClientSideImpl.java
##########
@@ -0,0 +1,136 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ * <p>
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * <p>
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.ozone.om.protocolPB;
+
+import com.google.protobuf.RpcController;
+import com.google.protobuf.ServiceException;
+import java.io.IOException;
+import java.net.InetSocketAddress;
+import java.util.concurrent.TimeUnit;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.hdds.conf.ConfigurationSource;
+import org.apache.hadoop.hdds.conf.OzoneConfiguration;
+import org.apache.hadoop.hdds.utils.LegacyHadoopConfigurationSource;
+import org.apache.hadoop.io.retry.RetryPolicies;
+import org.apache.hadoop.io.retry.RetryPolicy;
+import org.apache.hadoop.io.retry.RetryProxy;
+import org.apache.hadoop.ipc.ProtobufRpcEngine;
+import org.apache.hadoop.ipc.RPC;
+import org.apache.hadoop.net.NetUtils;
+import org.apache.hadoop.ozone.OmUtils;
+import org.apache.hadoop.ozone.om.OMConfigKeys;
+import org.apache.hadoop.ozone.om.helpers.OMNodeDetails;
+import org.apache.hadoop.ozone.om.protocol.OMConfiguration;
+import org.apache.hadoop.ozone.om.protocol.OMMetadataProtocol;
+import org.apache.hadoop.ozone.protocol.proto.OzoneManagerMetadataProtocolProtos.OMConfigurationRequest;
+import org.apache.hadoop.ozone.protocol.proto.OzoneManagerMetadataProtocolProtos.OMConfigurationResponse;
+import org.apache.hadoop.ozone.protocol.proto.OzoneManagerMetadataProtocolProtos.OMNodeInfo;
+import org.apache.hadoop.security.UserGroupInformation;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * Protocol implementation for getting OM metadata information.
+ */
+public class OMMetadataProtocolClientSideImpl implements
+    OMMetadataProtocol {
+
+  /**
+   * RpcController is not used and hence is set to null.
+   */
+  private static final RpcController NULL_RPC_CONTROLLER = null;
+
+  private static final Logger LOG =
+      LoggerFactory.getLogger(OMMetadataProtocolClientSideImpl.class);
+
+  private final String omNodeID;
+  private final OMMetadataProtocolPB rpcProxy;
+
+  public OMMetadataProtocolClientSideImpl(ConfigurationSource conf,
+      UserGroupInformation ugi, String omNodeId, InetSocketAddress omAddress)
+      throws IOException {
+
+    RPC.setProtocolEngine(OzoneConfiguration.of(conf),
+        OMMetadataProtocolPB.class, ProtobufRpcEngine.class);
+
+    this.omNodeID = omNodeId;
+
+    int maxRetries = conf.getInt(
+        OMConfigKeys.OZONE_OM_METADATA_PROTOCOL_MAX_RETRIES_KEY,
+        OMConfigKeys.OZONE_OM_METADATA_PROTOCOL_MAX_RETRIES_DEFAULT);
+    long waitBetweenRetries = conf.getLong(
+        OMConfigKeys.OZONE_OM_METADATA_PROTOCOL_WAIT_BETWEEN_RETRIES_KEY,
+        OMConfigKeys.OZONE_OM_METADATA_PROTOCOL_WAIT_BETWEEN_RETRIES_DEFAULT);
+
+    // OM metadata is requested from a specific OM and hence there is no need
+    // of any failover provider.
+    RetryPolicy connectionRetryPolicy = RetryPolicies
+        .retryUpToMaximumCountWithFixedSleep(maxRetries, waitBetweenRetries,
+            TimeUnit.MILLISECONDS);
+    Configuration hadoopConf = LegacyHadoopConfigurationSource
+        .asHadoopConfiguration(conf);
+
+    OMMetadataProtocolPB proxy = RPC.getProtocolProxy(
+        OMMetadataProtocolPB.class,
+        RPC.getProtocolVersion(OMMetadataProtocolPB.class), omAddress, ugi,
+        hadoopConf, NetUtils.getDefaultSocketFactory(hadoopConf),
+        (int) OmUtils.getOMClientRpcTimeOut(conf), connectionRetryPolicy)
+        .getProxy();
+
+    RetryPolicy retryPolicy = RetryPolicies.retryUpToMaximumCountWithFixedSleep(
+        10, 1000, TimeUnit.MILLISECONDS);
+
+    this.rpcProxy = (OMMetadataProtocolPB) RetryProxy.create(
+        OMMetadataProtocolPB.class, proxy, retryPolicy);
+  }
+
+  @Override
+  public OMConfiguration getOMConfiguration() throws IOException {
+    try {
+      OMConfigurationResponse getConfigResponse = rpcProxy.getOMConfiguration(
+          NULL_RPC_CONTROLLER, OMConfigurationRequest.newBuilder().build());
+
+      OMConfiguration.Builder omMedatataBuilder = new OMConfiguration.Builder();
+      if (getConfigResponse.getSuccess()) {
+        if (getConfigResponse.getNodesInMemoryCount() > 0) {
+          for (OMNodeInfo omNodeInfo :
+              getConfigResponse.getNodesInMemoryList()) {
+            omMedatataBuilder.addToNodesInMemory(
+                OMNodeDetails.getFromProtobuf(omNodeInfo));
+          }
+        }
+        if (getConfigResponse.getNodesInNewConfCount() > 0) {
+          for (OMNodeInfo omNodeInfo :
+              getConfigResponse.getNodesInNewConfList()) {
+            omMedatataBuilder.addToNodesInNewConf(
+                OMNodeDetails.getFromProtobuf(omNodeInfo));
+          }
+        }
+      }
+      return omMedatataBuilder.build();
+    } catch (ServiceException e) {
+      LOG.error("Failed to retrieve configuration of OM {}", omNodeID, e);

Review comment:
       Can we add address also here, which will help during debugging issues.

##########
File path: hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/OzoneManager.java
##########
@@ -1531,6 +1650,18 @@ public boolean doesPeerExist(String omNodeId) {
     return false;
   }
 
+  public List<OMNodeDetails> getAllOMNodesInMemory() {

Review comment:
       Missing JavaDoc

##########
File path: hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/OzoneManager.java
##########
@@ -1531,6 +1650,18 @@ public boolean doesPeerExist(String omNodeId) {
     return false;
   }
 
+  public List<OMNodeDetails> getAllOMNodesInMemory() {
+    List<OMNodeDetails> peerNodes = getPeerNodes();
+    // Add current node also to list
+    peerNodes.add(omNodeDetails);
+    return peerNodes;
+  }
+
+  public List<OMNodeDetails> getAllOMNodesInNewConf() {

Review comment:
       Missing JavaDoc

##########
File path: hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/om/TestOzoneManagerBootstrap.java
##########
@@ -205,12 +191,146 @@ public void testLeaderChangeToNewOM() throws Exception {
         "other OMs are down", newOMNodeIds.contains(omLeader.getOMNodeId()));
 
     // Perform some read and write operations with new OM leader
-    objectStore = OzoneClientFactory.getRpcClient(OM_SERVICE_ID, conf)
-        .getObjectStore();
+    objectStore = OzoneClientFactory.getRpcClient(OM_SERVICE_ID,
+        cluster.getConf()).getObjectStore();
+
     OzoneVolume volume = objectStore.getVolume(VOLUME_NAME);
     OzoneBucket bucket = volume.getBucket(BUCKET_NAME);
     String key = createKey(bucket);
 
     Assert.assertNotNull(bucket.getKey(key));
   }
+
+  /**
+   * Tests the following scenarios:
+   * 1. Bootstrap without updating config on any existing OM -> fail
+   * 2. Force bootstrap without upating config on any OM -> fail
+   */
+  @Test
+  public void testBootstrapWithoutConfigUpdate() throws Exception {
+    // Setup 1 node cluster
+    setupCluster(1);
+    cluster.setupExitManagerForTesting();
+    OzoneManager existingOM = cluster.getOzoneManager(0);
+    String existingOMNodeId = existingOM.getOMNodeId();
+
+    GenericTestUtils.LogCapturer omLog =
+        GenericTestUtils.LogCapturer.captureLogs(OzoneManager.LOG);
+    GenericTestUtils.LogCapturer miniOzoneClusterLog =
+        GenericTestUtils.LogCapturer.captureLogs(MiniOzoneHAClusterImpl.LOG);
+
+    /***************************************************************************
+     * 1. Bootstrap without updating config on any existing OM -> fail
+     **************************************************************************/
+
+    // Bootstrap a new node without updating the configs on existing OMs.
+    // This should result in the bootstrap failing.
+    String newNodeId = "omNode-bootstrap-1";
+    try {
+      cluster.bootstrapOzoneManager(newNodeId, false, false);
+      Assert.fail("Bootstrap should have failed as configs are not updated on" +
+          " all OMs.");
+    } catch (Exception e) {
+      Assert.assertEquals(OmUtils.getOMAddressListPrintString(
+          Lists.newArrayList(existingOM.getNodeDetails())) + " do not have or" +
+          " have incorrect information of the bootstrapping OM. Update their " +
+          "ozone-site.xml before proceeding.", e.getMessage());
+      Assert.assertTrue(omLog.getOutput().contains("Remote OM config check " +
+          "failed on OM " + existingOMNodeId));
+      Assert.assertTrue(miniOzoneClusterLog.getOutput().contains(newNodeId +
+          " - System Exit"));
+    }
+
+    /***************************************************************************
+     * 2. Force bootstrap without upating config on any OM -> fail

Review comment:
       Minor NIT: upating -> updating

##########
File path: hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/OzoneManager.java
##########
@@ -1648,7 +1779,7 @@ public long getRatisSnapshotIndex() throws IOException {
    * Stop service.
    */
   public void stop() {
-    LOG.info("Stopping Ozone Manager");
+    LOG.info("{}: Stopping Ozone Manager", getOMNodeId());

Review comment:
       Can we address also?

##########
File path: hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/protocolPB/OMMetadataProtocolServerSideImpl.java
##########
@@ -0,0 +1,66 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with this
+ * work for additional information regarding copyright ownership.  The ASF
+ * licenses this file to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ * <p>
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * <p>
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,WITHOUT
+ * WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
+ * License for the specific language governing permissions and limitations under
+ * the License.
+ */
+package org.apache.hadoop.ozone.protocolPB;
+
+import com.google.protobuf.RpcController;
+import com.google.protobuf.ServiceException;
+import java.util.ArrayList;
+import java.util.List;
+import org.apache.hadoop.ozone.om.OzoneManager;
+import org.apache.hadoop.ozone.om.helpers.OMNodeDetails;
+import org.apache.hadoop.ozone.om.protocolPB.OMMetadataProtocolPB;
+import org.apache.hadoop.ozone.protocol.proto.OzoneManagerMetadataProtocolProtos.OMConfigurationRequest;
+import org.apache.hadoop.ozone.protocol.proto.OzoneManagerMetadataProtocolProtos.OMConfigurationResponse;
+import org.apache.hadoop.ozone.protocol.proto.OzoneManagerMetadataProtocolProtos.OMNodeInfo;
+
+/**
+ * This class is the server-side translator that forwards requests received on
+ * {@link OMMetadataProtocolPB} to the OMConfiguration server implementation.

Review comment:
       OMConfiguration -> OMMetadataProtocolServer??

##########
File path: hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/OzoneManager.java
##########
@@ -1423,6 +1466,66 @@ public void restart() throws IOException {
     omState = State.RUNNING;
   }
 
+  private void checkConfigBeforeBootstrap() throws IOException {
+    List<OMNodeDetails> omsWihtoutNewConfig = new ArrayList<>();
+    for (Map.Entry<String, OMNodeDetails> entry : peerNodesMap.entrySet()) {
+      String remoteNodeId = entry.getKey();
+      OMNodeDetails remoteNodeDetails = entry.getValue();
+      try (OMMetadataProtocolClientSideImpl omMetadataProtocolClient =
+               new OMMetadataProtocolClientSideImpl(configuration,
+                   getRemoteUser(), remoteNodeId,
+                   remoteNodeDetails.getRpcAddress())) {
+
+        OMConfiguration remoteOMConfiguration =
+            omMetadataProtocolClient.getOMConfiguration();
+        checkRemoteOMConfig(remoteNodeId, remoteOMConfiguration);
+      } catch (IOException ioe) {
+        LOG.error("Remote OM config check failed on OM {}", remoteNodeId, ioe);
+        omsWihtoutNewConfig.add(remoteNodeDetails);
+      }
+    }
+    if (!omsWihtoutNewConfig.isEmpty()) {
+      String errorMsg = OmUtils.getOMAddressListPrintString(omsWihtoutNewConfig)
+          + " do not have or have incorrect information of the bootstrapping " +
+          "OM. Update their ozone-site.xml before proceeding.";
+      exitManager.exitSystem(1, errorMsg, LOG);
+    }
+  }
+
+  /**
+   * Verify that the remote OM configuration is updated for the bootstrapping
+   * OM.
+   */
+  private void checkRemoteOMConfig(String remoteNodeId,
+      OMConfiguration remoteOMConfig) throws IOException {
+    if (remoteOMConfig == null) {
+      throw new IOException("Remote OM " + remoteNodeId + " configuration " +
+          "returned null");
+    }
+
+    if (remoteOMConfig.getCurrentPeerList().contains(this.getOMNodeId())) {
+      throw new IOException("Remote OM " + remoteNodeId + " already contains " +
+          "bootstrapping OM(" + getOMNodeId() + ") in it's in memory peer " +

Review comment:
       Can we add error message little detail, what it means in it's memory peer list means here?

##########
File path: hadoop-ozone/common/src/main/java/org/apache/hadoop/ozone/om/protocol/OMConfiguration.java
##########
@@ -0,0 +1,84 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with this
+ * work for additional information regarding copyright ownership.  The ASF
+ * licenses this file to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ * <p>
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * <p>
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,WITHOUT
+ * WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
+ * License for the specific language governing permissions and limitations under
+ * the License.
+ */
+package org.apache.hadoop.ozone.om.protocol;
+
+import java.util.ArrayList;
+import java.util.List;
+import java.util.Map;
+import java.util.stream.Collectors;
+import org.apache.hadoop.hdds.NodeDetails;
+import org.apache.hadoop.ozone.om.helpers.OMNodeDetails;
+
+/**
+ * Class storing the OM configuration information such as the node details in
+ * memory and node details when config is reloaded from disk.
+ * Note that this class is used as a structure to transfer the OM configuration
+ * information through the {@link OMMetadataProtocol} and not for storing the
+ * configuration information in OzoneManager itself.
+ */
+public final class OMConfiguration {
+
+  // OM nodes present in OM's memory
+  private List<OMNodeDetails> omNodesInMemory = new ArrayList<>();
+  // OM nodes reloaded from new config on disk
+  private List<OMNodeDetails> omNodesInNewConf = new ArrayList<>();
+
+  private OMConfiguration(List<OMNodeDetails> inMemoryNodeList,
+      List<OMNodeDetails> onDiskNodeList) {
+    this.omNodesInMemory.addAll(inMemoryNodeList);
+    this.omNodesInNewConf.addAll(onDiskNodeList);
+  }
+
+  /**
+   * OMConfiguration Builder class.
+   */
+  public static class Builder {
+    private List<OMNodeDetails> omNodesInMemory;
+    private List<OMNodeDetails> omNodesInNewConf;
+
+    public Builder() {
+      this.omNodesInMemory = new ArrayList<>();
+      this.omNodesInNewConf = new ArrayList<>();
+    }
+
+    public Builder addToNodesInMemory(OMNodeDetails nodeDetails) {
+      this.omNodesInMemory.add(nodeDetails);
+      return this;
+    }
+
+    public Builder addToNodesInNewConf(OMNodeDetails nodeDetails) {
+      this.omNodesInNewConf.add(nodeDetails);
+      return this;
+    }
+
+    public OMConfiguration build() {
+      return new OMConfiguration(omNodesInMemory, omNodesInNewConf);
+    }
+  }
+
+  public List<String> getCurrentPeerList() {

Review comment:
       Missing JavaDocs

##########
File path: hadoop-ozone/interface-client/src/main/proto/OMMetadataProtocol.proto
##########
@@ -0,0 +1,65 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+/**
+ * These .proto interfaces are private and unstable.
+ * Please see http://wiki.apache.org/hadoop/Compatibility
+ * for what changes are allowed for a *unstable* .proto interface.
+ */
+
+syntax = "proto2";
+option java_package = "org.apache.hadoop.ozone.protocol.proto";
+option java_outer_classname = "OzoneManagerMetadataProtocolProtos";
+option java_generic_services = true;
+option java_generate_equals_and_hash = true;
+package hadoop.ozone;
+
+/**
+This file contains the metadata protocol for Ozone Manager(s). This involves
+getting the meta information about an individual OM.
+*/
+
+message OMConfigurationRequest {
+}
+
+message OMConfigurationResponse {
+    required bool success = 1;
+    optional string errorMsg = 2;
+    // OM nodes present in OM's memory
+    repeated OMNodeInfo nodesInMemory = 3;
+    // OM nodes reloaded from new config on disk
+    repeated OMNodeInfo nodesInNewConf = 4;
+
+}
+
+message OMNodeInfo {
+    required string nodeID = 1;
+    required string hostAddress = 2;
+    required uint32 rpcPort = 3;
+    required uint32 ratisPort = 4;

Review comment:
       Any reason for adding ratisport, I see it is not checked?
   And also we donot check about httpaddress? 
   
   Is the intention here is to check if already started OM has this new OM address, and OM Ratis port only. Because if http address is not there during bootstrap we will fail correct?

##########
File path: hadoop-ozone/common/src/main/java/org/apache/hadoop/ozone/OmUtils.java
##########
@@ -697,4 +701,78 @@ public static String normalizeKey(String keyName,
     }
     return omHosts;
   }
+
+  /**
+   * Get a list of all OM details (address and ports) from the specified config.
+   */
+  public static List<OMNodeDetails> getAllOMAddresses(OzoneConfiguration conf,
+      String omServiceId, String currentOMNodeId) {
+
+    List<OMNodeDetails> omNodesList = new ArrayList<>();
+    Collection<String> omNodeIds = OmUtils.getOMNodeIds(conf, omServiceId);
+
+    String rpcAddrStr, hostAddr, httpAddr, httpsAddr;
+    int rpcPort, ratisPort;
+    if (omNodeIds.size() == 0) {
+      //Check if it is non-HA cluster
+      rpcAddrStr = OmUtils.getOmRpcAddress(conf, OZONE_OM_ADDRESS_KEY);
+      if (rpcAddrStr == null || rpcAddrStr.isEmpty()) {
+        return omNodesList;
+      }
+      hostAddr = HddsUtils.getHostName(rpcAddrStr).orElse(null);
+      rpcPort = HddsUtils.getHostPort(rpcAddrStr).orElse(0);
+      ratisPort = conf.getInt(OZONE_OM_RATIS_PORT_KEY,
+          OZONE_OM_RATIS_PORT_DEFAULT);
+      httpAddr = OmUtils.getHttpAddressForOMPeerNode(conf,
+          null, null, hostAddr);
+      httpsAddr = OmUtils.getHttpsAddressForOMPeerNode(conf,
+          null, null, hostAddr);
+
+      omNodesList.add(new OMNodeDetails.Builder()
+          .setOMNodeId(currentOMNodeId)
+          .setHostAddress(hostAddr)
+          .setRpcPort(rpcPort)
+          .setRatisPort(ratisPort)
+          .setHttpAddress(httpAddr)
+          .setHttpsAddress(httpsAddr)
+          .build());
+      return omNodesList;
+    }
+
+    for (String nodeId : omNodeIds) {
+      try {
+        OMNodeDetails omNodeDetails = OMNodeDetails.getOMNodeDetailsFromConf(
+            conf, omServiceId, nodeId);
+        omNodesList.add(omNodeDetails);
+      } catch (IOException e) {
+        String omRpcAddressStr = OMNodeDetails.getOMNodeAddressFromConf(conf,
+            omServiceId, nodeId);
+        LOG.error("OM {} is present in config file but it's address {} could " +
+            "not be resolved. Hence, OM {} is not added to list of peer nodes.",
+            nodeId, omRpcAddressStr, nodeId);
+      }
+    }
+
+    return omNodesList;
+  }
+
+  public static String getOMAddressListPrintString(List<OMNodeDetails> omList) {

Review comment:
       Missing JavaDoc

##########
File path: hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/OzoneManager.java
##########
@@ -1453,7 +1559,13 @@ public void updatePeerList(List<String> omNodeIds) {
       // Check if the OM NodeID is already present in the peer list or its
       // the local NodeID.
       if (!peerNodesMap.containsKey(omNodeId) && !isCurrentNode(omNodeId)) {
-        addOMNodeToPeers(omNodeId);
+        try {
+          addOMNodeToPeers(omNodeId);
+        } catch (IOException e) {
+          LOG.error("Fatal Error: Shutting down the system as otherwise it " +

Review comment:
       Question to understand: Why OM's will diverge? And also why are we shutting down OM here?
   
   From PR description, i thought if config is not updated on existing OM's we are shutting down, but shutdown if new config does not have that node id is being done as part of this Jira.

##########
File path: hadoop-ozone/common/src/main/java/org/apache/hadoop/ozone/om/protocol/OMMetadataProtocol.java
##########
@@ -0,0 +1,36 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ * <p>
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * <p>
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.ozone.om.protocol;
+
+import java.io.Closeable;
+import java.io.IOException;
+import org.apache.hadoop.ozone.om.OMConfigKeys;
+import org.apache.hadoop.security.KerberosInfo;
+
+/**
+ * Protocol for retrieving OM metadata information.
+ */
+@KerberosInfo(
+    serverPrincipal = OMConfigKeys.OZONE_OM_KERBEROS_PRINCIPAL_KEY)
+public interface OMMetadataProtocol extends Closeable {

Review comment:
       Just a question, why cant we add this to OMInterServiceProtocol?
   Any reason to add it as a fresh OMMetadataProtocol?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] bharatviswa504 commented on a change in pull request #2491: HDDS-5534. Verify config is updated on all OMs before proceeding with Bootstrap

Posted by GitBox <gi...@apache.org>.
bharatviswa504 commented on a change in pull request #2491:
URL: https://github.com/apache/ozone/pull/2491#discussion_r696446556



##########
File path: hadoop-ozone/common/src/main/java/org/apache/hadoop/ozone/om/helpers/OMNodeDetails.java
##########
@@ -164,39 +166,80 @@ public String getOMDBCheckpointEnpointUrl(boolean isHttpPolicy) {
     return null;
   }
 
+  public String getOMPrintInfo() {
+    return getNodeId() + ":" + getHostAddress();
+  }
+
   public static OMNodeDetails getOMNodeDetailsFromConf(OzoneConfiguration conf,
       String omServiceId, String omNodeId) {
+    return getOMNodeDetailsFromConf(conf, omServiceId, omNodeId, true);
+  }
+
+  public static OMNodeDetails getOMNodeDetailsFromConf(OzoneConfiguration conf,
+      String omServiceId, String omNodeId, boolean shouldResolveAddr) {
+
     String rpcAddrKey = ConfUtils.addKeySuffixes(OZONE_OM_ADDRESS_KEY,
         omServiceId, omNodeId);
     String rpcAddrStr = OmUtils.getOmRpcAddress(conf, rpcAddrKey);
     if (rpcAddrStr == null || rpcAddrStr.isEmpty()) {
       return null;
     }
 
+    String hostAddr;
+    InetSocketAddress omRpcAddress = null;
+    int rpcPort = 0;
+    if (shouldResolveAddr) {

Review comment:
       With this If any mistake it would be caught during last step validation of rpcAddr check in checkOMexistsInRemoteOMConfig.
   
   With out check it would be caught early.  
   
   With this kind of user errors, any approach can work. Feel free to do what you feel best here




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] bharatviswa504 commented on a change in pull request #2491: HDDS-5534. Verify config is updated on all OMs before proceeding with Bootstrap

Posted by GitBox <gi...@apache.org>.
bharatviswa504 commented on a change in pull request #2491:
URL: https://github.com/apache/ozone/pull/2491#discussion_r693659646



##########
File path: hadoop-ozone/common/src/main/java/org/apache/hadoop/ozone/om/helpers/OMNodeDetails.java
##########
@@ -164,39 +166,80 @@ public String getOMDBCheckpointEnpointUrl(boolean isHttpPolicy) {
     return null;
   }
 
+  public String getOMPrintInfo() {
+    return getNodeId() + ":" + getHostAddress();
+  }
+
   public static OMNodeDetails getOMNodeDetailsFromConf(OzoneConfiguration conf,
       String omServiceId, String omNodeId) {
+    return getOMNodeDetailsFromConf(conf, omServiceId, omNodeId, true);
+  }
+
+  public static OMNodeDetails getOMNodeDetailsFromConf(OzoneConfiguration conf,
+      String omServiceId, String omNodeId, boolean shouldResolveAddr) {
+
     String rpcAddrKey = ConfUtils.addKeySuffixes(OZONE_OM_ADDRESS_KEY,
         omServiceId, omNodeId);
     String rpcAddrStr = OmUtils.getOmRpcAddress(conf, rpcAddrKey);
     if (rpcAddrStr == null || rpcAddrStr.isEmpty()) {
       return null;
     }
 
+    String hostAddr;
+    InetSocketAddress omRpcAddress = null;
+    int rpcPort = 0;
+    if (shouldResolveAddr) {

Review comment:
       Curious why do we need this shouldResolveAddr
   If it is incorrect config should nt we throw exception like before?

##########
File path: hadoop-ozone/common/src/main/java/org/apache/hadoop/ozone/OmUtils.java
##########
@@ -697,4 +701,51 @@ public static String normalizeKey(String keyName,
     }
     return omHosts;
   }
+
+  /**
+   * Get a list of all OM details (address and ports) from the specified config.
+   */
+  public static List<OMNodeDetails> getAllOMAddresses(OzoneConfiguration conf,
+      String omServiceId, String currentOMNodeId) {
+
+    List<OMNodeDetails> omNodesList = new ArrayList<>();
+    Collection<String> omNodeIds = OmUtils.getOMNodeIds(conf, omServiceId);
+
+    String rpcAddrKey, rpcAddrStr, hostAddr, httpAddr, httpsAddr;
+    int rpcPort, ratisPort;
+    if (omNodeIds.size() == 0) {
+      //Check if it is non-HA cluster
+      rpcAddrKey = ConfUtils.addKeySuffixes(OZONE_OM_ADDRESS_KEY);

Review comment:
       Minor NIT: Here we donot require addKeySuffixes, as we are not adding any, and directly pass OZONE_OM_ADDRESS_KEY to OmUtils.getOmRpcAddress

##########
File path: hadoop-ozone/common/src/main/java/org/apache/hadoop/ozone/om/protocol/OMMetadata.java
##########
@@ -0,0 +1,77 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with this
+ * work for additional information regarding copyright ownership.  The ASF
+ * licenses this file to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ * <p>
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * <p>
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,WITHOUT
+ * WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
+ * License for the specific language governing permissions and limitations under
+ * the License.
+ */
+package org.apache.hadoop.ozone.om.protocol;
+
+import java.util.ArrayList;
+import java.util.List;
+import org.apache.hadoop.ozone.om.helpers.OMNodeDetails;
+
+/**
+ * Class storing the OM metadata such as the node details in memory and node
+ * details when config is reloaded from disk.
+ * Note that this class is used as a structure to transfer the OM node
+ * information through the {@link OMMetadataProtocol} and not for storing the
+ * metadata information in OzoneManager itself.
+ */
+public final class OMMetadata {

Review comment:
       Suggestion: Instead of calling this OM Metadata can we call something like OMNodeDetailsData/OMNodeInfo to be clear? 

##########
File path: hadoop-ozone/common/src/main/java/org/apache/hadoop/ozone/om/protocolPB/OMMetadataProtocolClientSideImpl.java
##########
@@ -0,0 +1,128 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ * <p>
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * <p>
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.ozone.om.protocolPB;
+
+import com.google.protobuf.RpcController;
+import com.google.protobuf.ServiceException;
+import java.io.IOException;
+import java.net.InetSocketAddress;
+import java.util.concurrent.TimeUnit;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.hdds.conf.ConfigurationSource;
+import org.apache.hadoop.hdds.conf.OzoneConfiguration;
+import org.apache.hadoop.hdds.utils.LegacyHadoopConfigurationSource;
+import org.apache.hadoop.io.retry.RetryPolicies;
+import org.apache.hadoop.io.retry.RetryPolicy;
+import org.apache.hadoop.io.retry.RetryProxy;
+import org.apache.hadoop.ipc.ProtobufRpcEngine;
+import org.apache.hadoop.ipc.RPC;
+import org.apache.hadoop.net.NetUtils;
+import org.apache.hadoop.ozone.OmUtils;
+import org.apache.hadoop.ozone.om.helpers.OMNodeDetails;
+import org.apache.hadoop.ozone.om.protocol.OMMetadata;
+import org.apache.hadoop.ozone.om.protocol.OMMetadataProtocol;
+import org.apache.hadoop.ozone.protocol.proto.OzoneManagerMetadataProtocolProtos.OMMetadataInfoRequest;
+import org.apache.hadoop.ozone.protocol.proto.OzoneManagerMetadataProtocolProtos.OMMetadataInfoResponse;
+import org.apache.hadoop.ozone.protocol.proto.OzoneManagerMetadataProtocolProtos.OMNodeInfo;
+import org.apache.hadoop.security.UserGroupInformation;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * Protocol implementation for getting OM metadata information.
+ */
+public class OMMetadataProtocolClientSideImpl implements
+    OMMetadataProtocol {
+
+  /**
+   * RpcController is not used and hence is set to null.
+   */
+  private static final RpcController NULL_RPC_CONTROLLER = null;
+
+  private static final Logger LOG =
+      LoggerFactory.getLogger(OMMetadataProtocolClientSideImpl.class);
+
+  private final String omNodeID;
+  private final OMMetadataProtocolPB rpcProxy;
+
+  public OMMetadataProtocolClientSideImpl(ConfigurationSource conf,
+      UserGroupInformation ugi, String omNodeId, InetSocketAddress omAddress)
+      throws IOException {
+
+    RPC.setProtocolEngine(OzoneConfiguration.of(conf),
+        OMInterServiceProtocolPB.class, ProtobufRpcEngine.class);
+
+    this.omNodeID = omNodeId;
+
+    // OM metadata is requested from a specific OM and hence there is no need
+    // of any failover provider.
+    RetryPolicy connectionRetryPolicy = RetryPolicies
+        .failoverOnNetworkException(0);

Review comment:
       Do we want to retry on network errors for network flakiness or restarts happening

##########
File path: hadoop-ozone/common/src/main/java/org/apache/hadoop/ozone/om/protocol/OMMetadataProtocol.java
##########
@@ -0,0 +1,36 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ * <p>
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * <p>
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.ozone.om.protocol;
+
+import java.io.Closeable;
+import java.io.IOException;
+import org.apache.hadoop.ozone.om.OMConfigKeys;
+import org.apache.hadoop.security.KerberosInfo;
+
+/**
+ * Protocol for retrieving OM metadata information.
+ */
+@KerberosInfo(
+    serverPrincipal = OMConfigKeys.OZONE_OM_KERBEROS_PRINCIPAL_KEY)
+public interface OMMetadataProtocol extends Closeable {

Review comment:
       Suggestion/Discussion point: Same here can we rename something to OMConfigProtocol

##########
File path: hadoop-ozone/common/src/main/java/org/apache/hadoop/ozone/om/protocolPB/OMMetadataProtocolClientSideImpl.java
##########
@@ -0,0 +1,128 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ * <p>
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * <p>
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.ozone.om.protocolPB;
+
+import com.google.protobuf.RpcController;
+import com.google.protobuf.ServiceException;
+import java.io.IOException;
+import java.net.InetSocketAddress;
+import java.util.concurrent.TimeUnit;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.hdds.conf.ConfigurationSource;
+import org.apache.hadoop.hdds.conf.OzoneConfiguration;
+import org.apache.hadoop.hdds.utils.LegacyHadoopConfigurationSource;
+import org.apache.hadoop.io.retry.RetryPolicies;
+import org.apache.hadoop.io.retry.RetryPolicy;
+import org.apache.hadoop.io.retry.RetryProxy;
+import org.apache.hadoop.ipc.ProtobufRpcEngine;
+import org.apache.hadoop.ipc.RPC;
+import org.apache.hadoop.net.NetUtils;
+import org.apache.hadoop.ozone.OmUtils;
+import org.apache.hadoop.ozone.om.helpers.OMNodeDetails;
+import org.apache.hadoop.ozone.om.protocol.OMMetadata;
+import org.apache.hadoop.ozone.om.protocol.OMMetadataProtocol;
+import org.apache.hadoop.ozone.protocol.proto.OzoneManagerMetadataProtocolProtos.OMMetadataInfoRequest;
+import org.apache.hadoop.ozone.protocol.proto.OzoneManagerMetadataProtocolProtos.OMMetadataInfoResponse;
+import org.apache.hadoop.ozone.protocol.proto.OzoneManagerMetadataProtocolProtos.OMNodeInfo;
+import org.apache.hadoop.security.UserGroupInformation;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * Protocol implementation for getting OM metadata information.
+ */
+public class OMMetadataProtocolClientSideImpl implements
+    OMMetadataProtocol {
+
+  /**
+   * RpcController is not used and hence is set to null.
+   */
+  private static final RpcController NULL_RPC_CONTROLLER = null;
+
+  private static final Logger LOG =
+      LoggerFactory.getLogger(OMMetadataProtocolClientSideImpl.class);
+
+  private final String omNodeID;
+  private final OMMetadataProtocolPB rpcProxy;
+
+  public OMMetadataProtocolClientSideImpl(ConfigurationSource conf,
+      UserGroupInformation ugi, String omNodeId, InetSocketAddress omAddress)
+      throws IOException {
+
+    RPC.setProtocolEngine(OzoneConfiguration.of(conf),
+        OMInterServiceProtocolPB.class, ProtobufRpcEngine.class);

Review comment:
       OMInterServiceProtocolPB -> OMMetadataProtocolPB

##########
File path: hadoop-ozone/common/src/main/java/org/apache/hadoop/ozone/om/protocol/OMMetadata.java
##########
@@ -0,0 +1,77 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with this
+ * work for additional information regarding copyright ownership.  The ASF
+ * licenses this file to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ * <p>
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * <p>
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,WITHOUT
+ * WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
+ * License for the specific language governing permissions and limitations under
+ * the License.
+ */
+package org.apache.hadoop.ozone.om.protocol;
+
+import java.util.ArrayList;
+import java.util.List;
+import org.apache.hadoop.ozone.om.helpers.OMNodeDetails;
+
+/**
+ * Class storing the OM metadata such as the node details in memory and node
+ * details when config is reloaded from disk.
+ * Note that this class is used as a structure to transfer the OM node
+ * information through the {@link OMMetadataProtocol} and not for storing the
+ * metadata information in OzoneManager itself.
+ */
+public final class OMMetadata {
+
+  // OM nodes present in OM's memory
+  private List<OMNodeDetails> omNodesInMemory = new ArrayList<>();
+  // OM nodes reloaded from new config on disk
+  private List<OMNodeDetails> omNodesInNewConf = new ArrayList<>();
+
+  private OMMetadata(List<OMNodeDetails> inMemoryNodeList,
+      List<OMNodeDetails> onDiskNodeList) {
+    if (inMemoryNodeList != null) {

Review comment:
       Minor:
   We do-not require not null check here, as Builder has initialized to a ArrayList.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] hanishakoneru commented on a change in pull request #2491: HDDS-5534. Verify config is updated on all OMs before proceeding with Bootstrap

Posted by GitBox <gi...@apache.org>.
hanishakoneru commented on a change in pull request #2491:
URL: https://github.com/apache/ozone/pull/2491#discussion_r694340949



##########
File path: hadoop-ozone/common/src/main/java/org/apache/hadoop/ozone/om/helpers/OMNodeDetails.java
##########
@@ -164,39 +166,80 @@ public String getOMDBCheckpointEnpointUrl(boolean isHttpPolicy) {
     return null;
   }
 
+  public String getOMPrintInfo() {
+    return getNodeId() + ":" + getHostAddress();
+  }
+
   public static OMNodeDetails getOMNodeDetailsFromConf(OzoneConfiguration conf,
       String omServiceId, String omNodeId) {
+    return getOMNodeDetailsFromConf(conf, omServiceId, omNodeId, true);
+  }
+
+  public static OMNodeDetails getOMNodeDetailsFromConf(OzoneConfiguration conf,
+      String omServiceId, String omNodeId, boolean shouldResolveAddr) {
+
     String rpcAddrKey = ConfUtils.addKeySuffixes(OZONE_OM_ADDRESS_KEY,
         omServiceId, omNodeId);
     String rpcAddrStr = OmUtils.getOmRpcAddress(conf, rpcAddrKey);
     if (rpcAddrStr == null || rpcAddrStr.isEmpty()) {
       return null;
     }
 
+    String hostAddr;
+    InetSocketAddress omRpcAddress = null;
+    int rpcPort = 0;
+    if (shouldResolveAddr) {

Review comment:
       I was trying to avoid creating a socket address when we only need to host and port string. But yes, we can throw an exception too. Though it would be run on all the OMs, the overhead will be minimal.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] bharatviswa504 commented on a change in pull request #2491: HDDS-5534. Verify config is updated on all OMs before proceeding with Bootstrap

Posted by GitBox <gi...@apache.org>.
bharatviswa504 commented on a change in pull request #2491:
URL: https://github.com/apache/ozone/pull/2491#discussion_r696344923



##########
File path: hadoop-ozone/common/src/main/java/org/apache/hadoop/ozone/om/protocol/OMMetadata.java
##########
@@ -0,0 +1,77 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with this
+ * work for additional information regarding copyright ownership.  The ASF
+ * licenses this file to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ * <p>
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * <p>
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,WITHOUT
+ * WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
+ * License for the specific language governing permissions and limitations under
+ * the License.
+ */
+package org.apache.hadoop.ozone.om.protocol;
+
+import java.util.ArrayList;
+import java.util.List;
+import org.apache.hadoop.ozone.om.helpers.OMNodeDetails;
+
+/**
+ * Class storing the OM metadata such as the node details in memory and node
+ * details when config is reloaded from disk.
+ * Note that this class is used as a structure to transfer the OM node
+ * information through the {@link OMMetadataProtocol} and not for storing the
+ * metadata information in OzoneManager itself.
+ */
+public final class OMMetadata {

Review comment:
       Yes good idea




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] hanishakoneru commented on a change in pull request #2491: HDDS-5534. Verify config is updated on all OMs before proceeding with Bootstrap

Posted by GitBox <gi...@apache.org>.
hanishakoneru commented on a change in pull request #2491:
URL: https://github.com/apache/ozone/pull/2491#discussion_r694350553



##########
File path: hadoop-ozone/common/src/main/java/org/apache/hadoop/ozone/om/protocolPB/OMMetadataProtocolClientSideImpl.java
##########
@@ -0,0 +1,128 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ * <p>
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * <p>
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.ozone.om.protocolPB;
+
+import com.google.protobuf.RpcController;
+import com.google.protobuf.ServiceException;
+import java.io.IOException;
+import java.net.InetSocketAddress;
+import java.util.concurrent.TimeUnit;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.hdds.conf.ConfigurationSource;
+import org.apache.hadoop.hdds.conf.OzoneConfiguration;
+import org.apache.hadoop.hdds.utils.LegacyHadoopConfigurationSource;
+import org.apache.hadoop.io.retry.RetryPolicies;
+import org.apache.hadoop.io.retry.RetryPolicy;
+import org.apache.hadoop.io.retry.RetryProxy;
+import org.apache.hadoop.ipc.ProtobufRpcEngine;
+import org.apache.hadoop.ipc.RPC;
+import org.apache.hadoop.net.NetUtils;
+import org.apache.hadoop.ozone.OmUtils;
+import org.apache.hadoop.ozone.om.helpers.OMNodeDetails;
+import org.apache.hadoop.ozone.om.protocol.OMMetadata;
+import org.apache.hadoop.ozone.om.protocol.OMMetadataProtocol;
+import org.apache.hadoop.ozone.protocol.proto.OzoneManagerMetadataProtocolProtos.OMMetadataInfoRequest;
+import org.apache.hadoop.ozone.protocol.proto.OzoneManagerMetadataProtocolProtos.OMMetadataInfoResponse;
+import org.apache.hadoop.ozone.protocol.proto.OzoneManagerMetadataProtocolProtos.OMNodeInfo;
+import org.apache.hadoop.security.UserGroupInformation;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * Protocol implementation for getting OM metadata information.
+ */
+public class OMMetadataProtocolClientSideImpl implements
+    OMMetadataProtocol {
+
+  /**
+   * RpcController is not used and hence is set to null.
+   */
+  private static final RpcController NULL_RPC_CONTROLLER = null;
+
+  private static final Logger LOG =
+      LoggerFactory.getLogger(OMMetadataProtocolClientSideImpl.class);
+
+  private final String omNodeID;
+  private final OMMetadataProtocolPB rpcProxy;
+
+  public OMMetadataProtocolClientSideImpl(ConfigurationSource conf,
+      UserGroupInformation ugi, String omNodeId, InetSocketAddress omAddress)
+      throws IOException {
+
+    RPC.setProtocolEngine(OzoneConfiguration.of(conf),
+        OMInterServiceProtocolPB.class, ProtobufRpcEngine.class);
+
+    this.omNodeID = omNodeId;
+
+    // OM metadata is requested from a specific OM and hence there is no need
+    // of any failover provider.
+    RetryPolicy connectionRetryPolicy = RetryPolicies
+        .failoverOnNetworkException(0);

Review comment:
       Done.

##########
File path: hadoop-ozone/common/src/main/java/org/apache/hadoop/ozone/om/protocolPB/OMMetadataProtocolClientSideImpl.java
##########
@@ -0,0 +1,128 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ * <p>
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * <p>
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.ozone.om.protocolPB;
+
+import com.google.protobuf.RpcController;
+import com.google.protobuf.ServiceException;
+import java.io.IOException;
+import java.net.InetSocketAddress;
+import java.util.concurrent.TimeUnit;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.hdds.conf.ConfigurationSource;
+import org.apache.hadoop.hdds.conf.OzoneConfiguration;
+import org.apache.hadoop.hdds.utils.LegacyHadoopConfigurationSource;
+import org.apache.hadoop.io.retry.RetryPolicies;
+import org.apache.hadoop.io.retry.RetryPolicy;
+import org.apache.hadoop.io.retry.RetryProxy;
+import org.apache.hadoop.ipc.ProtobufRpcEngine;
+import org.apache.hadoop.ipc.RPC;
+import org.apache.hadoop.net.NetUtils;
+import org.apache.hadoop.ozone.OmUtils;
+import org.apache.hadoop.ozone.om.helpers.OMNodeDetails;
+import org.apache.hadoop.ozone.om.protocol.OMMetadata;
+import org.apache.hadoop.ozone.om.protocol.OMMetadataProtocol;
+import org.apache.hadoop.ozone.protocol.proto.OzoneManagerMetadataProtocolProtos.OMMetadataInfoRequest;
+import org.apache.hadoop.ozone.protocol.proto.OzoneManagerMetadataProtocolProtos.OMMetadataInfoResponse;
+import org.apache.hadoop.ozone.protocol.proto.OzoneManagerMetadataProtocolProtos.OMNodeInfo;
+import org.apache.hadoop.security.UserGroupInformation;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * Protocol implementation for getting OM metadata information.
+ */
+public class OMMetadataProtocolClientSideImpl implements
+    OMMetadataProtocol {
+
+  /**
+   * RpcController is not used and hence is set to null.
+   */
+  private static final RpcController NULL_RPC_CONTROLLER = null;
+
+  private static final Logger LOG =
+      LoggerFactory.getLogger(OMMetadataProtocolClientSideImpl.class);
+
+  private final String omNodeID;
+  private final OMMetadataProtocolPB rpcProxy;
+
+  public OMMetadataProtocolClientSideImpl(ConfigurationSource conf,
+      UserGroupInformation ugi, String omNodeId, InetSocketAddress omAddress)
+      throws IOException {
+
+    RPC.setProtocolEngine(OzoneConfiguration.of(conf),
+        OMInterServiceProtocolPB.class, ProtobufRpcEngine.class);

Review comment:
       Done.

##########
File path: hadoop-ozone/common/src/main/java/org/apache/hadoop/ozone/om/protocol/OMMetadata.java
##########
@@ -0,0 +1,77 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with this
+ * work for additional information regarding copyright ownership.  The ASF
+ * licenses this file to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ * <p>
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * <p>
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,WITHOUT
+ * WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
+ * License for the specific language governing permissions and limitations under
+ * the License.
+ */
+package org.apache.hadoop.ozone.om.protocol;
+
+import java.util.ArrayList;
+import java.util.List;
+import org.apache.hadoop.ozone.om.helpers.OMNodeDetails;
+
+/**
+ * Class storing the OM metadata such as the node details in memory and node
+ * details when config is reloaded from disk.
+ * Note that this class is used as a structure to transfer the OM node
+ * information through the {@link OMMetadataProtocol} and not for storing the
+ * metadata information in OzoneManager itself.
+ */
+public final class OMMetadata {
+
+  // OM nodes present in OM's memory
+  private List<OMNodeDetails> omNodesInMemory = new ArrayList<>();
+  // OM nodes reloaded from new config on disk
+  private List<OMNodeDetails> omNodesInNewConf = new ArrayList<>();
+
+  private OMMetadata(List<OMNodeDetails> inMemoryNodeList,
+      List<OMNodeDetails> onDiskNodeList) {
+    if (inMemoryNodeList != null) {

Review comment:
       Done.

##########
File path: hadoop-ozone/common/src/main/java/org/apache/hadoop/ozone/OmUtils.java
##########
@@ -697,4 +701,51 @@ public static String normalizeKey(String keyName,
     }
     return omHosts;
   }
+
+  /**
+   * Get a list of all OM details (address and ports) from the specified config.
+   */
+  public static List<OMNodeDetails> getAllOMAddresses(OzoneConfiguration conf,
+      String omServiceId, String currentOMNodeId) {
+
+    List<OMNodeDetails> omNodesList = new ArrayList<>();
+    Collection<String> omNodeIds = OmUtils.getOMNodeIds(conf, omServiceId);
+
+    String rpcAddrKey, rpcAddrStr, hostAddr, httpAddr, httpsAddr;
+    int rpcPort, ratisPort;
+    if (omNodeIds.size() == 0) {
+      //Check if it is non-HA cluster
+      rpcAddrKey = ConfUtils.addKeySuffixes(OZONE_OM_ADDRESS_KEY);

Review comment:
       Done.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] bharatviswa504 commented on pull request #2491: HDDS-5534. Verify config is updated on all OMs before proceeding with Bootstrap

Posted by GitBox <gi...@apache.org>.
bharatviswa504 commented on pull request #2491:
URL: https://github.com/apache/ozone/pull/2491#issuecomment-906231351


   > Thank you @bharatviswa504 for reviewing the PR. I have addressed some of your comments and replied to the rest.
   > 
   > > One question what advantage of this validation compared to failing with exception, will we be in some inconsistent state? I just want to understand reason behind going this route. Because if any of the node is down, we cannot make use of this check i believe.
   > 
   > If the config has not been updated on an OM and the SetConfiguration request is executed, then it could lead to that OM crashing out. If we perform this check before, we can avoid a healthy OM going down because of not updating its config with the new OM info. A healthy running OM should not crash when trying to add another OM.
   > 
   > If a node is down, this check would at the very least warn the user to double check the configs before proceeding with force bootstapping.
   
   Thank You @hanishakoneru for the details. It would be good to have defensive checks instead of bringing down a cluster when a new node is added.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] bharatviswa504 commented on a change in pull request #2491: HDDS-5534. Verify config is updated on all OMs before proceeding with Bootstrap

Posted by GitBox <gi...@apache.org>.
bharatviswa504 commented on a change in pull request #2491:
URL: https://github.com/apache/ozone/pull/2491#discussion_r749704225



##########
File path: hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/TestOzoneConfigurationFields.java
##########
@@ -100,7 +100,9 @@ private void addPropertiesNotInXml() {
         ReconServerConfigKeys.RECON_OM_SNAPSHOT_TASK_INTERVAL_DELAY,
         ReconServerConfigKeys.RECON_OM_SNAPSHOT_TASK_FLUSH_PARAM,
         OMConfigKeys.OZONE_OM_RATIS_SNAPSHOT_AUTO_TRIGGER_THRESHOLD_KEY,
-        OMConfigKeys.OZONE_OM_HA_PREFIX
+        OMConfigKeys.OZONE_OM_HA_PREFIX,
+        OMConfigKeys.OZONE_OM_ADMIN_PROTOCOL_MAX_RETRIES_KEY,

Review comment:
       Because atleast internally I have seen a bug for admin commands, where using defaults it is taking longer, the users needs some settings to control retry behavior. (This is for SCM, and in SCM admin commands are part of normal client/block protocol, as in OM we have separated it out that is good.)




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] bharatviswa504 commented on pull request #2491: HDDS-5534. Verify config is updated on all OMs before proceeding with Bootstrap

Posted by GitBox <gi...@apache.org>.
bharatviswa504 commented on pull request #2491:
URL: https://github.com/apache/ozone/pull/2491#issuecomment-971928870


   Test failures looks unrelated, triggered another CI run.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] bharatviswa504 commented on pull request #2491: HDDS-5534. Verify config is updated on all OMs before proceeding with Bootstrap

Posted by GitBox <gi...@apache.org>.
bharatviswa504 commented on pull request #2491:
URL: https://github.com/apache/ozone/pull/2491#issuecomment-958448280


   Thank You @hanishakoneru for the work.
   Overall approach LGTM, few comments/questions


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] hanishakoneru commented on a change in pull request #2491: HDDS-5534. Verify config is updated on all OMs before proceeding with Bootstrap

Posted by GitBox <gi...@apache.org>.
hanishakoneru commented on a change in pull request #2491:
URL: https://github.com/apache/ozone/pull/2491#discussion_r745165597



##########
File path: hadoop-ozone/common/src/main/java/org/apache/hadoop/ozone/om/protocol/OMConfiguration.java
##########
@@ -0,0 +1,84 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with this
+ * work for additional information regarding copyright ownership.  The ASF
+ * licenses this file to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ * <p>
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * <p>
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,WITHOUT
+ * WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
+ * License for the specific language governing permissions and limitations under
+ * the License.
+ */
+package org.apache.hadoop.ozone.om.protocol;
+
+import java.util.ArrayList;
+import java.util.List;
+import java.util.Map;
+import java.util.stream.Collectors;
+import org.apache.hadoop.hdds.NodeDetails;
+import org.apache.hadoop.ozone.om.helpers.OMNodeDetails;
+
+/**
+ * Class storing the OM configuration information such as the node details in
+ * memory and node details when config is reloaded from disk.
+ * Note that this class is used as a structure to transfer the OM configuration
+ * information through the {@link OMMetadataProtocol} and not for storing the
+ * configuration information in OzoneManager itself.
+ */
+public final class OMConfiguration {
+
+  // OM nodes present in OM's memory
+  private List<OMNodeDetails> omNodesInMemory = new ArrayList<>();
+  // OM nodes reloaded from new config on disk
+  private List<OMNodeDetails> omNodesInNewConf = new ArrayList<>();
+
+  private OMConfiguration(List<OMNodeDetails> inMemoryNodeList,
+      List<OMNodeDetails> onDiskNodeList) {
+    this.omNodesInMemory.addAll(inMemoryNodeList);
+    this.omNodesInNewConf.addAll(onDiskNodeList);
+  }
+
+  /**
+   * OMConfiguration Builder class.
+   */
+  public static class Builder {
+    private List<OMNodeDetails> omNodesInMemory;
+    private List<OMNodeDetails> omNodesInNewConf;
+
+    public Builder() {
+      this.omNodesInMemory = new ArrayList<>();
+      this.omNodesInNewConf = new ArrayList<>();
+    }
+
+    public Builder addToNodesInMemory(OMNodeDetails nodeDetails) {
+      this.omNodesInMemory.add(nodeDetails);
+      return this;
+    }
+
+    public Builder addToNodesInNewConf(OMNodeDetails nodeDetails) {
+      this.omNodesInNewConf.add(nodeDetails);
+      return this;
+    }
+
+    public OMConfiguration build() {
+      return new OMConfiguration(omNodesInMemory, omNodesInNewConf);
+    }
+  }
+
+  public List<String> getCurrentPeerList() {

Review comment:
       Done.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] hanishakoneru commented on a change in pull request #2491: HDDS-5534. Verify config is updated on all OMs before proceeding with Bootstrap

Posted by GitBox <gi...@apache.org>.
hanishakoneru commented on a change in pull request #2491:
URL: https://github.com/apache/ozone/pull/2491#discussion_r745164541



##########
File path: hadoop-ozone/interface-client/src/main/proto/OMMetadataProtocol.proto
##########
@@ -0,0 +1,65 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+/**
+ * These .proto interfaces are private and unstable.
+ * Please see http://wiki.apache.org/hadoop/Compatibility
+ * for what changes are allowed for a *unstable* .proto interface.
+ */
+
+syntax = "proto2";
+option java_package = "org.apache.hadoop.ozone.protocol.proto";
+option java_outer_classname = "OzoneManagerMetadataProtocolProtos";
+option java_generic_services = true;
+option java_generate_equals_and_hash = true;
+package hadoop.ozone;
+
+/**
+This file contains the metadata protocol for Ozone Manager(s). This involves
+getting the meta information about an individual OM.
+*/
+
+message OMConfigurationRequest {
+}
+
+message OMConfigurationResponse {
+    required bool success = 1;
+    optional string errorMsg = 2;
+    // OM nodes present in OM's memory
+    repeated OMNodeInfo nodesInMemory = 3;
+    // OM nodes reloaded from new config on disk
+    repeated OMNodeInfo nodesInNewConf = 4;
+
+}
+
+message OMNodeInfo {
+    required string nodeID = 1;
+    required string hostAddress = 2;
+    required uint32 rpcPort = 3;
+    required uint32 ratisPort = 4;

Review comment:
       We do not check for either ratis port match or http address match. I can add checks for these two or remove ratisPort also from here. Let me know what you think.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] hanishakoneru commented on a change in pull request #2491: HDDS-5534. Verify config is updated on all OMs before proceeding with Bootstrap

Posted by GitBox <gi...@apache.org>.
hanishakoneru commented on a change in pull request #2491:
URL: https://github.com/apache/ozone/pull/2491#discussion_r745166617



##########
File path: hadoop-ozone/interface-client/src/main/proto/OMMetadataProtocol.proto
##########
@@ -0,0 +1,65 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+/**
+ * These .proto interfaces are private and unstable.
+ * Please see http://wiki.apache.org/hadoop/Compatibility
+ * for what changes are allowed for a *unstable* .proto interface.
+ */
+
+syntax = "proto2";
+option java_package = "org.apache.hadoop.ozone.protocol.proto";
+option java_outer_classname = "OzoneManagerMetadataProtocolProtos";
+option java_generic_services = true;
+option java_generate_equals_and_hash = true;
+package hadoop.ozone;
+
+/**
+This file contains the metadata protocol for Ozone Manager(s). This involves
+getting the meta information about an individual OM.
+*/
+
+message OMConfigurationRequest {
+}
+
+message OMConfigurationResponse {
+    required bool success = 1;
+    optional string errorMsg = 2;
+    // OM nodes present in OM's memory
+    repeated OMNodeInfo nodesInMemory = 3;
+    // OM nodes reloaded from new config on disk
+    repeated OMNodeInfo nodesInNewConf = 4;
+
+}
+
+message OMNodeInfo {
+    required string nodeID = 1;
+    required string hostAddress = 2;
+    required uint32 rpcPort = 3;
+    required uint32 ratisPort = 4;

Review comment:
       We currently do not check for ratisPort and httpAddr match while doing bootstrap check. I can add them if you think it is necessary. 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] hanishakoneru commented on pull request #2491: HDDS-5534. Verify config is updated on all OMs before proceeding with Bootstrap

Posted by GitBox <gi...@apache.org>.
hanishakoneru commented on pull request #2491:
URL: https://github.com/apache/ozone/pull/2491#issuecomment-967285312


   > As this verify configuration is InterOM only right, not external client interaction. And we can add to OmClientProtocol that API right, do we need a new protocol for admin operations. And in OM server we can check whether admin has issued the command or not correct, Let me know if i am missing something here.
   
   It would be better to keep the admin protocols separate from client protocols. RemoveOM operation is a critical operation and I don't think it should be exposed to clients at all. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] hanishakoneru commented on pull request #2491: HDDS-5534. Verify config is updated on all OMs before proceeding with Bootstrap

Posted by GitBox <gi...@apache.org>.
hanishakoneru commented on pull request #2491:
URL: https://github.com/apache/ozone/pull/2491#issuecomment-973101468


   @bharatviswa504, thanks for reviewing multiple times and approving it. I will merge this shortly.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org