You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@ozone.apache.org by GitBox <gi...@apache.org> on 2021/04/09 13:51:45 UTC

[GitHub] [ozone] bshashikant opened a new pull request #2141: HDDS-5051. Ensure failover to suggested leader if any for NotLeaderException.

bshashikant opened a new pull request #2141:
URL: https://github.com/apache/ozone/pull/2141


   ## What changes were proposed in this pull request?
   
   Handling of suggested leader for SCM client requests
   
   ## What is the link to the Apache JIRA
   https://issues.apache.org/jira/browse/HDDS-5051
   
   ## How was this patch tested?
   
   Added UT
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] bharatviswa504 commented on a change in pull request #2141: HDDS-5051. Ensure failover to suggested leader if any for NotLeaderException.

Posted by GitBox <gi...@apache.org>.
bharatviswa504 commented on a change in pull request #2141:
URL: https://github.com/apache/ozone/pull/2141#discussion_r618322583



##########
File path: hadoop-hdds/framework/src/main/java/org/apache/hadoop/hdds/scm/proxy/SCMSecurityProtocolFailoverProxyProvider.java
##########
@@ -196,7 +232,7 @@ public void performFailoverToNextProxy() {
    * Update the proxy index to the next proxy in the list.
    * @return the new proxy index
    */
-  private synchronized int incrementProxyIndex() {
+  private int incrementProxyIndex() {

Review comment:
       synchronized??




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] bshashikant commented on a change in pull request #2141: HDDS-5051. Ensure failover to suggested leader if any for NotLeaderException.

Posted by GitBox <gi...@apache.org>.
bshashikant commented on a change in pull request #2141:
URL: https://github.com/apache/ozone/pull/2141#discussion_r618345226



##########
File path: hadoop-hdds/common/src/main/java/org/apache/hadoop/hdds/scm/ha/SCMHAUtils.java
##########
@@ -198,12 +201,40 @@ public static OzoneConfiguration removeSelfId(
     return getSCMNodeIds(configuration, scmServiceId);
   }
 
+  private static Throwable unwrapException(Exception e) {
+    IOException ioException = null;
+    Throwable cause = e.getCause();

Review comment:
       This is specific to handling serviceExceptions




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] bshashikant commented on a change in pull request #2141: HDDS-5051. Ensure failover to suggested leader if any for NotLeaderException.

Posted by GitBox <gi...@apache.org>.
bshashikant commented on a change in pull request #2141:
URL: https://github.com/apache/ozone/pull/2141#discussion_r620040090



##########
File path: hadoop-ozone/dist/src/main/compose/ozonesecure-ha/docker-config
##########
@@ -47,6 +47,7 @@ OZONE-SITE.XML_ozone.scm.client.address=scm
 OZONE-SITE.XML_hdds.block.token.enabled=true
 OZONE-SITE.XML_hdds.grpc.tls.enabled=true
 OZONE-SITE.XML_ozone.replication=3
+OZONE-SITE.XML_hdds.scmclient.max.retry.timeout=30s

Review comment:
       Addressed.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] bshashikant commented on a change in pull request #2141: HDDS-5051. Ensure failover to suggested leader if any for NotLeaderException.

Posted by GitBox <gi...@apache.org>.
bshashikant commented on a change in pull request #2141:
URL: https://github.com/apache/ozone/pull/2141#discussion_r613180710



##########
File path: hadoop-hdds/framework/src/main/java/org/apache/hadoop/hdds/scm/proxy/SCMSecurityProtocolFailoverProxyProvider.java
##########
@@ -181,6 +182,41 @@ public void performFailover(SCMSecurityProtocolPB currentProxy) {
     }
   }
 
+  public void performFailoverToAssignedLeader(String newLeader, Exception e) {

Review comment:
       I tried doing this , but its not trivial and will require some refactoring. This needs to be a separate patch.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] bharatviswa504 commented on a change in pull request #2141: HDDS-5051. Ensure failover to suggested leader if any for NotLeaderException.

Posted by GitBox <gi...@apache.org>.
bharatviswa504 commented on a change in pull request #2141:
URL: https://github.com/apache/ozone/pull/2141#discussion_r620010828



##########
File path: hadoop-hdds/framework/src/main/java/org/apache/hadoop/hdds/scm/proxy/SCMSecurityProtocolFailoverProxyProvider.java
##########
@@ -99,7 +100,7 @@ public SCMSecurityProtocolFailoverProxyProvider(ConfigurationSource conf,
     this.retryInterval = scmClientConfig.getRetryInterval();
   }
 
-  protected void loadConfigs() {
+  protected synchronized void loadConfigs() {

Review comment:
       Do we need sychrnoize here?
   As loadConfigs is called from constructor

##########
File path: hadoop-hdds/framework/src/main/java/org/apache/hadoop/hdds/scm/proxy/SCMSecurityProtocolFailoverProxyProvider.java
##########
@@ -99,7 +100,7 @@ public SCMSecurityProtocolFailoverProxyProvider(ConfigurationSource conf,
     this.retryInterval = scmClientConfig.getRetryInterval();
   }
 
-  protected void loadConfigs() {
+  protected synchronized void loadConfigs() {

Review comment:
       Do we need synchrnoize here?
   As loadConfigs is called from constructor

##########
File path: hadoop-hdds/framework/src/main/java/org/apache/hadoop/hdds/scm/proxy/SCMSecurityProtocolFailoverProxyProvider.java
##########
@@ -99,7 +100,7 @@ public SCMSecurityProtocolFailoverProxyProvider(ConfigurationSource conf,
     this.retryInterval = scmClientConfig.getRetryInterval();
   }
 
-  protected void loadConfigs() {
+  protected synchronized void loadConfigs() {

Review comment:
       Same for all failOverProxy classes

##########
File path: hadoop-ozone/dist/src/main/compose/ozonesecure-ha/docker-config
##########
@@ -47,6 +47,7 @@ OZONE-SITE.XML_ozone.scm.client.address=scm
 OZONE-SITE.XML_hdds.block.token.enabled=true
 OZONE-SITE.XML_hdds.grpc.tls.enabled=true
 OZONE-SITE.XML_ozone.replication=3
+OZONE-SITE.XML_hdds.scmclient.max.retry.timeout=30s

Review comment:
       This will be a problem for the secure tests, as getScmInfo will retry for 30s and fail. We want to try little longer for that as we see some failures in test




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] bshashikant commented on a change in pull request #2141: HDDS-5051. Ensure failover to suggested leader if any for NotLeaderException.

Posted by GitBox <gi...@apache.org>.
bshashikant commented on a change in pull request #2141:
URL: https://github.com/apache/ozone/pull/2141#discussion_r613177684



##########
File path: hadoop-hdds/framework/src/main/java/org/apache/hadoop/hdds/scm/proxy/SCMBlockLocationFailoverProxyProvider.java
##########
@@ -250,7 +268,7 @@ public RetryPolicy getSCMBlockLocationRetryPolicy(String newLeader) {
       public RetryAction shouldRetry(Exception e, int retry,
                                      int failover, boolean b) {
         if (!SCMHAUtils.isRetriableWithNoFailoverException(e)) {
-          performFailoverToAssignedLeader(newLeader);
+          performFailoverToAssignedLeader(newLeader, e);

Review comment:
       > In performFailoverToAssignedLeader when newLeader is null, we still failover to nextProxy. Eventough the exception type here is like RETRIABLE_WITH_NO_FAILOVER_EXCEPTION_LIST
   
   I think the failover will only happen if exception is not of type SCMHAUtils.isRetriableWithNoFailoverException.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] bharatviswa504 commented on a change in pull request #2141: HDDS-5051. Ensure failover to suggested leader if any for NotLeaderException.

Posted by GitBox <gi...@apache.org>.
bharatviswa504 commented on a change in pull request #2141:
URL: https://github.com/apache/ozone/pull/2141#discussion_r618322797



##########
File path: hadoop-hdds/framework/src/main/java/org/apache/hadoop/hdds/scm/proxy/SCMSecurityProtocolFailoverProxyProvider.java
##########
@@ -260,7 +293,7 @@ public void close() throws IOException {
     }
   }
 
-  public synchronized String getCurrentProxySCMNodeId() {
+  public String getCurrentProxySCMNodeId() {

Review comment:
       synchronized??
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] bharatviswa504 commented on a change in pull request #2141: HDDS-5051. Ensure failover to suggested leader if any for NotLeaderException.

Posted by GitBox <gi...@apache.org>.
bharatviswa504 commented on a change in pull request #2141:
URL: https://github.com/apache/ozone/pull/2141#discussion_r617367866



##########
File path: hadoop-hdds/framework/src/main/java/org/apache/hadoop/hdds/scm/proxy/SCMClientConfig.java
##########
@@ -86,7 +98,16 @@ public void setRpcTimeOut(long timeOut) {
   }
 
   public int getRetryCount() {
-    return retryCount;
+    long duration = getMaxRetryTimeout();

Review comment:
       [Discussion  Question] Do we need this config still? As we get this value anyway from maxRetryTimeOut. (Only case we get now is when retryCount obtained from maxRetryTimeOut is less.
   
   I have a mixed view, in few cases, it will help, if the client wants to retry more time than the value we obtain. Not completely sure do we want it or avoid it.

##########
File path: hadoop-hdds/framework/src/main/java/org/apache/hadoop/hdds/utils/HddsServerUtil.java
##########
@@ -461,26 +459,8 @@ public static SCMSecurityProtocolClientSideTranslatorPB getScmSecurityClient(
   public static SCMSecurityProtocolClientSideTranslatorPB

Review comment:
       We can totally remove this method now.
   It is same as getScmSecurityClient

##########
File path: hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/ha/RatisUtil.java
##########
@@ -174,4 +178,18 @@ private static void setRaftSnapshotProperties(
         conf.getRatisSnapshotThreshold());
   }
 
+  public static void checkRatisException(IOException e, String port,
+      String scmId) throws ServiceException {
+    if (SCMHAUtils.isNonRetriableException(e)) {

Review comment:
       Here on server, we check this for NonRetriableException. But I don't see where we throw it.

##########
File path: hadoop-hdds/common/src/main/java/org/apache/hadoop/hdds/scm/ha/SCMHAUtils.java
##########
@@ -229,6 +273,8 @@ public static boolean isRetriableWithNoFailoverException(Exception e) {
       } else {
         return RetryPolicy.RetryAction.FAIL;
       }
+    } else if (SCMHAUtils.isNonRetriableException(e)) {
+      return RetryPolicy.RetryAction.FAIL;
     } else {
       if (failovers < maxRetryCount) {
         return new RetryPolicy.RetryAction(

Review comment:
       Here do we need to do manually failover, as performFailOver is a dummy method.

##########
File path: hadoop-hdds/common/src/main/java/org/apache/hadoop/hdds/scm/ha/SCMHAUtils.java
##########
@@ -198,12 +201,27 @@ public static OzoneConfiguration removeSelfId(
     return getSCMNodeIds(configuration, scmServiceId);
   }
 
+  private static Throwable unwrapException(Exception e) {
+    IOException ioException = null;
+    Throwable cause = e.getCause();
+    if (cause instanceof RemoteException) {
+      ioException = ((RemoteException) cause).unwrapRemoteException();
+    }
+    return ioException == null ? e : ioException;
+  }
+
+  public static boolean isNonRetriableException(Exception e) {
+    Throwable t = unwrapException(e);
+    return NonRetriableException.class.isInstance(t);
+  }
+
   // This will return the underlying exception after unwrapping
   // the exception to see if it matches with expected exception
   // list , returns true otherwise will return false.
   public static boolean isRetriableWithNoFailoverException(Exception e) {
-    Throwable t = e;
-    while (t != null && t.getCause() != null) {
+
+    Throwable t = unwrapException(e);
+    while (t != null) {
       for (Class<? extends Exception> clazz :
           getRetriableWithNoFailoverExceptionList()) {

Review comment:
       Here we can check RetriableWithNoFailoverException right?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] bshashikant commented on a change in pull request #2141: HDDS-5051. Ensure failover to suggested leader if any for NotLeaderException.

Posted by GitBox <gi...@apache.org>.
bshashikant commented on a change in pull request #2141:
URL: https://github.com/apache/ozone/pull/2141#discussion_r618343729



##########
File path: hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/server/StorageContainerManager.java
##########
@@ -1098,6 +1098,16 @@ public String getClientRpcPort() {
     return addr == null ? "0" : Integer.toString(addr.getPort());
   }
 
+  public String getBlockProtocolRpcPort() {
+    InetSocketAddress addr = getBlockProtocolServer().getBlockRpcAddress();
+    return addr == null ? "0" : Integer.toString(addr.getPort());

Review comment:
       Following existing norm here followed for client port :
    @Override
     public String getClientRpcPort() {
       InetSocketAddress addr = getClientRpcAddress();
       return addr == null ? "0" : Integer.toString(addr.getPort());
     }
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] bharatviswa504 commented on a change in pull request #2141: HDDS-5051. Ensure failover to suggested leader if any for NotLeaderException.

Posted by GitBox <gi...@apache.org>.
bharatviswa504 commented on a change in pull request #2141:
URL: https://github.com/apache/ozone/pull/2141#discussion_r613000389



##########
File path: hadoop-hdds/common/src/main/java/org/apache/hadoop/hdds/ratis/ServerNotLeaderException.java
##########
@@ -0,0 +1,99 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with this
+ * work for additional information regarding copyright ownership.  The ASF
+ * licenses this file to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ * <p>
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * <p>
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,WITHOUT
+ * WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
+ * License for the specific language governing permissions and limitations under
+ * the License.
+ */
+package org.apache.hadoop.hdds.ratis;
+
+import org.apache.hadoop.hdds.HddsUtils;
+import org.apache.ratis.protocol.RaftPeerId;
+import org.apache.ratis.protocol.exceptions.NotLeaderException;
+
+import java.io.IOException;
+import java.util.regex.Matcher;
+import java.util.regex.Pattern;
+
+public class ServerNotLeaderException extends IOException {
+  private final String currentPeerId;
+  private final String leader;
+  private static final Pattern CURRENT_PEER_ID_PATTERN =
+      Pattern.compile("Server:(.*) is not the leader[.]+.*", Pattern.DOTALL);
+  private static final Pattern SUGGESTED_LEADER_PATTERN =
+      Pattern.compile(".*Suggested leader is Server:([^.]*).*", Pattern.DOTALL);
+
+  public ServerNotLeaderException(RaftPeerId currentPeerId) {
+    super("Server:" + currentPeerId + " is not the leader. Could not " +
+        "determine the leader node.");
+    this.currentPeerId = currentPeerId.toString();
+    this.leader = null;
+  }
+
+  public ServerNotLeaderException(RaftPeerId currentPeerId,
+      String suggestedLeader) {
+    super("Server:" + currentPeerId + " is not the leader. Suggested leader is"
+        + " Server:" + suggestedLeader + ".");
+    this.currentPeerId = currentPeerId.toString();
+    this.leader = suggestedLeader;
+  }
+
+  public ServerNotLeaderException(String message) {

Review comment:
       Minor NIT: Unused constructor.

##########
File path: hadoop-hdds/common/src/main/java/org/apache/hadoop/hdds/ratis/ServerNotLeaderException.java
##########
@@ -0,0 +1,99 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with this
+ * work for additional information regarding copyright ownership.  The ASF
+ * licenses this file to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ * <p>
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * <p>
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,WITHOUT
+ * WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
+ * License for the specific language governing permissions and limitations under
+ * the License.
+ */
+package org.apache.hadoop.hdds.ratis;
+
+import org.apache.hadoop.hdds.HddsUtils;
+import org.apache.ratis.protocol.RaftPeerId;
+import org.apache.ratis.protocol.exceptions.NotLeaderException;
+
+import java.io.IOException;
+import java.util.regex.Matcher;
+import java.util.regex.Pattern;
+

Review comment:
       Minor: Add JavaDoc.

##########
File path: hadoop-hdds/framework/src/main/java/org/apache/hadoop/hdds/scm/proxy/SCMContainerLocationFailoverProxyProvider.java
##########
@@ -154,13 +162,26 @@ public synchronized void performFailover(
     LOG.debug("Failing over to next proxy. {}", getCurrentProxySCMNodeId());
   }
 
-  public synchronized void performFailoverToAssignedLeader(String newLeader) {
+  public void performFailoverToAssignedLeader(String newLeader, Exception e) {
+    ServerNotLeaderException snle =
+        (ServerNotLeaderException) SCMHAUtils.getServerNotLeaderException(e);
+    if (snle != null && snle.getSuggestedLeader() != null) {
+      newLeader = scmProxyInfoMap.values().stream().filter(
+          proxyInfo -> NetUtils.getHostPortString(proxyInfo.getAddress())
+              .equals(snle.getSuggestedLeader())).findFirst().get().getNodeId();
+      LOG.debug("Performing failover to suggested leader {}, nodeId {}",
+          snle.getSuggestedLeader(), newLeader);
+    }
     if (newLeader == null) {
-      // If newLeader is not assigned, fail over to next proxy.
-      nextProxyIndex();
-    } else if (!assignLeaderToNode(newLeader)) {
-      // If failed to fail over to newLeader, fail over to next proxy.
+      // If newLeader is not assigned, it will fail over to next proxy.
       nextProxyIndex();
+      LOG.debug("Performing failover to next proxy node {}",
+          currentProxySCMNodeId);
+    } else {
+      if (!assignLeaderToNode(newLeader)) {
+        LOG.debug("Failing over SCM proxy to nodeId: {}", newLeader);
+        nextProxyIndex();

Review comment:
       Same as above

##########
File path: hadoop-hdds/framework/src/main/java/org/apache/hadoop/hdds/scm/proxy/SCMBlockLocationFailoverProxyProvider.java
##########
@@ -250,7 +268,7 @@ public RetryPolicy getSCMBlockLocationRetryPolicy(String newLeader) {
       public RetryAction shouldRetry(Exception e, int retry,
                                      int failover, boolean b) {
         if (!SCMHAUtils.isRetriableWithNoFailoverException(e)) {
-          performFailoverToAssignedLeader(newLeader);
+          performFailoverToAssignedLeader(newLeader, e);

Review comment:
       In performFailoverToAssignedLeader when newLeader is null, we still failover to nextProxy. Eventough the exception type here is like RETRIABLE_WITH_NO_FAILOVER_EXCEPTION_LIST

##########
File path: hadoop-hdds/framework/src/main/java/org/apache/hadoop/hdds/scm/proxy/SCMSecurityProtocolFailoverProxyProvider.java
##########
@@ -181,6 +182,41 @@ public void performFailover(SCMSecurityProtocolPB currentProxy) {
     }
   }
 
+  public void performFailoverToAssignedLeader(String newLeader, Exception e) {

Review comment:
       Can we move this to a utility class, and make it a utillity method by passing all required parameters.
   So that we can avoid code de-duplication, and easier to fix the logic further.

##########
File path: hadoop-hdds/framework/src/main/java/org/apache/hadoop/hdds/scm/protocolPB/ScmBlockLocationProtocolClientSideTranslatorPB.java
##########
@@ -119,7 +119,8 @@ private SCMBlockLocationResponse submitRequest(
       if (response.getStatus() ==
           ScmBlockLocationProtocolProtos.Status.SCM_NOT_LEADER) {
         failoverProxyProvider
-            .performFailoverToAssignedLeader(response.getLeaderSCMNodeId());

Review comment:
       Do we need this at all, as server is only sending exception for NotLeader

##########
File path: hadoop-hdds/common/src/main/java/org/apache/hadoop/hdds/scm/ha/SCMHAUtils.java
##########
@@ -198,12 +201,27 @@ public static OzoneConfiguration removeSelfId(
     return getSCMNodeIds(configuration, scmServiceId);
   }
 
+  private static Throwable unwrapException(Exception e) {
+    IOException ioException = null;
+    Throwable cause = e.getCause();
+    if (cause instanceof RemoteException) {
+      ioException = ((RemoteException) cause).unwrapRemoteException();
+    }
+    return ioException == null ? e : ioException;
+  }
+
+  public static boolean isNonRetriableException(Exception e) {
+    Throwable t = unwrapException(e);

Review comment:
       Looks like the server returns NonRetriableException which wraps StateMachineException. Should we check here for NonRetriableException

##########
File path: hadoop-hdds/framework/src/main/java/org/apache/hadoop/hdds/scm/proxy/SCMBlockLocationFailoverProxyProvider.java
##########
@@ -145,13 +152,24 @@ public synchronized ProxyInfo getProxy() {
   @Override
   public void performFailover(ScmBlockLocationProtocolPB newLeader) {
     // Should do nothing here.
-    LOG.debug("Failing over to next proxy. {}", getCurrentProxyOMNodeId());
+    LOG.debug("Failing over to next proxy. {}", getCurrentProxySCMNodeId());
   }
 
-  public void performFailoverToAssignedLeader(String newLeader) {
+  public void performFailoverToAssignedLeader(String newLeader, Exception e) {
+    ServerNotLeaderException snle =
+        (ServerNotLeaderException) SCMHAUtils.getServerNotLeaderException(e);
+    if (snle != null && snle.getSuggestedLeader() != null) {
+      newLeader = scmProxyInfoMap.values().stream().filter(
+          proxyInfo -> NetUtils.getHostPortString(proxyInfo.getAddress())
+              .equals(snle.getSuggestedLeader())).findFirst().get().getNodeId();
+      LOG.debug("Performing failover to suggested leader {}, nodeId {}",
+          snle.getSuggestedLeader(), newLeader);
+    }
     if (newLeader == null) {
       // If newLeader is not assigned, it will fail over to next proxy.
       nextProxyIndex();
+      LOG.debug("Performing failover to next proxy node {}",
+          currentProxySCMNodeId);
     } else {
       if (!assignLeaderToNode(newLeader)) {
         LOG.debug("Failing over SCM proxy to nodeId: {}", newLeader);

Review comment:
       Do we need to call nextProxyIndex() here?
   As we got newLeader, we should just failOver to newLeader 

##########
File path: hadoop-hdds/framework/src/main/java/org/apache/hadoop/hdds/scm/proxy/SCMBlockLocationFailoverProxyProvider.java
##########
@@ -250,7 +268,7 @@ public RetryPolicy getSCMBlockLocationRetryPolicy(String newLeader) {
       public RetryAction shouldRetry(Exception e, int retry,
                                      int failover, boolean b) {
         if (!SCMHAUtils.isRetriableWithNoFailoverException(e)) {
-          performFailoverToAssignedLeader(newLeader);
+          performFailoverToAssignedLeader(newLeader, e);

Review comment:
       And also can we remove lastAttemptedLeader, as it is not being used anywhere. (Though not related, it might be good as it makes code cleaner if no use)




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] bharatviswa504 commented on a change in pull request #2141: HDDS-5051. Ensure failover to suggested leader if any for NotLeaderException.

Posted by GitBox <gi...@apache.org>.
bharatviswa504 commented on a change in pull request #2141:
URL: https://github.com/apache/ozone/pull/2141#discussion_r618322201



##########
File path: hadoop-hdds/framework/src/main/java/org/apache/hadoop/hdds/scm/proxy/SCMSecurityProtocolFailoverProxyProvider.java
##########
@@ -173,14 +174,49 @@ private SCMSecurityProtocolPB createSCMProxy(InetSocketAddress scmAddress)
 
 
   @Override
-  public void performFailover(SCMSecurityProtocolPB currentProxy) {
+  public synchronized void performFailover(SCMSecurityProtocolPB currentProxy) {
     if (LOG.isDebugEnabled()) {
       int currentIndex = getCurrentProxyIndex();
       LOG.debug("Failing over SCM Security proxy to index: {}, nodeId: {}",
           currentIndex, scmNodeIds.get(currentIndex));
     }
   }
 
+  public void performFailoverToAssignedLeader(String newLeader, Exception e) {

Review comment:
       Synchronized??




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] bharatviswa504 commented on a change in pull request #2141: HDDS-5051. Ensure failover to suggested leader if any for NotLeaderException.

Posted by GitBox <gi...@apache.org>.
bharatviswa504 commented on a change in pull request #2141:
URL: https://github.com/apache/ozone/pull/2141#discussion_r618322952



##########
File path: hadoop-hdds/framework/src/main/java/org/apache/hadoop/hdds/scm/proxy/SCMSecurityProtocolFailoverProxyProvider.java
##########
@@ -260,7 +293,7 @@ public void close() throws IOException {
     }
   }
 
-  public synchronized String getCurrentProxySCMNodeId() {
+  public String getCurrentProxySCMNodeId() {

Review comment:
       Check same for all proxyfailover classes




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] bshashikant commented on pull request #2141: HDDS-5051. Ensure failover to suggested leader if any for NotLeaderException.

Posted by GitBox <gi...@apache.org>.
bshashikant commented on pull request #2141:
URL: https://github.com/apache/ozone/pull/2141#issuecomment-824022432


   https://issues.apache.org/jira/browse/HDDS-5130 is open to track the test coverage issue.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] bshashikant commented on a change in pull request #2141: HDDS-5051. Ensure failover to suggested leader if any for NotLeaderException.

Posted by GitBox <gi...@apache.org>.
bshashikant commented on a change in pull request #2141:
URL: https://github.com/apache/ozone/pull/2141#discussion_r613175323



##########
File path: hadoop-hdds/common/src/main/java/org/apache/hadoop/hdds/ratis/ServerNotLeaderException.java
##########
@@ -0,0 +1,99 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with this
+ * work for additional information regarding copyright ownership.  The ASF
+ * licenses this file to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ * <p>
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * <p>
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,WITHOUT
+ * WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
+ * License for the specific language governing permissions and limitations under
+ * the License.
+ */
+package org.apache.hadoop.hdds.ratis;
+
+import org.apache.hadoop.hdds.HddsUtils;
+import org.apache.ratis.protocol.RaftPeerId;
+import org.apache.ratis.protocol.exceptions.NotLeaderException;
+
+import java.io.IOException;
+import java.util.regex.Matcher;
+import java.util.regex.Pattern;
+
+public class ServerNotLeaderException extends IOException {
+  private final String currentPeerId;
+  private final String leader;
+  private static final Pattern CURRENT_PEER_ID_PATTERN =
+      Pattern.compile("Server:(.*) is not the leader[.]+.*", Pattern.DOTALL);
+  private static final Pattern SUGGESTED_LEADER_PATTERN =
+      Pattern.compile(".*Suggested leader is Server:([^.]*).*", Pattern.DOTALL);
+
+  public ServerNotLeaderException(RaftPeerId currentPeerId) {
+    super("Server:" + currentPeerId + " is not the leader. Could not " +
+        "determine the leader node.");
+    this.currentPeerId = currentPeerId.toString();
+    this.leader = null;
+  }
+
+  public ServerNotLeaderException(RaftPeerId currentPeerId,
+      String suggestedLeader) {
+    super("Server:" + currentPeerId + " is not the leader. Suggested leader is"
+        + " Server:" + suggestedLeader + ".");
+    this.currentPeerId = currentPeerId.toString();
+    this.leader = suggestedLeader;
+  }
+
+  public ServerNotLeaderException(String message) {

Review comment:
       This is not unused but will be called by RpcProxy while decoding the exception msg.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] symious commented on a change in pull request #2141: HDDS-5051. Ensure failover to suggested leader if any for NotLeaderException.

Posted by GitBox <gi...@apache.org>.
symious commented on a change in pull request #2141:
URL: https://github.com/apache/ozone/pull/2141#discussion_r620121614



##########
File path: hadoop-hdds/framework/src/main/java/org/apache/hadoop/hdds/scm/proxy/SCMClientConfig.java
##########
@@ -37,6 +37,8 @@
   public static final String SCM_CLIENT_RPC_TIME_OUT = "rpc.timeout";
   public static final String SCM_CLIENT_FAILOVER_MAX_RETRY =
       "failover.max.retry";
+  public static final String SCM_CLIENT_MAX_RETRY_TIMEOUT=

Review comment:
       Need a space here.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] bharatviswa504 commented on a change in pull request #2141: HDDS-5051. Ensure failover to suggested leader if any for NotLeaderException.

Posted by GitBox <gi...@apache.org>.
bharatviswa504 commented on a change in pull request #2141:
URL: https://github.com/apache/ozone/pull/2141#discussion_r618322387



##########
File path: hadoop-hdds/framework/src/main/java/org/apache/hadoop/hdds/scm/proxy/SCMSecurityProtocolFailoverProxyProvider.java
##########
@@ -173,14 +174,49 @@ private SCMSecurityProtocolPB createSCMProxy(InetSocketAddress scmAddress)
 
 
   @Override
-  public void performFailover(SCMSecurityProtocolPB currentProxy) {
+  public synchronized void performFailover(SCMSecurityProtocolPB currentProxy) {
     if (LOG.isDebugEnabled()) {
       int currentIndex = getCurrentProxyIndex();
       LOG.debug("Failing over SCM Security proxy to index: {}, nodeId: {}",
           currentIndex, scmNodeIds.get(currentIndex));
     }
   }
 
+  public void performFailoverToAssignedLeader(String newLeader, Exception e) {
+    ServerNotLeaderException snle =
+        (ServerNotLeaderException) SCMHAUtils.getServerNotLeaderException(e);
+    if (snle != null && snle.getSuggestedLeader() != null) {
+      newLeader = scmProxyInfoMap.values().stream().filter(
+          proxyInfo -> NetUtils.getHostPortString(proxyInfo.getAddress())
+              .equals(snle.getSuggestedLeader())).findFirst().get().getNodeId();
+      LOG.debug("Performing failover to suggested leader {}, nodeId {}",
+          snle.getSuggestedLeader(), newLeader);
+    }
+    if (newLeader == null) {
+      // If newLeader is not assigned, it will fail over to next proxy.
+      performFailoverToNextProxy();
+      LOG.debug("Performing failover to next proxy node {}",
+          currentProxySCMNodeId);
+    } else {
+      if (!assignLeaderToNode(newLeader)) {
+        LOG.debug("Failing over SCM proxy to nodeId: {}", newLeader);
+        performFailoverToNextProxy();
+      }
+    }
+  }
+
+  private boolean assignLeaderToNode(String newLeaderNodeId) {

Review comment:
       Should we synchronize this?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] bshashikant commented on a change in pull request #2141: HDDS-5051. Ensure failover to suggested leader if any for NotLeaderException.

Posted by GitBox <gi...@apache.org>.
bshashikant commented on a change in pull request #2141:
URL: https://github.com/apache/ozone/pull/2141#discussion_r613175323



##########
File path: hadoop-hdds/common/src/main/java/org/apache/hadoop/hdds/ratis/ServerNotLeaderException.java
##########
@@ -0,0 +1,99 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with this
+ * work for additional information regarding copyright ownership.  The ASF
+ * licenses this file to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ * <p>
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * <p>
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,WITHOUT
+ * WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
+ * License for the specific language governing permissions and limitations under
+ * the License.
+ */
+package org.apache.hadoop.hdds.ratis;
+
+import org.apache.hadoop.hdds.HddsUtils;
+import org.apache.ratis.protocol.RaftPeerId;
+import org.apache.ratis.protocol.exceptions.NotLeaderException;
+
+import java.io.IOException;
+import java.util.regex.Matcher;
+import java.util.regex.Pattern;
+
+public class ServerNotLeaderException extends IOException {
+  private final String currentPeerId;
+  private final String leader;
+  private static final Pattern CURRENT_PEER_ID_PATTERN =
+      Pattern.compile("Server:(.*) is not the leader[.]+.*", Pattern.DOTALL);
+  private static final Pattern SUGGESTED_LEADER_PATTERN =
+      Pattern.compile(".*Suggested leader is Server:([^.]*).*", Pattern.DOTALL);
+
+  public ServerNotLeaderException(RaftPeerId currentPeerId) {
+    super("Server:" + currentPeerId + " is not the leader. Could not " +
+        "determine the leader node.");
+    this.currentPeerId = currentPeerId.toString();
+    this.leader = null;
+  }
+
+  public ServerNotLeaderException(RaftPeerId currentPeerId,
+      String suggestedLeader) {
+    super("Server:" + currentPeerId + " is not the leader. Suggested leader is"
+        + " Server:" + suggestedLeader + ".");
+    this.currentPeerId = currentPeerId.toString();
+    this.leader = suggestedLeader;
+  }
+
+  public ServerNotLeaderException(String message) {

Review comment:
       This is not unused but will be called by RpcProxy.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] bharatviswa504 commented on a change in pull request #2141: HDDS-5051. Ensure failover to suggested leader if any for NotLeaderException.

Posted by GitBox <gi...@apache.org>.
bharatviswa504 commented on a change in pull request #2141:
URL: https://github.com/apache/ozone/pull/2141#discussion_r618297230



##########
File path: hadoop-hdds/common/src/main/java/org/apache/hadoop/hdds/scm/ha/SCMHAUtils.java
##########
@@ -198,12 +201,40 @@ public static OzoneConfiguration removeSelfId(
     return getSCMNodeIds(configuration, scmServiceId);
   }
 
+  private static Throwable unwrapException(Exception e) {
+    IOException ioException = null;
+    Throwable cause = e.getCause();

Review comment:
       should we start here from e, not e.getCause()?

##########
File path: hadoop-hdds/common/src/main/java/org/apache/hadoop/hdds/scm/ha/SCMHAUtils.java
##########
@@ -198,12 +201,40 @@ public static OzoneConfiguration removeSelfId(
     return getSCMNodeIds(configuration, scmServiceId);
   }
 
+  private static Throwable unwrapException(Exception e) {
+    IOException ioException = null;
+    Throwable cause = e.getCause();
+    if (cause instanceof RemoteException) {
+      ioException = ((RemoteException) cause).unwrapRemoteException();
+    }
+    return ioException == null ? e : ioException;
+  }
+
+  /**
+   * Checks if the underlying exception if of type StateMachine. Used by scm
+   * clients.
+   */
+  public static boolean isNonRetriableException(Exception e) {
+    Throwable t =
+        getExceptionForClass(e, StateMachineException.class);
+    return t == null ? false : true;
+  }
+
+  /**
+   * Checks if the underlying exception if of type non retriable. Used by scm
+   * clients.
+   */
+  public static boolean checkNonRetriableException(Exception e) {
+    Throwable t = unwrapException(e);
+    return NonRetriableException.class.isInstance(t);
+  }
+
   // This will return the underlying exception after unwrapping
   // the exception to see if it matches with expected exception
   // list , returns true otherwise will return false.
   public static boolean isRetriableWithNoFailoverException(Exception e) {
-    Throwable t = e;
-    while (t != null && t.getCause() != null) {
+    Throwable t = unwrapException(e);

Review comment:
       Should be hereThrowable t = e, instead of unwrapException?

##########
File path: hadoop-hdds/framework/src/main/java/org/apache/hadoop/hdds/scm/proxy/SCMContainerLocationFailoverProxyProvider.java
##########
@@ -154,13 +162,26 @@ public synchronized void performFailover(
     LOG.debug("Failing over to next proxy. {}", getCurrentProxySCMNodeId());
   }
 
-  public synchronized void performFailoverToAssignedLeader(String newLeader) {
+  public void performFailoverToAssignedLeader(String newLeader, Exception e) {

Review comment:
       I think we should have synchronized here.
   Same for all classses

##########
File path: hadoop-hdds/framework/src/main/java/org/apache/hadoop/hdds/scm/proxy/SCMBlockLocationFailoverProxyProvider.java
##########
@@ -180,7 +200,7 @@ private synchronized long getRetryInterval() {
     return retryInterval;
   }
 
-  private synchronized int nextProxyIndex() {
+  private int nextProxyIndex() {

Review comment:
       Why synchronized is removed here?

##########
File path: hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/server/StorageContainerManager.java
##########
@@ -1098,6 +1098,16 @@ public String getClientRpcPort() {
     return addr == null ? "0" : Integer.toString(addr.getPort());
   }
 
+  public String getBlockProtocolRpcPort() {
+    InetSocketAddress addr = getBlockProtocolServer().getBlockRpcAddress();
+    return addr == null ? "0" : Integer.toString(addr.getPort());
+  }
+
+  public String getSecurityProtocolRpcPort() {
+    InetSocketAddress addr = getSecurityProtocolServer().getRpcAddress();
+    return addr == null ? "0" : Integer.toString(addr.getPort());

Review comment:
       Same here?

##########
File path: hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/server/StorageContainerManager.java
##########
@@ -1098,6 +1098,16 @@ public String getClientRpcPort() {
     return addr == null ? "0" : Integer.toString(addr.getPort());
   }
 
+  public String getBlockProtocolRpcPort() {
+    InetSocketAddress addr = getBlockProtocolServer().getBlockRpcAddress();
+    return addr == null ? "0" : Integer.toString(addr.getPort());

Review comment:
       Do we need these nulll check?

##########
File path: hadoop-hdds/framework/src/main/java/org/apache/hadoop/hdds/scm/proxy/SCMContainerLocationFailoverProxyProvider.java
##########
@@ -131,6 +132,13 @@ public synchronized String getCurrentProxySCMNodeId() {
     return currentProxySCMNodeId;
   }
 
+  @VisibleForTesting
+  public void changeCurrentProxy(String nodeId) {

Review comment:
       Synchronized needed here?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] bshashikant commented on a change in pull request #2141: HDDS-5051. Ensure failover to suggested leader if any for NotLeaderException.

Posted by GitBox <gi...@apache.org>.
bshashikant commented on a change in pull request #2141:
URL: https://github.com/apache/ozone/pull/2141#discussion_r618343874



##########
File path: hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/server/StorageContainerManager.java
##########
@@ -1098,6 +1098,16 @@ public String getClientRpcPort() {
     return addr == null ? "0" : Integer.toString(addr.getPort());
   }
 
+  public String getBlockProtocolRpcPort() {
+    InetSocketAddress addr = getBlockProtocolServer().getBlockRpcAddress();
+    return addr == null ? "0" : Integer.toString(addr.getPort());
+  }
+
+  public String getSecurityProtocolRpcPort() {
+    InetSocketAddress addr = getSecurityProtocolServer().getRpcAddress();
+    return addr == null ? "0" : Integer.toString(addr.getPort());

Review comment:
       Same as above.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] bshashikant commented on a change in pull request #2141: HDDS-5051. Ensure failover to suggested leader if any for NotLeaderException.

Posted by GitBox <gi...@apache.org>.
bshashikant commented on a change in pull request #2141:
URL: https://github.com/apache/ozone/pull/2141#discussion_r617425930



##########
File path: hadoop-hdds/framework/src/main/java/org/apache/hadoop/hdds/scm/proxy/SCMClientConfig.java
##########
@@ -86,7 +98,16 @@ public void setRpcTimeOut(long timeOut) {
   }
 
   public int getRetryCount() {
-    return retryCount;
+    long duration = getMaxRetryTimeout();

Review comment:
       I feel, let's keep it for now. We can remove it later as deemed fit.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] bshashikant commented on a change in pull request #2141: HDDS-5051. Ensure failover to suggested leader if any for NotLeaderException.

Posted by GitBox <gi...@apache.org>.
bshashikant commented on a change in pull request #2141:
URL: https://github.com/apache/ozone/pull/2141#discussion_r613178939



##########
File path: hadoop-hdds/framework/src/main/java/org/apache/hadoop/hdds/scm/proxy/SCMBlockLocationFailoverProxyProvider.java
##########
@@ -145,13 +152,24 @@ public synchronized ProxyInfo getProxy() {
   @Override
   public void performFailover(ScmBlockLocationProtocolPB newLeader) {
     // Should do nothing here.
-    LOG.debug("Failing over to next proxy. {}", getCurrentProxyOMNodeId());
+    LOG.debug("Failing over to next proxy. {}", getCurrentProxySCMNodeId());
   }
 
-  public void performFailoverToAssignedLeader(String newLeader) {
+  public void performFailoverToAssignedLeader(String newLeader, Exception e) {
+    ServerNotLeaderException snle =
+        (ServerNotLeaderException) SCMHAUtils.getServerNotLeaderException(e);
+    if (snle != null && snle.getSuggestedLeader() != null) {
+      newLeader = scmProxyInfoMap.values().stream().filter(
+          proxyInfo -> NetUtils.getHostPortString(proxyInfo.getAddress())
+              .equals(snle.getSuggestedLeader())).findFirst().get().getNodeId();
+      LOG.debug("Performing failover to suggested leader {}, nodeId {}",
+          snle.getSuggestedLeader(), newLeader);
+    }
     if (newLeader == null) {
       // If newLeader is not assigned, it will fail over to next proxy.
       nextProxyIndex();
+      LOG.debug("Performing failover to next proxy node {}",
+          currentProxySCMNodeId);
     } else {
       if (!assignLeaderToNode(newLeader)) {
         LOG.debug("Failing over SCM proxy to nodeId: {}", newLeader);

Review comment:
       assignLeaderToNode will return true if the new leader is found in the proxy list. If its not there, default is to fall back to next proxy which IMO is the right behaviour.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] mukul1987 merged pull request #2141: HDDS-5051. Ensure failover to suggested leader if any for NotLeaderException.

Posted by GitBox <gi...@apache.org>.
mukul1987 merged pull request #2141:
URL: https://github.com/apache/ozone/pull/2141


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] bshashikant commented on a change in pull request #2141: HDDS-5051. Ensure failover to suggested leader if any for NotLeaderException.

Posted by GitBox <gi...@apache.org>.
bshashikant commented on a change in pull request #2141:
URL: https://github.com/apache/ozone/pull/2141#discussion_r620020992



##########
File path: hadoop-hdds/framework/src/main/java/org/apache/hadoop/hdds/scm/proxy/SCMSecurityProtocolFailoverProxyProvider.java
##########
@@ -99,7 +100,7 @@ public SCMSecurityProtocolFailoverProxyProvider(ConfigurationSource conf,
     this.retryInterval = scmClientConfig.getRetryInterval();
   }
 
-  protected void loadConfigs() {
+  protected synchronized void loadConfigs() {

Review comment:
       It was added to address findbug issues.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org