You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@ozone.apache.org by GitBox <gi...@apache.org> on 2021/10/26 20:03:16 UTC

[GitHub] [ozone] hanishakoneru opened a new pull request #2772: HDDS-5872. Do not failover on some RpcExceptions

hanishakoneru opened a new pull request #2772:
URL: https://github.com/apache/ozone/pull/2772


   ## What changes were proposed in this pull request?
   
   When OMFailoverProxy encounters an RpcException, we should check if a failover or retry is required or not.
   
   For example, if the response length exceeds ipc.maximum.response.length on one OM, it will exceed the limit on all OMs. So we do not need to do a retry or failover.
   
   If there is an RpcNoSuchMethodException, RpcNoSuchProtocolException or VersionMismatch Exception etc. on only 1 OM and not of others, it could lead to state divergence between the OMs. So it would be better to fail the request than to retry on other nodes.
   
   ## What is the link to the Apache JIRA
   
   https://issues.apache.org/jira/browse/HDDS-5872
   
   ## How was this patch tested?
   
   Tested on a pseudo cluster 
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] adoroszlai commented on a change in pull request #2772: HDDS-5872. Do not failover on some RpcExceptions

Posted by GitBox <gi...@apache.org>.
adoroszlai commented on a change in pull request #2772:
URL: https://github.com/apache/ozone/pull/2772#discussion_r746443205



##########
File path: hadoop-ozone/common/src/main/java/org/apache/hadoop/ozone/om/ha/OMFailoverProxyProvider.java
##########
@@ -451,8 +453,9 @@ public synchronized long getWaitTime() {
     return waitBetweenRetries;
   }
 
-  public synchronized boolean shouldFailover(Exception ex) {
-    if (OmUtils.isAccessControlException(ex)) {
+  private synchronized boolean shouldFailover(Exception ex) {
+    Throwable unwrappedException = HddsUtils.getUnwrappedException(ex);
+    if (unwrappedException instanceof AccessControlException) {

Review comment:
       `TestDelegationToken` is failing because client retries on `SecretManager.InvalidToken`:
   
   ```
   2021-11-10 11:01:05,004 [Listener at 0.0.0.0/9862] INFO  retry.RetryInvocationHandler (RetryInvocationHandler.java:log(411)) - com.google.protobuf.ServiceException: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken): token (OzoneToken owner=om/10.19.13.197@EXAMPLE.COM, renewer=om, realUser=, issueDate=2021-11-10T10:00:06.408Z, maxDate=2021-11-17T10:00:06.408Z, sequenceNumber=1, masterKeyId=1, strToSign=null, signature=null, awsAccessKeyId=null, omServiceId=omServiceIdDefault) can't be found in cache, while invoking $Proxy38.submitRequest over nodeId=null,nodeAddress=0.0.0.0:9862 after 7 failover attempts. Trying to failover after sleeping for 16000ms.
   ```
   
   It seems SCM's and OM's version of `isAccessControlException` was different in this respect:
   
   https://github.com/apache/ozone/blob/96e2d7f94075474e6fe8ebf0ee31b2ee3855fbfe/hadoop-ozone/common/src/main/java/org/apache/hadoop/ozone/OmUtils.java#L633-L634
   
   ```suggestion
       if (unwrappedException instanceof AccessControlException ||
           unwrappedException instanceof SecretManager.InvalidToken) {
   ```

##########
File path: hadoop-ozone/common/src/main/java/org/apache/hadoop/ozone/om/ha/OMFailoverProxyProvider.java
##########
@@ -54,6 +55,7 @@
 import org.apache.hadoop.ozone.om.exceptions.OMLeaderNotReadyException;
 import org.apache.hadoop.ozone.om.exceptions.OMNotLeaderException;
 import org.apache.hadoop.ozone.om.protocolPB.OzoneManagerProtocolClientSideTranslatorPB;
+import org.apache.hadoop.security.AccessControlException;
 import org.apache.hadoop.security.UserGroupInformation;

Review comment:
       (Needed for other code suggestion)
   
   ```suggestion
   import org.apache.hadoop.security.AccessControlException;
   import org.apache.hadoop.security.UserGroupInformation;
   import org.apache.hadoop.security.token.SecretManager;
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] adoroszlai commented on a change in pull request #2772: HDDS-5872. Do not failover on some RpcExceptions

Posted by GitBox <gi...@apache.org>.
adoroszlai commented on a change in pull request #2772:
URL: https://github.com/apache/ozone/pull/2772#discussion_r744604890



##########
File path: hadoop-hdds/common/src/main/java/org/apache/hadoop/util/OzoneUtils.java
##########
@@ -0,0 +1,76 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with this
+ * work for additional information regarding copyright ownership.  The ASF
+ * licenses this file to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ * <p>
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * <p>
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
+ * WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
+ * License for the specific language governing permissions and limitations under
+ * the License.
+ */
+
+package org.apache.hadoop.util;
+
+import com.google.protobuf.ServiceException;
+import org.apache.hadoop.ipc.RPC;
+import org.apache.hadoop.ipc.RemoteException;
+import org.apache.hadoop.ipc.RpcException;
+import org.apache.hadoop.ipc.RpcNoSuchMethodException;
+import org.apache.hadoop.ipc.RpcNoSuchProtocolException;
+import org.apache.hadoop.security.AccessControlException;
+import org.apache.hadoop.security.token.SecretManager;
+
+/**
+ * Common Util class for SCM, OM and other components.
+ */
+public final class OzoneUtils {

Review comment:
       Nit: can we add this new method to the existing `HddsUtils` class instead of creating a new one?
   
   https://github.com/apache/ozone/blob/c088cd1d96895d8f6a13fa3903909ea5b85ceb04/hadoop-hdds/common/src/main/java/org/apache/hadoop/hdds/HddsUtils.java#L78
   
   Note, we also already have `OzoneUtils`, but it's in `ozone-common`:
   
   https://github.com/apache/ozone/blob/c088cd1d96895d8f6a13fa3903909ea5b85ceb04/hadoop-ozone/common/src/main/java/org/apache/hadoop/ozone/web/utils/OzoneUtils.java#L44




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] hanishakoneru commented on pull request #2772: HDDS-5872. Do not failover on some RpcExceptions

Posted by GitBox <gi...@apache.org>.
hanishakoneru commented on pull request #2772:
URL: https://github.com/apache/ozone/pull/2772#issuecomment-967292059


   Thanks @adoroszlai for reviewing and approving this PR. I will merge this shortly.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] adoroszlai commented on a change in pull request #2772: HDDS-5872. Do not failover on some RpcExceptions

Posted by GitBox <gi...@apache.org>.
adoroszlai commented on a change in pull request #2772:
URL: https://github.com/apache/ozone/pull/2772#discussion_r742174939



##########
File path: hadoop-ozone/common/src/main/java/org/apache/hadoop/ozone/OmUtils.java
##########
@@ -621,23 +622,27 @@ public static String getOzoneManagerServiceId(OzoneConfiguration conf)
 
   /**
    * Unwrap exception to check if it is some kind of access control problem
-   * ({@link AccessControlException} or {@link SecretManager.InvalidToken}).
+   * ({@link AccessControlException} or {@link SecretManager.InvalidToken})
+   * or a RpcException.
    */
-  public static boolean isAccessControlException(Exception ex) {
+  public static Throwable getUnwrappedException(Exception ex) {
     if (ex instanceof ServiceException) {
       Throwable t = ex.getCause();
       if (t instanceof RemoteException) {
         t = ((RemoteException) t).unwrapRemoteException();
       }
       while (t != null) {
+        if (t instanceof RpcException) {
+          return t;
+        }
         if (t instanceof AccessControlException ||
             t instanceof SecretManager.InvalidToken) {
-          return true;
+          return t;

Review comment:
       Nit: can we merge these two identical `if` blocks?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] bharatviswa504 commented on a change in pull request #2772: HDDS-5872. Do not failover on some RpcExceptions

Posted by GitBox <gi...@apache.org>.
bharatviswa504 commented on a change in pull request #2772:
URL: https://github.com/apache/ozone/pull/2772#discussion_r740330227



##########
File path: hadoop-ozone/common/src/main/java/org/apache/hadoop/ozone/om/ha/OMFailoverProxyProvider.java
##########
@@ -464,6 +469,19 @@ public synchronized boolean shouldFailover(Exception ex) {
           return false;
         }
       }
+    } else if (unwrappedException instanceof RpcException) {
+      // Do not failover for following exceptions
+      if (unwrappedException instanceof RpcNoSuchMethodException ||
+          unwrappedException instanceof RpcNoSuchProtocolException ||
+          unwrappedException instanceof RPC.VersionMismatch) {
+        return false;
+      }
+      if (unwrappedException.getMessage().contains(
+          "RPC response exceeds maximum data length") ||

Review comment:
       Similar change we might need in SCM also, for OM-SCM protocol. Can we create this as a common utility method, so that it can be used in SCM also.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] hanishakoneru commented on pull request #2772: HDDS-5872. Do not failover on some RpcExceptions

Posted by GitBox <gi...@apache.org>.
hanishakoneru commented on pull request #2772:
URL: https://github.com/apache/ozone/pull/2772#issuecomment-963449097


   Thanks @adoroszlai for reviewing. I have moved the utility functions to HddsUtils. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] hanishakoneru commented on pull request #2772: HDDS-5872. Do not failover on some RpcExceptions

Posted by GitBox <gi...@apache.org>.
hanishakoneru commented on pull request #2772:
URL: https://github.com/apache/ozone/pull/2772#issuecomment-961276115


   Thank you @bharatviswa504 and @adoroszlai for the reviews. Addressed your comments.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] adoroszlai commented on a change in pull request #2772: HDDS-5872. Do not failover on some RpcExceptions

Posted by GitBox <gi...@apache.org>.
adoroszlai commented on a change in pull request #2772:
URL: https://github.com/apache/ozone/pull/2772#discussion_r745435622



##########
File path: hadoop-hdds/common/src/main/java/org/apache/hadoop/hdds/HddsUtils.java
##########
@@ -622,4 +631,48 @@ public static long getShutDownTimeOut(ConfigurationSource conf) {
   public static int roundupMb(long bytes) {
     return (int)Math.ceil((double) bytes/(double) ONE_MB);
   }
+
+  /**
+   * Unwrap exception to check if it is some kind of access control problem
+   * ({@link AccessControlException} or {@link SecretManager.InvalidToken})
+   * or a RpcException.
+   */
+  public static Throwable getUnwrappedException(Exception ex) {
+    if (ex instanceof ServiceException) {
+      Throwable t = ex.getCause();
+      if (t instanceof RemoteException) {
+        t = ((RemoteException) t).unwrapRemoteException();
+      }
+      while (t != null) {
+        if (t instanceof RpcException ||
+            t instanceof AccessControlException ||
+            t instanceof SecretManager.InvalidToken) {
+          return t;
+        }
+        t = t.getCause();
+      }
+    }
+    Map<Integer, Integer> map = new HashMap();

Review comment:
       Findbugs complains about this unused map.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] bharatviswa504 commented on a change in pull request #2772: HDDS-5872. Do not failover on some RpcExceptions

Posted by GitBox <gi...@apache.org>.
bharatviswa504 commented on a change in pull request #2772:
URL: https://github.com/apache/ozone/pull/2772#discussion_r740330227



##########
File path: hadoop-ozone/common/src/main/java/org/apache/hadoop/ozone/om/ha/OMFailoverProxyProvider.java
##########
@@ -464,6 +469,19 @@ public synchronized boolean shouldFailover(Exception ex) {
           return false;
         }
       }
+    } else if (unwrappedException instanceof RpcException) {
+      // Do not failover for following exceptions
+      if (unwrappedException instanceof RpcNoSuchMethodException ||
+          unwrappedException instanceof RpcNoSuchProtocolException ||
+          unwrappedException instanceof RPC.VersionMismatch) {
+        return false;
+      }
+      if (unwrappedException.getMessage().contains(
+          "RPC response exceeds maximum data length") ||

Review comment:
       Similar change we might need in SCM also, for OM-SCM protocol. Can we create this as a common utility method, so that it can be used in SCM also.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] hanishakoneru commented on pull request #2772: HDDS-5872. Do not failover on some RpcExceptions

Posted by GitBox <gi...@apache.org>.
hanishakoneru commented on pull request #2772:
URL: https://github.com/apache/ozone/pull/2772#issuecomment-961276115


   Thank you @bharatviswa504 and @adoroszlai for the reviews. Addressed your comments.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] hanishakoneru merged pull request #2772: HDDS-5872. Do not failover on some RpcExceptions

Posted by GitBox <gi...@apache.org>.
hanishakoneru merged pull request #2772:
URL: https://github.com/apache/ozone/pull/2772


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org