You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@ozone.apache.org by GitBox <gi...@apache.org> on 2020/08/12 23:35:08 UTC

[GitHub] [hadoop-ozone] hanishakoneru opened a new pull request #1324: HDDS-4068. Client should not retry same OM on network connection failure

hanishakoneru opened a new pull request #1324:
URL: https://github.com/apache/hadoop-ozone/pull/1324


   ## What changes were proposed in this pull request?
   
   On connection failure to an OM, client retries 10 times on same OM before failing over to the next OM. This is not optimal. Client should failover to next OM after 1 connection exception. In case the connection exception was spurious, then OM HA failover logic would lead the client to retry on that OM again after trying the other OMs first.
   (Please fill in changes proposed in this fix)
   
   ## What is the link to the Apache JIRA
   
   https://issues.apache.org/jira/browse/HDDS-4068
   
   ## How was this patch tested?
   
   Tested manually on a docker cluster.
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: ozone-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: ozone-issues-help@hadoop.apache.org


[GitHub] [hadoop-ozone] bharatviswa504 commented on a change in pull request #1324: HDDS-4068. Client should not retry same OM on network connection failure

Posted by GitBox <gi...@apache.org>.
bharatviswa504 commented on a change in pull request #1324:
URL: https://github.com/apache/hadoop-ozone/pull/1324#discussion_r475966746



##########
File path: hadoop-ozone/common/src/main/java/org/apache/hadoop/ozone/om/ha/OMFailoverProxyProvider.java
##########
@@ -172,9 +174,12 @@ private OzoneManagerProtocolPB createOMProxy(InetSocketAddress omAddress)
         LegacyHadoopConfigurationSource.asHadoopConfiguration(conf);
     RPC.setProtocolEngine(hadoopConf, OzoneManagerProtocolPB.class,
         ProtobufRpcEngine.class);
-    return RPC.getProxy(OzoneManagerProtocolPB.class, omVersion, omAddress, ugi,
-        hadoopConf, NetUtils.getDefaultSocketFactory(hadoopConf),
-            (int) OmUtils.getOMClientRpcTimeOut(conf));
+    RetryPolicy connectionRetryPolicy = RetryPolicies
+        .failoverOnNetworkException(0);
+    return RPC.getProtocolProxy(OzoneManagerProtocolPB.class, omVersion,
+        omAddress, ugi, hadoopConf, NetUtils.getDefaultSocketFactory(
+            hadoopConf), (int) OmUtils.getOMClientRpcTimeOut(conf),
+        connectionRetryPolicy).getProxy();

Review comment:
       One question, can we use RetryPolicy TRY_ONCE_THEN_FAIL here?
   
   Because in this failoverOnNetworkException, also we set retry count to zero and maxFailOvers to zero.
          




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: ozone-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: ozone-issues-help@hadoop.apache.org


[GitHub] [hadoop-ozone] hanishakoneru commented on pull request #1324: HDDS-4068. Client should not retry same OM on network connection failure

Posted by GitBox <gi...@apache.org>.
hanishakoneru commented on pull request #1324:
URL: https://github.com/apache/hadoop-ozone/pull/1324#issuecomment-681131656


   Thanks @bharatviswa504 for reviewing the patch. Checked with @vivekratnavel that it is safe to commit since the coverage CI failed only during upload.
   Will merge the PR soon.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: ozone-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: ozone-issues-help@hadoop.apache.org


[GitHub] [hadoop-ozone] hanishakoneru commented on a change in pull request #1324: HDDS-4068. Client should not retry same OM on network connection failure

Posted by GitBox <gi...@apache.org>.
hanishakoneru commented on a change in pull request #1324:
URL: https://github.com/apache/hadoop-ozone/pull/1324#discussion_r475972003



##########
File path: hadoop-ozone/common/src/main/java/org/apache/hadoop/ozone/om/ha/OMFailoverProxyProvider.java
##########
@@ -172,9 +174,12 @@ private OzoneManagerProtocolPB createOMProxy(InetSocketAddress omAddress)
         LegacyHadoopConfigurationSource.asHadoopConfiguration(conf);
     RPC.setProtocolEngine(hadoopConf, OzoneManagerProtocolPB.class,
         ProtobufRpcEngine.class);
-    return RPC.getProxy(OzoneManagerProtocolPB.class, omVersion, omAddress, ugi,
-        hadoopConf, NetUtils.getDefaultSocketFactory(hadoopConf),
-            (int) OmUtils.getOMClientRpcTimeOut(conf));
+    RetryPolicy connectionRetryPolicy = RetryPolicies
+        .failoverOnNetworkException(0);
+    return RPC.getProtocolProxy(OzoneManagerProtocolPB.class, omVersion,
+        omAddress, ugi, hadoopConf, NetUtils.getDefaultSocketFactory(
+            hadoopConf), (int) OmUtils.getOMClientRpcTimeOut(conf),
+        connectionRetryPolicy).getProxy();

Review comment:
       It would be the same as the current one, right?
   Would it suffice to add a comment to explain the retry policy?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: ozone-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: ozone-issues-help@hadoop.apache.org


[GitHub] [hadoop-ozone] bharatviswa504 commented on a change in pull request #1324: HDDS-4068. Client should not retry same OM on network connection failure

Posted by GitBox <gi...@apache.org>.
bharatviswa504 commented on a change in pull request #1324:
URL: https://github.com/apache/hadoop-ozone/pull/1324#discussion_r476093708



##########
File path: hadoop-ozone/common/src/main/java/org/apache/hadoop/ozone/om/ha/OMFailoverProxyProvider.java
##########
@@ -172,9 +174,12 @@ private OzoneManagerProtocolPB createOMProxy(InetSocketAddress omAddress)
         LegacyHadoopConfigurationSource.asHadoopConfiguration(conf);
     RPC.setProtocolEngine(hadoopConf, OzoneManagerProtocolPB.class,
         ProtobufRpcEngine.class);
-    return RPC.getProxy(OzoneManagerProtocolPB.class, omVersion, omAddress, ugi,
-        hadoopConf, NetUtils.getDefaultSocketFactory(hadoopConf),
-            (int) OmUtils.getOMClientRpcTimeOut(conf));
+    RetryPolicy connectionRetryPolicy = RetryPolicies
+        .failoverOnNetworkException(0);
+    return RPC.getProtocolProxy(OzoneManagerProtocolPB.class, omVersion,
+        omAddress, ugi, hadoopConf, NetUtils.getDefaultSocketFactory(
+            hadoopConf), (int) OmUtils.getOMClientRpcTimeOut(conf),
+        connectionRetryPolicy).getProxy();

Review comment:
       Yes. As failoverOnNetworkException uses fallback as TRY_ONCE_THEN_FAIL and maxFailOvers is zero, so it is like TRY_ONCE_THEN_FAIL, as in shouldretry it will fail in below part shouldRetry i think.
   
   
   ```
         if (failovers >= maxFailovers) {
           return new RetryAction(RetryAction.RetryDecision.FAIL, 0,
               "failovers (" + failovers + ") exceeded maximum allowed ("
               + maxFailovers + ")");
         }
   ```
   
   So, tehnically we are using it as similar to TRY_ONCE_THEN_FAIL in this scenario.

##########
File path: hadoop-ozone/common/src/main/java/org/apache/hadoop/ozone/om/ha/OMFailoverProxyProvider.java
##########
@@ -172,9 +174,12 @@ private OzoneManagerProtocolPB createOMProxy(InetSocketAddress omAddress)
         LegacyHadoopConfigurationSource.asHadoopConfiguration(conf);
     RPC.setProtocolEngine(hadoopConf, OzoneManagerProtocolPB.class,
         ProtobufRpcEngine.class);
-    return RPC.getProxy(OzoneManagerProtocolPB.class, omVersion, omAddress, ugi,
-        hadoopConf, NetUtils.getDefaultSocketFactory(hadoopConf),
-            (int) OmUtils.getOMClientRpcTimeOut(conf));
+    RetryPolicy connectionRetryPolicy = RetryPolicies
+        .failoverOnNetworkException(0);
+    return RPC.getProtocolProxy(OzoneManagerProtocolPB.class, omVersion,
+        omAddress, ugi, hadoopConf, NetUtils.getDefaultSocketFactory(
+            hadoopConf), (int) OmUtils.getOMClientRpcTimeOut(conf),
+        connectionRetryPolicy).getProxy();

Review comment:
       Yes. As failoverOnNetworkException uses fallback as TRY_ONCE_THEN_FAIL and maxFailOvers is zero, so it is like TRY_ONCE_THEN_FAIL, as in shouldretry it will fail in below part shouldRetry i think.
   
   
   ```
         if (failovers >= maxFailovers) {
           return new RetryAction(RetryAction.RetryDecision.FAIL, 0,
               "failovers (" + failovers + ") exceeded maximum allowed ("
               + maxFailovers + ")");
         }
   ```
   
   So, technically we are using it as similar to TRY_ONCE_THEN_FAIL in this scenario.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: ozone-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: ozone-issues-help@hadoop.apache.org


[GitHub] [hadoop-ozone] bharatviswa504 commented on a change in pull request #1324: HDDS-4068. Client should not retry same OM on network connection failure

Posted by GitBox <gi...@apache.org>.
bharatviswa504 commented on a change in pull request #1324:
URL: https://github.com/apache/hadoop-ozone/pull/1324#discussion_r476093708



##########
File path: hadoop-ozone/common/src/main/java/org/apache/hadoop/ozone/om/ha/OMFailoverProxyProvider.java
##########
@@ -172,9 +174,12 @@ private OzoneManagerProtocolPB createOMProxy(InetSocketAddress omAddress)
         LegacyHadoopConfigurationSource.asHadoopConfiguration(conf);
     RPC.setProtocolEngine(hadoopConf, OzoneManagerProtocolPB.class,
         ProtobufRpcEngine.class);
-    return RPC.getProxy(OzoneManagerProtocolPB.class, omVersion, omAddress, ugi,
-        hadoopConf, NetUtils.getDefaultSocketFactory(hadoopConf),
-            (int) OmUtils.getOMClientRpcTimeOut(conf));
+    RetryPolicy connectionRetryPolicy = RetryPolicies
+        .failoverOnNetworkException(0);
+    return RPC.getProtocolProxy(OzoneManagerProtocolPB.class, omVersion,
+        omAddress, ugi, hadoopConf, NetUtils.getDefaultSocketFactory(
+            hadoopConf), (int) OmUtils.getOMClientRpcTimeOut(conf),
+        connectionRetryPolicy).getProxy();

Review comment:
       Yes. As failoverOnNetworkException uses fallback as TRY_ONCE_THEN_FAIL and maxFailOvers is zero, so it is like TRY_ONCE_THEN_FAIL, as in shouldretry it will fail in below part shouldRetry i think.
   
   
   ```
         if (failovers >= maxFailovers) {
           return new RetryAction(RetryAction.RetryDecision.FAIL, 0,
               "failovers (" + failovers + ") exceeded maximum allowed ("
               + maxFailovers + ")");
         }
   ```
   
   




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: ozone-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: ozone-issues-help@hadoop.apache.org


[GitHub] [hadoop-ozone] hanishakoneru merged pull request #1324: HDDS-4068. Client should not retry same OM on network connection failure

Posted by GitBox <gi...@apache.org>.
hanishakoneru merged pull request #1324:
URL: https://github.com/apache/hadoop-ozone/pull/1324


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: ozone-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: ozone-issues-help@hadoop.apache.org