Posted to issues@ozone.apache.org by GitBox <gi...@apache.org> on 2020/06/02 07:48:37 UTC

[GitHub] [hadoop-ozone] lokeshj1703 opened a new pull request #1005: HDDS-3350. Ozone Retry Policy Improvements.

lokeshj1703 opened a new pull request #1005:
URL: https://github.com/apache/hadoop-ozone/pull/1005


   ## What changes were proposed in this pull request?
   
   Currently, any Ozone client request can spend a very long time in retries, and the Ozone client can retry its requests very aggressively. As a result, the wait before a client request finally fails can be very high. In addition, aggressive retries by the Ratis client used by Ozone can bog down a Ratis pipeline leader. The Jira aims to change the current retry behavior in the Ozone client.
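
To make the "aggressive retries" concern concrete, the sketch below counts how many attempts a client makes within a fixed time budget under a fixed-sleep policy versus a capped exponential backoff. It is a plain-Java illustration and not the Ratis implementation: the class `RetryWaitSketch`, its method names, and the numbers in `main` are hypothetical, jitter is omitted, and each attempt is assumed to fail instantly.

```java
public class RetryWaitSketch {

  /** Attempts made within budgetMs when every retry sleeps for sleepMs. */
  public static int attemptsWithFixedSleep(long sleepMs, long budgetMs) {
    if (sleepMs <= 0) {
      throw new IllegalArgumentException("sleepMs must be positive");
    }
    int attempts = 1;          // the first attempt happens at time 0
    long elapsed = 0;
    while (elapsed + sleepMs <= budgetMs) {
      elapsed += sleepMs;
      attempts++;
    }
    return attempts;
  }

  /** Attempts made within budgetMs when the sleep doubles up to maxMs. */
  public static int attemptsWithExponentialBackoff(
      long baseMs, long maxMs, long budgetMs) {
    if (baseMs <= 0 || maxMs < baseMs) {
      throw new IllegalArgumentException("need 0 < baseMs <= maxMs");
    }
    int attempts = 1;
    long elapsed = 0;
    long sleep = baseMs;
    while (elapsed + sleep <= budgetMs) {
      elapsed += sleep;
      attempts++;
      sleep = Math.min(maxMs, sleep * 2);  // cap the growth at maxMs
    }
    return attempts;
  }

  public static void main(String[] args) {
    long budgetMs = 60_000;  // hypothetical request timeout
    // Fixed 1 s sleep: 61 attempts within the budget.
    System.out.println(attemptsWithFixedSleep(1_000, budgetMs));
    // 1 s base, 32 s cap: 6 attempts within the same budget.
    System.out.println(attemptsWithExponentialBackoff(1_000, 32_000, budgetMs));
  }
}
```

Under these assumed numbers the backoff variant sends roughly a tenth of the requests to the leader within the same budget, which is the kind of load reduction the description is after.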
   
   ## What is the link to the Apache JIRA
   
   https://issues.apache.org/jira/browse/HDDS-3350
   
   ## How was this patch tested?
   
   Teragen results were compared with and without the new retry policy. The results are uploaded to the Apache JIRA.
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: ozone-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: ozone-issues-help@hadoop.apache.org


[GitHub] [hadoop-ozone] lokeshj1703 commented on a change in pull request #1005: HDDS-3350. Ozone Retry Policy Improvements.

Posted by GitBox <gi...@apache.org>.
lokeshj1703 commented on a change in pull request #1005:
URL: https://github.com/apache/hadoop-ozone/pull/1005#discussion_r435870208



##########
File path: pom.xml
##########
@@ -79,7 +79,7 @@ xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xs
     <declared.ozone.version>${ozone.version}</declared.ozone.version>
 
     <!-- Apache Ratis version -->
-    <ratis.version>0.6.0-cac3336-SNAPSHOT</ratis.version>
+    <ratis.version>0.6.0-6ab75ae-SNAPSHOT</ratis.version>

Review comment:
       Ratis snapshot version update is required for compilation. Once HDDS-3654 is committed, I'll remove these changes.






[GitHub] [hadoop-ozone] lokeshj1703 commented on pull request #1005: HDDS-3350. Ozone Retry Policy Improvements.

Posted by GitBox <gi...@apache.org>.
lokeshj1703 commented on pull request #1005:
URL: https://github.com/apache/hadoop-ozone/pull/1005#issuecomment-640569203








[GitHub] [hadoop-ozone] lokeshj1703 commented on pull request #1005: HDDS-3350. Ozone Retry Policy Improvements.

Posted by GitBox <gi...@apache.org>.
lokeshj1703 commented on pull request #1005:
URL: https://github.com/apache/hadoop-ozone/pull/1005#issuecomment-643915946


   @bshashikant I have updated the PR. 
   
   There is a test failure in Test2WayCommitInRatis in the latest run. I ran the test 10 times and it passes locally. The test failure can also be seen in https://elek.github.io/ozone-build-results/ in multiple PRs.




[GitHub] [hadoop-ozone] lokeshj1703 commented on a change in pull request #1005: HDDS-3350. Ozone Retry Policy Improvements.

Posted by GitBox <gi...@apache.org>.
lokeshj1703 commented on a change in pull request #1005:
URL: https://github.com/apache/hadoop-ozone/pull/1005#discussion_r438144281



##########
File path: hadoop-hdds/common/src/main/java/org/apache/hadoop/hdds/ratis/RatisHelper.java
##########
@@ -269,23 +282,76 @@ static GrpcTlsConfig createTlsClientConfig(SecurityConfig conf,
     return tlsConfig;
   }
 

Review comment:
       Done.






[GitHub] [hadoop-ozone] bshashikant commented on pull request #1005: HDDS-3350. Ozone Retry Policy Improvements.

Posted by GitBox <gi...@apache.org>.
bshashikant commented on pull request #1005:
URL: https://github.com/apache/hadoop-ozone/pull/1005#issuecomment-637972830


   @lokeshj1703, can you please update the patch?
   




[GitHub] [hadoop-ozone] bshashikant commented on a change in pull request #1005: HDDS-3350. Ozone Retry Policy Improvements.

Posted by GitBox <gi...@apache.org>.
bshashikant commented on a change in pull request #1005:
URL: https://github.com/apache/hadoop-ozone/pull/1005#discussion_r437566605



##########
File path: hadoop-hdds/common/src/main/java/org/apache/hadoop/hdds/ratis/RatisHelper.java
##########
@@ -269,23 +282,76 @@ static GrpcTlsConfig createTlsClientConfig(SecurityConfig conf,
     return tlsConfig;
   }
 
-  static RetryPolicy createRetryPolicy(ConfigurationSource conf) {
-    int maxRetryCount =
-        conf.getInt(OzoneConfigKeys.DFS_RATIS_CLIENT_REQUEST_MAX_RETRIES_KEY,
+  public static RetryPolicy createRetryPolicy(ConfigurationSource conf) {
+    ExponentialBackoffRetry exponentialBackoffRetry =
+        createExponentialBackoffPolicy(conf);
+    MultipleLinearRandomRetry multipleLinearRandomRetry =
+        MultipleLinearRandomRetry.parseCommaSeparated(conf.get(
+            OzoneConfigKeys.DFS_RATIS_CLIENT_MULTILINEAR_RANDOM_RETRY_POLICY,
             OzoneConfigKeys.
-                DFS_RATIS_CLIENT_REQUEST_MAX_RETRIES_DEFAULT);
-    long retryInterval = conf.getTimeDuration(OzoneConfigKeys.
-        DFS_RATIS_CLIENT_REQUEST_RETRY_INTERVAL_KEY, OzoneConfigKeys.
-        DFS_RATIS_CLIENT_REQUEST_RETRY_INTERVAL_DEFAULT
-        .toIntExact(TimeUnit.MILLISECONDS), TimeUnit.MILLISECONDS);
-    TimeDuration sleepDuration =
-        TimeDuration.valueOf(retryInterval, TimeUnit.MILLISECONDS);
-    RetryPolicy retryPolicy = RetryPolicies
-        .retryUpToMaximumCountWithFixedSleep(maxRetryCount, sleepDuration);
-    return retryPolicy;
+                DFS_RATIS_CLIENT_MULTILINEAR_RANDOM_RETRY_POLICY_DEFAULT));
+
+    long writeTimeout = conf.getTimeDuration(
+        OzoneConfigKeys.DFS_RATIS_CLIENT_REQUEST_WRITE_TIMEOUT, OzoneConfigKeys.
+            DFS_RATIS_CLIENT_REQUEST_WRITE_TIMEOUT_DEFAULT
+            .toIntExact(TimeUnit.MILLISECONDS), TimeUnit.MILLISECONDS);
+    long watchTimeout = conf.getTimeDuration(
+        OzoneConfigKeys.DFS_RATIS_CLIENT_REQUEST_WATCH_TIMEOUT, OzoneConfigKeys.
+            DFS_RATIS_CLIENT_REQUEST_WATCH_TIMEOUT_DEFAULT
+            .toIntExact(TimeUnit.MILLISECONDS), TimeUnit.MILLISECONDS);
+
+    return RequestTypeDependentRetryPolicy.newBuilder()
+        .setRetryPolicy(RaftProtos.RaftClientRequestProto.TypeCase.WRITE,
+            createExceptionDependentPolicy(exponentialBackoffRetry,
+                multipleLinearRandomRetry, exponentialBackoffRetry))
+        .setRetryPolicy(RaftProtos.RaftClientRequestProto.TypeCase.WATCH,
+            createExceptionDependentPolicy(exponentialBackoffRetry,
+                multipleLinearRandomRetry, RetryPolicies.noRetry()))
+        .setTimeout(RaftProtos.RaftClientRequestProto.TypeCase.WRITE,
+            TimeDuration.valueOf(writeTimeout, TimeUnit.MILLISECONDS))
+        .setTimeout(RaftProtos.RaftClientRequestProto.TypeCase.WATCH,
+            TimeDuration.valueOf(watchTimeout, TimeUnit.MILLISECONDS))
+        .build();
+  }
+
+  private static ExponentialBackoffRetry createExponentialBackoffPolicy(
+      ConfigurationSource conf) {
+    long exponentialBaseSleep = conf.getTimeDuration(
+        OzoneConfigKeys.DFS_RATIS_CLIENT_EXPONENTIAL_BACKOFF_BASE_SLEEP,
+        OzoneConfigKeys.DFS_RATIS_CLIENT_EXPONENTIAL_BACKOFF_BASE_SLEEP_DEFAULT
+            .toIntExact(TimeUnit.MILLISECONDS), TimeUnit.MILLISECONDS);
+    long exponentialMaxSleep = conf.getTimeDuration(
+        OzoneConfigKeys.DFS_RATIS_CLIENT_EXPONENTIAL_BACKOFF_MAX_SLEEP,
+        OzoneConfigKeys.
+            DFS_RATIS_CLIENT_EXPONENTIAL_BACKOFF_MAX_SLEEP_DEFAULT
+            .toIntExact(TimeUnit.MILLISECONDS), TimeUnit.MILLISECONDS);
+    return ExponentialBackoffRetry.newBuilder()
+        .setBaseSleepTime(
+            TimeDuration.valueOf(exponentialBaseSleep, TimeUnit.MILLISECONDS))
+        .setMaxSleepTime(
+            TimeDuration.valueOf(exponentialMaxSleep, TimeUnit.MILLISECONDS))
+        .build();
+  }
+
+  private static ExceptionDependentRetry createExceptionDependentPolicy(
+      ExponentialBackoffRetry exponentialBackoffRetry,
+      MultipleLinearRandomRetry multipleLinearRandomRetry,

Review comment:
       Instead of explicitly adding no retry for each specific exception, can we define a static list of exceptions for which there will be no retry and iterate over it here?
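
One way to read this suggestion is sketched below: keep the no-retry exceptions in a single static list and loop over it when building the exception-to-policy mapping. This is only an illustration under stated assumptions. The class `NoRetryListSketch`, the `Policy` enum, and the exception classes in the list are hypothetical placeholders chosen so the example compiles without Ratis on the classpath; a plain map stands in for `ExceptionDependentRetry`.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.TimeoutException;

public class NoRetryListSketch {

  /** Stand-in for a retry policy; not the Ratis RetryPolicy interface. */
  public enum Policy { NO_RETRY, EXPONENTIAL_BACKOFF, MULTILINEAR_RANDOM }

  /** Hypothetical no-retry list; the classes here are placeholders only. */
  public static final List<Class<? extends Throwable>> NO_RETRY_EXCEPTIONS =
      Collections.unmodifiableList(
          Arrays.<Class<? extends Throwable>>asList(
              IllegalStateException.class,
              UnsupportedOperationException.class));

  /** Builds the mapping by iterating the static list once. */
  public static Map<Class<? extends Throwable>, Policy> buildExceptionToPolicy() {
    Map<Class<? extends Throwable>, Policy> map = new LinkedHashMap<>();
    for (Class<? extends Throwable> clazz : NO_RETRY_EXCEPTIONS) {
      map.put(clazz, Policy.NO_RETRY);
    }
    // Exceptions that do get retried are still registered one by one.
    map.put(TimeoutException.class, Policy.EXPONENTIAL_BACKOFF);
    return map;
  }

  /** Looks up by exact exception class; subclasses fall to the default. */
  public static Policy policyFor(Throwable t,
      Map<Class<? extends Throwable>, Policy> map, Policy defaultPolicy) {
    return map.getOrDefault(t.getClass(), defaultPolicy);
  }

  public static void main(String[] args) {
    Map<Class<? extends Throwable>, Policy> map = buildExceptionToPolicy();
    System.out.println(
        policyFor(new IllegalStateException("closed"), map,
            Policy.MULTILINEAR_RANDOM));
    System.out.println(
        policyFor(new IOException("other"), map, Policy.MULTILINEAR_RANDOM));
  }
}
```

With this shape, adding or removing a no-retry exception is a one-line change to the list, and the list itself doubles as documentation of which failures are treated as final.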
   

##########
File path: hadoop-hdds/common/src/main/java/org/apache/hadoop/hdds/ratis/RatisHelper.java
##########
@@ -269,23 +282,76 @@ static GrpcTlsConfig createTlsClientConfig(SecurityConfig conf,
     return tlsConfig;
   }
 
-  static RetryPolicy createRetryPolicy(ConfigurationSource conf) {
-    int maxRetryCount =
-        conf.getInt(OzoneConfigKeys.DFS_RATIS_CLIENT_REQUEST_MAX_RETRIES_KEY,
+  public static RetryPolicy createRetryPolicy(ConfigurationSource conf) {
+    ExponentialBackoffRetry exponentialBackoffRetry =
+        createExponentialBackoffPolicy(conf);
+    MultipleLinearRandomRetry multipleLinearRandomRetry =
+        MultipleLinearRandomRetry.parseCommaSeparated(conf.get(
+            OzoneConfigKeys.DFS_RATIS_CLIENT_MULTILINEAR_RANDOM_RETRY_POLICY,
             OzoneConfigKeys.
-                DFS_RATIS_CLIENT_REQUEST_MAX_RETRIES_DEFAULT);
-    long retryInterval = conf.getTimeDuration(OzoneConfigKeys.
-        DFS_RATIS_CLIENT_REQUEST_RETRY_INTERVAL_KEY, OzoneConfigKeys.
-        DFS_RATIS_CLIENT_REQUEST_RETRY_INTERVAL_DEFAULT
-        .toIntExact(TimeUnit.MILLISECONDS), TimeUnit.MILLISECONDS);
-    TimeDuration sleepDuration =
-        TimeDuration.valueOf(retryInterval, TimeUnit.MILLISECONDS);
-    RetryPolicy retryPolicy = RetryPolicies
-        .retryUpToMaximumCountWithFixedSleep(maxRetryCount, sleepDuration);
-    return retryPolicy;
+                DFS_RATIS_CLIENT_MULTILINEAR_RANDOM_RETRY_POLICY_DEFAULT));
+
+    long writeTimeout = conf.getTimeDuration(
+        OzoneConfigKeys.DFS_RATIS_CLIENT_REQUEST_WRITE_TIMEOUT, OzoneConfigKeys.
+            DFS_RATIS_CLIENT_REQUEST_WRITE_TIMEOUT_DEFAULT
+            .toIntExact(TimeUnit.MILLISECONDS), TimeUnit.MILLISECONDS);
+    long watchTimeout = conf.getTimeDuration(
+        OzoneConfigKeys.DFS_RATIS_CLIENT_REQUEST_WATCH_TIMEOUT, OzoneConfigKeys.
+            DFS_RATIS_CLIENT_REQUEST_WATCH_TIMEOUT_DEFAULT
+            .toIntExact(TimeUnit.MILLISECONDS), TimeUnit.MILLISECONDS);
+
+    return RequestTypeDependentRetryPolicy.newBuilder()
+        .setRetryPolicy(RaftProtos.RaftClientRequestProto.TypeCase.WRITE,
+            createExceptionDependentPolicy(exponentialBackoffRetry,
+                multipleLinearRandomRetry, exponentialBackoffRetry))
+        .setRetryPolicy(RaftProtos.RaftClientRequestProto.TypeCase.WATCH,
+            createExceptionDependentPolicy(exponentialBackoffRetry,
+                multipleLinearRandomRetry, RetryPolicies.noRetry()))
+        .setTimeout(RaftProtos.RaftClientRequestProto.TypeCase.WRITE,
+            TimeDuration.valueOf(writeTimeout, TimeUnit.MILLISECONDS))
+        .setTimeout(RaftProtos.RaftClientRequestProto.TypeCase.WATCH,
+            TimeDuration.valueOf(watchTimeout, TimeUnit.MILLISECONDS))
+        .build();
+  }
+
+  private static ExponentialBackoffRetry createExponentialBackoffPolicy(
+      ConfigurationSource conf) {
+    long exponentialBaseSleep = conf.getTimeDuration(
+        OzoneConfigKeys.DFS_RATIS_CLIENT_EXPONENTIAL_BACKOFF_BASE_SLEEP,
+        OzoneConfigKeys.DFS_RATIS_CLIENT_EXPONENTIAL_BACKOFF_BASE_SLEEP_DEFAULT
+            .toIntExact(TimeUnit.MILLISECONDS), TimeUnit.MILLISECONDS);
+    long exponentialMaxSleep = conf.getTimeDuration(
+        OzoneConfigKeys.DFS_RATIS_CLIENT_EXPONENTIAL_BACKOFF_MAX_SLEEP,
+        OzoneConfigKeys.
+            DFS_RATIS_CLIENT_EXPONENTIAL_BACKOFF_MAX_SLEEP_DEFAULT
+            .toIntExact(TimeUnit.MILLISECONDS), TimeUnit.MILLISECONDS);
+    return ExponentialBackoffRetry.newBuilder()
+        .setBaseSleepTime(
+            TimeDuration.valueOf(exponentialBaseSleep, TimeUnit.MILLISECONDS))
+        .setMaxSleepTime(
+            TimeDuration.valueOf(exponentialMaxSleep, TimeUnit.MILLISECONDS))
+        .build();
+  }
+
+  private static ExceptionDependentRetry createExceptionDependentPolicy(
+      ExponentialBackoffRetry exponentialBackoffRetry,
+      MultipleLinearRandomRetry multipleLinearRandomRetry,

Review comment:
       I think we need to add AlreadyClosedException to the no-retry list here (as it can be generated by the Ratis server as well). Also, for RaftLogIOException, the policy should be no retry.

##########
File path: hadoop-hdds/common/src/main/java/org/apache/hadoop/hdds/ratis/RatisHelper.java
##########
@@ -269,23 +282,76 @@ static GrpcTlsConfig createTlsClientConfig(SecurityConfig conf,
     return tlsConfig;
   }
 

Review comment:
       Can we add a table defining the exception to retry policy relationship for better understanding?

##########
File path: hadoop-hdds/common/src/main/java/org/apache/hadoop/ozone/OzoneConfigKeys.java
##########
@@ -281,15 +281,33 @@
 
   public static final String DFS_CONTAINER_RATIS_DATANODE_STORAGE_DIR =
       "dfs.container.ratis.datanode.storage.dir";
-  public static final String DFS_RATIS_CLIENT_REQUEST_MAX_RETRIES_KEY =
-      ScmConfigKeys.DFS_RATIS_CLIENT_REQUEST_MAX_RETRIES_KEY;
-  public static final int DFS_RATIS_CLIENT_REQUEST_MAX_RETRIES_DEFAULT =
-      ScmConfigKeys.DFS_RATIS_CLIENT_REQUEST_MAX_RETRIES_DEFAULT;
-  public static final String DFS_RATIS_CLIENT_REQUEST_RETRY_INTERVAL_KEY =
-      ScmConfigKeys.DFS_RATIS_CLIENT_REQUEST_RETRY_INTERVAL_KEY;
+

Review comment:
       I guess it would be better to define these configs in RatisClientConfig instead of defining them here.







[GitHub] [hadoop-ozone] bshashikant merged pull request #1005: HDDS-3350. Ozone Retry Policy Improvements.

Posted by GitBox <gi...@apache.org>.
bshashikant merged pull request #1005:
URL: https://github.com/apache/hadoop-ozone/pull/1005


   




[GitHub] [hadoop-ozone] codecov-commenter commented on pull request #1005: HDDS-3350. Ozone Retry Policy Improvements.

Posted by GitBox <gi...@apache.org>.
codecov-commenter commented on pull request #1005:
URL: https://github.com/apache/hadoop-ozone/pull/1005#issuecomment-643219318


   # [Codecov](https://codecov.io/gh/apache/hadoop-ozone/pull/1005?src=pr&el=h1) Report
   > :exclamation: No coverage uploaded for pull request base (`master@10b6470`). [Click here to learn what that means](https://docs.codecov.io/docs/error-reference#section-missing-base-commit).
   > The diff coverage is `n/a`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/hadoop-ozone/pull/1005/graphs/tree.svg?width=650&height=150&src=pr&token=5YeeptJMby)](https://codecov.io/gh/apache/hadoop-ozone/pull/1005?src=pr&el=tree)
   
   ```diff
   @@            Coverage Diff            @@
   ##             master    #1005   +/-   ##
   =========================================
     Coverage          ?   69.40%           
     Complexity        ?     9120           
   =========================================
     Files             ?      961           
     Lines             ?    48172           
     Branches          ?     4678           
   =========================================
     Hits              ?    33433           
     Misses            ?    12523           
     Partials          ?     2216           
   ```
   
   
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/hadoop-ozone/pull/1005?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/hadoop-ozone/pull/1005?src=pr&el=footer). Last update [10b6470...f615b48](https://codecov.io/gh/apache/hadoop-ozone/pull/1005?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   




[GitHub] [hadoop-ozone] lokeshj1703 commented on pull request #1005: HDDS-3350. Ozone Retry Policy Improvements.

Posted by GitBox <gi...@apache.org>.
lokeshj1703 commented on pull request #1005:
URL: https://github.com/apache/hadoop-ozone/pull/1005#issuecomment-639434963


   /pending




[GitHub] [hadoop-ozone] bshashikant commented on pull request #1005: HDDS-3350. Ozone Retry Policy Improvements.

Posted by GitBox <gi...@apache.org>.
bshashikant commented on pull request #1005:
URL: https://github.com/apache/hadoop-ozone/pull/1005#issuecomment-643155376


   Unit test failures are related. @lokeshj1703 can you please check?




[GitHub] [hadoop-ozone] lokeshj1703 commented on a change in pull request #1005: HDDS-3350. Ozone Retry Policy Improvements.

Posted by GitBox <gi...@apache.org>.
lokeshj1703 commented on a change in pull request #1005:
URL: https://github.com/apache/hadoop-ozone/pull/1005#discussion_r438880701



##########
File path: hadoop-hdds/common/src/main/java/org/apache/hadoop/hdds/ratis/RatisHelper.java
##########
@@ -269,23 +282,76 @@ static GrpcTlsConfig createTlsClientConfig(SecurityConfig conf,
     return tlsConfig;
   }
 
-  static RetryPolicy createRetryPolicy(ConfigurationSource conf) {
-    int maxRetryCount =
-        conf.getInt(OzoneConfigKeys.DFS_RATIS_CLIENT_REQUEST_MAX_RETRIES_KEY,
+  public static RetryPolicy createRetryPolicy(ConfigurationSource conf) {
+    ExponentialBackoffRetry exponentialBackoffRetry =
+        createExponentialBackoffPolicy(conf);
+    MultipleLinearRandomRetry multipleLinearRandomRetry =
+        MultipleLinearRandomRetry.parseCommaSeparated(conf.get(
+            OzoneConfigKeys.DFS_RATIS_CLIENT_MULTILINEAR_RANDOM_RETRY_POLICY,
             OzoneConfigKeys.
-                DFS_RATIS_CLIENT_REQUEST_MAX_RETRIES_DEFAULT);
-    long retryInterval = conf.getTimeDuration(OzoneConfigKeys.
-        DFS_RATIS_CLIENT_REQUEST_RETRY_INTERVAL_KEY, OzoneConfigKeys.
-        DFS_RATIS_CLIENT_REQUEST_RETRY_INTERVAL_DEFAULT
-        .toIntExact(TimeUnit.MILLISECONDS), TimeUnit.MILLISECONDS);
-    TimeDuration sleepDuration =
-        TimeDuration.valueOf(retryInterval, TimeUnit.MILLISECONDS);
-    RetryPolicy retryPolicy = RetryPolicies
-        .retryUpToMaximumCountWithFixedSleep(maxRetryCount, sleepDuration);
-    return retryPolicy;
+                DFS_RATIS_CLIENT_MULTILINEAR_RANDOM_RETRY_POLICY_DEFAULT));
+
+    long writeTimeout = conf.getTimeDuration(
+        OzoneConfigKeys.DFS_RATIS_CLIENT_REQUEST_WRITE_TIMEOUT, OzoneConfigKeys.
+            DFS_RATIS_CLIENT_REQUEST_WRITE_TIMEOUT_DEFAULT
+            .toIntExact(TimeUnit.MILLISECONDS), TimeUnit.MILLISECONDS);
+    long watchTimeout = conf.getTimeDuration(
+        OzoneConfigKeys.DFS_RATIS_CLIENT_REQUEST_WATCH_TIMEOUT, OzoneConfigKeys.
+            DFS_RATIS_CLIENT_REQUEST_WATCH_TIMEOUT_DEFAULT
+            .toIntExact(TimeUnit.MILLISECONDS), TimeUnit.MILLISECONDS);
+
+    return RequestTypeDependentRetryPolicy.newBuilder()
+        .setRetryPolicy(RaftProtos.RaftClientRequestProto.TypeCase.WRITE,
+            createExceptionDependentPolicy(exponentialBackoffRetry,
+                multipleLinearRandomRetry, exponentialBackoffRetry))
+        .setRetryPolicy(RaftProtos.RaftClientRequestProto.TypeCase.WATCH,
+            createExceptionDependentPolicy(exponentialBackoffRetry,
+                multipleLinearRandomRetry, RetryPolicies.noRetry()))
+        .setTimeout(RaftProtos.RaftClientRequestProto.TypeCase.WRITE,
+            TimeDuration.valueOf(writeTimeout, TimeUnit.MILLISECONDS))
+        .setTimeout(RaftProtos.RaftClientRequestProto.TypeCase.WATCH,
+            TimeDuration.valueOf(watchTimeout, TimeUnit.MILLISECONDS))
+        .build();
+  }
+
+  private static ExponentialBackoffRetry createExponentialBackoffPolicy(
+      ConfigurationSource conf) {
+    long exponentialBaseSleep = conf.getTimeDuration(
+        OzoneConfigKeys.DFS_RATIS_CLIENT_EXPONENTIAL_BACKOFF_BASE_SLEEP,
+        OzoneConfigKeys.DFS_RATIS_CLIENT_EXPONENTIAL_BACKOFF_BASE_SLEEP_DEFAULT
+            .toIntExact(TimeUnit.MILLISECONDS), TimeUnit.MILLISECONDS);
+    long exponentialMaxSleep = conf.getTimeDuration(
+        OzoneConfigKeys.DFS_RATIS_CLIENT_EXPONENTIAL_BACKOFF_MAX_SLEEP,
+        OzoneConfigKeys.
+            DFS_RATIS_CLIENT_EXPONENTIAL_BACKOFF_MAX_SLEEP_DEFAULT
+            .toIntExact(TimeUnit.MILLISECONDS), TimeUnit.MILLISECONDS);
+    return ExponentialBackoffRetry.newBuilder()
+        .setBaseSleepTime(
+            TimeDuration.valueOf(exponentialBaseSleep, TimeUnit.MILLISECONDS))
+        .setMaxSleepTime(
+            TimeDuration.valueOf(exponentialMaxSleep, TimeUnit.MILLISECONDS))
+        .build();
+  }
+
+  private static ExceptionDependentRetry createExceptionDependentPolicy(
+      ExponentialBackoffRetry exponentialBackoffRetry,
+      MultipleLinearRandomRetry multipleLinearRandomRetry,

Review comment:
       Adding NoRetry for AlreadyClosedException was leading to test failures. This exception is generated for closed connections in Ratis. I have removed the NoRetry policy for this exception.






[GitHub] [hadoop-ozone] lokeshj1703 commented on a change in pull request #1005: HDDS-3350. Ozone Retry Policy Improvements.

Posted by GitBox <gi...@apache.org>.
lokeshj1703 commented on a change in pull request #1005:
URL: https://github.com/apache/hadoop-ozone/pull/1005#discussion_r438144376



##########
File path: hadoop-hdds/common/src/main/java/org/apache/hadoop/ozone/OzoneConfigKeys.java
##########
@@ -281,15 +281,33 @@
 
   public static final String DFS_CONTAINER_RATIS_DATANODE_STORAGE_DIR =
       "dfs.container.ratis.datanode.storage.dir";
-  public static final String DFS_RATIS_CLIENT_REQUEST_MAX_RETRIES_KEY =
-      ScmConfigKeys.DFS_RATIS_CLIENT_REQUEST_MAX_RETRIES_KEY;
-  public static final int DFS_RATIS_CLIENT_REQUEST_MAX_RETRIES_DEFAULT =
-      ScmConfigKeys.DFS_RATIS_CLIENT_REQUEST_MAX_RETRIES_DEFAULT;
-  public static final String DFS_RATIS_CLIENT_REQUEST_RETRY_INTERVAL_KEY =
-      ScmConfigKeys.DFS_RATIS_CLIENT_REQUEST_RETRY_INTERVAL_KEY;
+

Review comment:
       Done.






[GitHub] [hadoop-ozone] bshashikant commented on pull request #1005: HDDS-3350. Ozone Retry Policy Improvements.

Posted by GitBox <gi...@apache.org>.
bshashikant commented on pull request #1005:
URL: https://github.com/apache/hadoop-ozone/pull/1005#issuecomment-644549079


   Thanks @lokeshj1703 for working on this. 




[GitHub] [hadoop-ozone] adoroszlai commented on a change in pull request #1005: HDDS-3350. Ozone Retry Policy Improvements.

Posted by GitBox <gi...@apache.org>.
adoroszlai commented on a change in pull request #1005:
URL: https://github.com/apache/hadoop-ozone/pull/1005#discussion_r435297078



##########
File path: pom.xml
##########
@@ -79,7 +79,7 @@ xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xs
     <declared.ozone.version>${ozone.version}</declared.ozone.version>
 
     <!-- Apache Ratis version -->
-    <ratis.version>0.6.0-cac3336-SNAPSHOT</ratis.version>
+    <ratis.version>0.6.0-6ab75ae-SNAPSHOT</ratis.version>

Review comment:
       Duplicate of HDDS-3654 maybe.






[GitHub] [hadoop-ozone] lokeshj1703 commented on a change in pull request #1005: HDDS-3350. Ozone Retry Policy Improvements.

Posted by GitBox <gi...@apache.org>.
lokeshj1703 commented on a change in pull request #1005:
URL: https://github.com/apache/hadoop-ozone/pull/1005#discussion_r435871997



##########
File path: pom.xml
##########
@@ -79,7 +79,7 @@ xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xs
     <declared.ozone.version>${ozone.version}</declared.ozone.version>
 
     <!-- Apache Ratis version -->
-    <ratis.version>0.6.0-cac3336-SNAPSHOT</ratis.version>
+    <ratis.version>0.6.0-6ab75ae-SNAPSHOT</ratis.version>

Review comment:
       Marking it as pending until HDDS-3654 is committed.






[GitHub] [hadoop-ozone] lokeshj1703 edited a comment on pull request #1005: HDDS-3350. Ozone Retry Policy Improvements.

Posted by GitBox <gi...@apache.org>.
lokeshj1703 edited a comment on pull request #1005:
URL: https://github.com/apache/hadoop-ozone/pull/1005#issuecomment-640569203


   The OM HA test failures seem related to the Ratis snapshot upgrade and the metrics registry. They will be fixed with HDDS-3654.
   There are also test timeouts in TestOzoneRpcClient, which happen because watch requests are committed by majority. The test shows an appendEntry timeout from leader to follower. This does not seem to be fixed with the latest Ratis snapshot.




[GitHub] [hadoop-ozone] arp7 commented on a change in pull request #1005: HDDS-3350. Ozone Retry Policy Improvements.

Posted by GitBox <gi...@apache.org>.
arp7 commented on a change in pull request #1005:
URL: https://github.com/apache/hadoop-ozone/pull/1005#discussion_r435293900



##########
File path: pom.xml
##########
@@ -79,7 +79,7 @@ xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xs
     <declared.ozone.version>${ozone.version}</declared.ozone.version>
 
     <!-- Apache Ratis version -->
-    <ratis.version>0.6.0-cac3336-SNAPSHOT</ratis.version>
+    <ratis.version>0.6.0-6ab75ae-SNAPSHOT</ratis.version>

Review comment:
       Is this a duplicate of HDDS-3350? Let's decouple the Ratis version update from the rest of the changes.






[GitHub] [hadoop-ozone] codecov-commenter edited a comment on pull request #1005: HDDS-3350. Ozone Retry Policy Improvements.

Posted by GitBox <gi...@apache.org>.
codecov-commenter edited a comment on pull request #1005:
URL: https://github.com/apache/hadoop-ozone/pull/1005#issuecomment-643219318


   # [Codecov](https://codecov.io/gh/apache/hadoop-ozone/pull/1005?src=pr&el=h1) Report
   > :exclamation: No coverage uploaded for pull request base (`master@10b6470`). [Click here to learn what that means](https://docs.codecov.io/docs/error-reference#section-missing-base-commit).
   > The diff coverage is `n/a`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/hadoop-ozone/pull/1005/graphs/tree.svg?width=650&height=150&src=pr&token=5YeeptJMby)](https://codecov.io/gh/apache/hadoop-ozone/pull/1005?src=pr&el=tree)
   
   ```diff
   @@            Coverage Diff            @@
   ##             master    #1005   +/-   ##
   =========================================
     Coverage          ?   69.42%           
     Complexity        ?     9124           
   =========================================
     Files             ?      961           
     Lines             ?    48173           
     Branches          ?     4679           
   =========================================
     Hits              ?    33446           
     Misses            ?    12513           
     Partials          ?     2214           
   ```
   
   
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/hadoop-ozone/pull/1005?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/hadoop-ozone/pull/1005?src=pr&el=footer). Last update [10b6470...f615b48](https://codecov.io/gh/apache/hadoop-ozone/pull/1005?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   




[GitHub] [hadoop-ozone] lokeshj1703 commented on pull request #1005: HDDS-3350. Ozone Retry Policy Improvements.

Posted by GitBox <gi...@apache.org>.
lokeshj1703 commented on pull request #1005:
URL: https://github.com/apache/hadoop-ozone/pull/1005#issuecomment-643946644


   I have created https://issues.apache.org/jira/browse/HDDS-3799 for tracking the failure related to Test2WayCommitInRatis.




[GitHub] [hadoop-ozone] lokeshj1703 commented on a change in pull request #1005: HDDS-3350. Ozone Retry Policy Improvements.

Posted by GitBox <gi...@apache.org>.
lokeshj1703 commented on a change in pull request #1005:
URL: https://github.com/apache/hadoop-ozone/pull/1005#discussion_r438143919



##########
File path: hadoop-hdds/common/src/main/java/org/apache/hadoop/hdds/ratis/RatisHelper.java
##########
@@ -269,23 +282,76 @@ static GrpcTlsConfig createTlsClientConfig(SecurityConfig conf,
     return tlsConfig;
   }
 
-  static RetryPolicy createRetryPolicy(ConfigurationSource conf) {
-    int maxRetryCount =
-        conf.getInt(OzoneConfigKeys.DFS_RATIS_CLIENT_REQUEST_MAX_RETRIES_KEY,
+  public static RetryPolicy createRetryPolicy(ConfigurationSource conf) {
+    ExponentialBackoffRetry exponentialBackoffRetry =
+        createExponentialBackoffPolicy(conf);
+    MultipleLinearRandomRetry multipleLinearRandomRetry =
+        MultipleLinearRandomRetry.parseCommaSeparated(conf.get(
+            OzoneConfigKeys.DFS_RATIS_CLIENT_MULTILINEAR_RANDOM_RETRY_POLICY,
             OzoneConfigKeys.
-                DFS_RATIS_CLIENT_REQUEST_MAX_RETRIES_DEFAULT);
-    long retryInterval = conf.getTimeDuration(OzoneConfigKeys.
-        DFS_RATIS_CLIENT_REQUEST_RETRY_INTERVAL_KEY, OzoneConfigKeys.
-        DFS_RATIS_CLIENT_REQUEST_RETRY_INTERVAL_DEFAULT
-        .toIntExact(TimeUnit.MILLISECONDS), TimeUnit.MILLISECONDS);
-    TimeDuration sleepDuration =
-        TimeDuration.valueOf(retryInterval, TimeUnit.MILLISECONDS);
-    RetryPolicy retryPolicy = RetryPolicies
-        .retryUpToMaximumCountWithFixedSleep(maxRetryCount, sleepDuration);
-    return retryPolicy;
+                DFS_RATIS_CLIENT_MULTILINEAR_RANDOM_RETRY_POLICY_DEFAULT));
+
+    long writeTimeout = conf.getTimeDuration(
+        OzoneConfigKeys.DFS_RATIS_CLIENT_REQUEST_WRITE_TIMEOUT, OzoneConfigKeys.
+            DFS_RATIS_CLIENT_REQUEST_WRITE_TIMEOUT_DEFAULT
+            .toIntExact(TimeUnit.MILLISECONDS), TimeUnit.MILLISECONDS);
+    long watchTimeout = conf.getTimeDuration(
+        OzoneConfigKeys.DFS_RATIS_CLIENT_REQUEST_WATCH_TIMEOUT, OzoneConfigKeys.
+            DFS_RATIS_CLIENT_REQUEST_WATCH_TIMEOUT_DEFAULT
+            .toIntExact(TimeUnit.MILLISECONDS), TimeUnit.MILLISECONDS);
+
+    return RequestTypeDependentRetryPolicy.newBuilder()
+        .setRetryPolicy(RaftProtos.RaftClientRequestProto.TypeCase.WRITE,
+            createExceptionDependentPolicy(exponentialBackoffRetry,
+                multipleLinearRandomRetry, exponentialBackoffRetry))
+        .setRetryPolicy(RaftProtos.RaftClientRequestProto.TypeCase.WATCH,
+            createExceptionDependentPolicy(exponentialBackoffRetry,
+                multipleLinearRandomRetry, RetryPolicies.noRetry()))
+        .setTimeout(RaftProtos.RaftClientRequestProto.TypeCase.WRITE,
+            TimeDuration.valueOf(writeTimeout, TimeUnit.MILLISECONDS))
+        .setTimeout(RaftProtos.RaftClientRequestProto.TypeCase.WATCH,
+            TimeDuration.valueOf(watchTimeout, TimeUnit.MILLISECONDS))
+        .build();
+  }
+
+  private static ExponentialBackoffRetry createExponentialBackoffPolicy(
+      ConfigurationSource conf) {
+    long exponentialBaseSleep = conf.getTimeDuration(
+        OzoneConfigKeys.DFS_RATIS_CLIENT_EXPONENTIAL_BACKOFF_BASE_SLEEP,
+        OzoneConfigKeys.DFS_RATIS_CLIENT_EXPONENTIAL_BACKOFF_BASE_SLEEP_DEFAULT
+            .toIntExact(TimeUnit.MILLISECONDS), TimeUnit.MILLISECONDS);
+    long exponentialMaxSleep = conf.getTimeDuration(
+        OzoneConfigKeys.DFS_RATIS_CLIENT_EXPONENTIAL_BACKOFF_MAX_SLEEP,
+        OzoneConfigKeys.
+            DFS_RATIS_CLIENT_EXPONENTIAL_BACKOFF_MAX_SLEEP_DEFAULT
+            .toIntExact(TimeUnit.MILLISECONDS), TimeUnit.MILLISECONDS);
+    return ExponentialBackoffRetry.newBuilder()
+        .setBaseSleepTime(
+            TimeDuration.valueOf(exponentialBaseSleep, TimeUnit.MILLISECONDS))
+        .setMaxSleepTime(
+            TimeDuration.valueOf(exponentialMaxSleep, TimeUnit.MILLISECONDS))
+        .build();
+  }
+
+  private static ExceptionDependentRetry createExceptionDependentPolicy(
+      ExponentialBackoffRetry exponentialBackoffRetry,
+      MultipleLinearRandomRetry multipleLinearRandomRetry,

Review comment:
       RaftLogIOException is never received at raft client. I have added AlreadyClosedException.
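
The diff above builds an exponential backoff policy from a base sleep time and a maximum sleep time. As a rough illustration of what those two settings bound, here is a minimal sketch of a capped exponential backoff. It is not the Ratis ExponentialBackoffRetry code: the class `BackoffSketch`, its method names, and the jitter range are assumptions made for this example.

```java
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical illustration only; not the Ratis implementation.
final class BackoffSketch {
  private BackoffSketch() { }

  // Upper bound of the sleep for a given attempt (0-based):
  // base * 2^attempt, capped at maxMillis.
  static long sleepCeilingMillis(long baseMillis, long maxMillis,
      int attempt) {
    // Limit the shift so that typical millisecond values cannot overflow.
    int shift = Math.min(Math.max(attempt, 0), 30);
    long uncapped = baseMillis * (1L << shift);
    return Math.min(uncapped, maxMillis);
  }

  // Picks a random sleep in [ceiling / 2, ceiling] so that many clients
  // retrying at once do not all hit the leader at the same instant.
  // The jitter range is an assumption of this sketch.
  static long jitteredSleepMillis(long baseMillis, long maxMillis,
      int attempt) {
    long ceiling = sleepCeilingMillis(baseMillis, maxMillis, attempt);
    return ThreadLocalRandom.current().nextLong(ceiling / 2, ceiling + 1);
  }
}
```

With a base of 100 ms and a maximum of 5000 ms, the ceiling in this sketch doubles on each attempt (100, 200, 400, ...) until it reaches the 5000 ms cap, so a request that keeps failing backs off instead of retrying at a fixed short interval.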



