Posted to common-issues@hadoop.apache.org by "saxenapranav (via GitHub)" <gi...@apache.org> on 2023/06/05 03:51:46 UTC

[GitHub] [hadoop] saxenapranav commented on a diff in pull request #5711: Hadoop-18759. [ABFS][Backoff-Optimization] Have a Linear retry policy for connection timeout.

saxenapranav commented on code in PR #5711:
URL: https://github.com/apache/hadoop/pull/5711#discussion_r1217412926


##########
hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/AbfsConfiguration.java:
##########
@@ -1143,6 +1175,11 @@ public void setEnableAbfsListIterator(boolean enableAbfsListIterator) {
     this.enableAbfsListIterator = enableAbfsListIterator;
   }
 
+  @VisibleForTesting
+  public void setLinearRetryDoubleStepUpEnabled(final boolean linearRetryDoubleStepUpEnabled) {

Review Comment:
   Let's avoid the setter.



##########
hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/LinearRetryPolicy.java:
##########
@@ -0,0 +1,157 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.fs.azurebfs.services;
+
+import org.apache.hadoop.fs.azurebfs.AbfsConfiguration;
+import org.apache.hadoop.classification.VisibleForTesting;
+
+/**
+ * Linear Retry policy used by AbfsClient.
+ * */
+public class LinearRetryPolicy extends RetryPolicy{
+  
+  /**
+   * Represents the default maximum amount of time used when calculating the
+   * linear delay between retries.
+   */
+  private static final int DEFAULT_MAX_BACKOFF = 1000 * 30; // 30s
+
+  /**
+   * Represents the default minimum amount of time used when calculating the
+   * linear delay between retries.
+   */
+  private static final int DEFAULT_MIN_BACKOFF = 500 * 1; // 500ms
+
+  /**
+   * Represents the delta by which retry interval should be incremented
+   * for each retry count
+   */
+  private static final int INTERVAL_DELTA_ONE_SEND = 1000; // 1s
+
+  /**
+   * The maximum backoff time.
+   */
+  private final int maxBackoff;
+
+  /**
+   * The minimum backoff time.
+   */
+  private final int minBackoff;
+
+  /**
+   * The maximum number of retry attempts.
+   */
+  private final int retryCount;
+
+  /**
+   * Whether we want to double up the retry interval
+   * True: Double Up
+   * False: Increase by 1.
+   */
+  private final boolean doubleStepUpEnabled;
+
+  /**
+   * Initializes a new instance of the {@link LinearRetryPolicy} class.
+   */
+  public LinearRetryPolicy(final int maxIoRetries) {
+
+    this(maxIoRetries, DEFAULT_MIN_BACKOFF, DEFAULT_MAX_BACKOFF,
+        true);
+  }
+
+  /**
+   * Initializes a new instance of the {@link LinearRetryPolicy} class.
+   *
+   * @param conf The {@link AbfsConfiguration} from which to retrieve retry configuration.
+   */
+  public LinearRetryPolicy(AbfsConfiguration conf) {
+    this(conf.getMaxIoRetries(),
+        conf.getMinBackoffIntervalMillisecondsForConnectionTimeout(),
+        conf.getMaxBackoffIntervalMillisecondsForConnectionTimeout(),
+        conf.getLinearRetryDoubleStepUpEnabled());
+  }
+
+  /**
+   * Initializes a new instance of the {@link LinearRetryPolicy} class.
+   *
+   * @param retryCount The maximum number of retry attempts.
+   * @param minBackoff The minimum backoff time.
+   * @param maxBackoff The maximum backoff time.
+   * @param doubleStepUpEnabled Type of linear increment, double or increment
+   */
+  public LinearRetryPolicy(final int retryCount, final int minBackoff, final int maxBackoff, final boolean doubleStepUpEnabled) {
+    this.retryCount = retryCount;
+    this.minBackoff = minBackoff;
+    this.maxBackoff = maxBackoff;
+    this.doubleStepUpEnabled = doubleStepUpEnabled;
+  }
+
+  /**
+   * Returns if a request should be retried based on the retry count
+   *
+   * @param retryCount The current retry attempt count.
+   * @param statusCode The status code of last failed request
+   * @return true if the request should be retried; false otherwise.
+   */
+  public boolean shouldRetry(final int retryCount, final int statusCode) {
+    return retryCount < this.retryCount;
+  }
+
+  /**
+   * Returns backoff interval based on the type of linear backoff enabled
+   * if doubleStepUpEnabled, double the minBackoff retryCount times
+   * else, add 1000ms to minBackoff retryCount times
+   *
+   * @param retryCount The current retry attempt count.
+   * @return backoff Interval time
+   */
+  public long getRetryInterval(final int retryCount) {
+    if (retryCount <= 0)
+        return minBackoff;
+
+    final double incrementDelta = doubleStepUpEnabled

Review Comment:
   Let's make it an `int`. I believe all the values in the equation are ints.
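
   A minimal sketch of what integer-only arithmetic could look like here. The rest of the method body is not shown in this review, so the stepping logic below is an assumption based only on the Javadoc above (double the interval, or add 1000 ms, once per retry). The class name `LinearBackoffSketch` is hypothetical, and clamping to `maxBackoff` is also an assumption.

```java
// Hypothetical sketch, not the PR's implementation: the real method body
// is elided in this review, so the arithmetic follows the Javadoc only.
final class LinearBackoffSketch {

  // Named as suggested elsewhere in this review; 1s expressed in milliseconds.
  private static final int INTERVAL_DELTA_ONE_SECOND = 1000;

  private final int minBackoff;
  private final int maxBackoff;
  private final boolean doubleStepUpEnabled;

  LinearBackoffSketch(final int minBackoff, final int maxBackoff,
      final boolean doubleStepUpEnabled) {
    this.minBackoff = minBackoff;
    this.maxBackoff = maxBackoff;
    this.doubleStepUpEnabled = doubleStepUpEnabled;
  }

  /**
   * Returns the backoff interval in milliseconds using only integer math.
   * The result is clamped to maxBackoff (an assumption of this sketch).
   */
  long getRetryInterval(final int retryCount) {
    if (retryCount <= 0) {
      return minBackoff;
    }
    // A long accumulator avoids int overflow when doubling repeatedly.
    long interval = minBackoff;
    // Stop stepping once the cap is reached; further steps cannot matter.
    for (int i = 0; i < retryCount && interval < maxBackoff; i++) {
      interval = doubleStepUpEnabled
          ? interval * 2
          : interval + INTERVAL_DELTA_ONE_SECOND;
    }
    return Math.min(interval, maxBackoff);
  }
}
```

   No `double` is needed anywhere: every operand is an `int` or `long`, and the loop exits early once the cap is hit.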



##########
hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/AbfsRestOperation.java:
##########
@@ -347,7 +347,7 @@ private boolean executeHttpOperation(final int retryCount,
 
     LOG.debug("HttpRequest: {}: {}", operationType, httpOperation);
 
-    if (client.getRetryPolicy().shouldRetry(retryCount, httpOperation.getStatusCode())) {
+    if (client.getRetryPolicy(failureReason).shouldRetry(retryCount, httpOperation.getStatusCode())) {

Review Comment:
   Can we avoid calling `client.getRetryPolicy(failureReason)` here? Is there a way for each iteration to know which retry policy object it should use? An instance of the problem:
   1. Iteration 0 fails with a connection timeout. At line 350, failureReason is CT, so shouldRetry is evaluated by LinearRetryPolicy.
   2. But at this point the request has got, let's say, a 503. So at line 352, failureReason becomes, let's say, 503_ING.
   3. The sleep then happens as per ExponentialRetryPolicy.
   4. The problem is that shouldRetry was evaluated as per LinearRetryPolicy, while the sleep happened as per ExponentialRetryPolicy.
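
   One way to avoid the mismatch in the steps above, sketched under assumptions: resolve the policy once per iteration and use that same object for both the retry decision and the sleep interval. All names here (`RetryDecisionSketch`, `Policy`, `Decision`, `decide`) are hypothetical and do not come from the PR, and the value "CT" for the constant is assumed from the example above.

```java
// Hypothetical sketch: pick the policy once, then use it for both decisions.
final class RetryDecisionSketch {

  // Assumed value; the review only says failureReason "would be CT".
  static final String CONNECTION_TIMEOUT_ABBREVIATION = "CT";

  /** Minimal stand-in for the RetryPolicy hierarchy in the PR. */
  interface Policy {
    boolean shouldRetry(int retryCount, int statusCode);
    long getRetryInterval(int retryCount);
  }

  /** Outcome of one iteration; both fields come from the same policy. */
  static final class Decision {
    final boolean retry;
    final long sleepMillis;

    Decision(final boolean retry, final long sleepMillis) {
      this.retry = retry;
      this.sleepMillis = sleepMillis;
    }
  }

  private final Policy linear;
  private final Policy exponential;

  RetryDecisionSketch(final Policy linear, final Policy exponential) {
    this.linear = linear;
    this.exponential = exponential;
  }

  /** Chooses a policy from the failure reason; null falls back to exponential. */
  Policy select(final String failureReason) {
    return CONNECTION_TIMEOUT_ABBREVIATION.equals(failureReason)
        ? linear
        : exponential;
  }

  /** Resolves the policy exactly once so retry and sleep cannot disagree. */
  Decision decide(final String failureReason, final int retryCount,
      final int statusCode) {
    final Policy policy = select(failureReason);
    final boolean retry = policy.shouldRetry(retryCount, statusCode);
    final long sleepMillis = retry ? policy.getRetryInterval(retryCount) : 0L;
    return new Decision(retry, sleepMillis);
  }
}
```

   Because `decide` holds a single `policy` reference, a later change to the failure reason cannot cause shouldRetry and the sleep to come from different policies within the same iteration.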



##########
hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/RetryPolicy.java:
##########
@@ -0,0 +1,46 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.fs.azurebfs.services;
+
+/**
+ * Abstract Class for Retry policy to be used by {@link AbfsClient}
+ * Implementation to be used is based on retry cause.
+ */
+public abstract class RetryPolicy {
+
+  /**
+   * Returns if a request should be retried based on the retry count, current response,
+   * and the current strategy.
+   * Child class should define exact behavior
+   *
+   * @param retryCount The current retry attempt count.
+   * @param statusCode The status code of the response, or -1 for socket error.
+   * @return true if the request should be retried; false otherwise.
+   */
+  public abstract boolean shouldRetry(final int retryCount, final int statusCode);

Review Comment:
   Should we implement this here and remove the implementation from ExponentialRetryPolicy? The reason is that we could use the same heuristic in LinearRetryPolicy. Right now, LinearRetryPolicy only checks `retryCount < this.retryCount`, on the assumption that it is used only for ConnectionTimeout errors. But LinearRetryPolicy may be used for more exceptions in the future.
   What do you say?
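
   A rough sketch of the idea, with the check hoisted into the base class so every subclass inherits it. The status-code heuristic below is an assumption about what ExponentialRetryPolicy does today (its body is not part of this diff), and the class names `RetryPolicySketch` and `FixedIntervalPolicySketch` are hypothetical.

```java
// Hypothetical sketch: shouldRetry lives in the base class, so subclasses
// only have to define how long to wait.
abstract class RetryPolicySketch {

  private final int maxRetryCount;

  protected RetryPolicySketch(final int maxRetryCount) {
    this.maxRetryCount = maxRetryCount;
  }

  /**
   * Shared heuristic (assumed, not taken from this diff): retry while
   * attempts remain and the failure looks transient, i.e. a socket error
   * (-1), a 408, or a 5xx other than 501 and 505.
   */
  public boolean shouldRetry(final int retryCount, final int statusCode) {
    return retryCount < maxRetryCount
        && (statusCode == -1
        || statusCode == java.net.HttpURLConnection.HTTP_CLIENT_TIMEOUT
        || (statusCode >= java.net.HttpURLConnection.HTTP_INTERNAL_ERROR
            && statusCode != java.net.HttpURLConnection.HTTP_NOT_IMPLEMENTED
            && statusCode != java.net.HttpURLConnection.HTTP_VERSION));
  }

  /** Subclasses define only the wait time between attempts. */
  public abstract long getRetryInterval(int retryCount);
}

/** Trivial subclass used only to show that the shared check is inherited. */
final class FixedIntervalPolicySketch extends RetryPolicySketch {

  private final long intervalMillis;

  FixedIntervalPolicySketch(final int maxRetryCount, final long intervalMillis) {
    super(maxRetryCount);
    this.intervalMillis = intervalMillis;
  }

  @Override
  public long getRetryInterval(final int retryCount) {
    return intervalMillis;
  }
}
```

   With this shape, a linear policy that is later reused for failures other than connection timeouts would get the status-code filtering for free.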



##########
hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/AbfsClient.java:
##########
@@ -217,8 +220,19 @@ protected AbfsPerfTracker getAbfsPerfTracker() {
     return abfsPerfTracker;
   }
 
-  ExponentialRetryPolicy getRetryPolicy() {
-    return retryPolicy;
+  ExponentialRetryPolicy getExponentialRetryPolicy() {
+    return exponentialRetryPolicy;
+  }
+
+  LinearRetryPolicy getLinearRetryPolicy() {
+    return linearRetryPolicy;
+  }
+
+  public RetryPolicy getRetryPolicy(final String failureReason) {
+    if (failureReason == null || !failureReason.equals(CONNECTION_TIMEOUT_ABBREVIATION))

Review Comment:
   We can use `!CONNECTION_TIMEOUT_ABBREVIATION.equals(failureReason)`; the null check would not be needed then.
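
   A small illustration of why the constant-first form is null-safe: `String.equals` returns false for a null argument, so calling it on the non-null constant never throws. The class name `NullSafeEqualsSketch` and the value "CT" are assumptions for this example.

```java
// Hypothetical sketch: calling equals on the constant tolerates null input.
final class NullSafeEqualsSketch {

  // Assumed value for illustration only.
  static final String CONNECTION_TIMEOUT_ABBREVIATION = "CT";

  /** True only for the connection-timeout reason; null yields false. */
  static boolean isConnectionTimeout(final String failureReason) {
    return CONNECTION_TIMEOUT_ABBREVIATION.equals(failureReason);
  }
}
```

   The condition under review then reduces to `!isConnectionTimeout(failureReason)`, which is true for null and for every reason other than the connection-timeout abbreviation.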



##########
hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/LinearRetryPolicy.java:
##########
@@ -0,0 +1,157 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.fs.azurebfs.services;
+
+import org.apache.hadoop.fs.azurebfs.AbfsConfiguration;
+import org.apache.hadoop.classification.VisibleForTesting;
+
+/**
+ * Linear Retry policy used by AbfsClient.
+ * */
+public class LinearRetryPolicy extends RetryPolicy{
+  
+  /**
+   * Represents the default maximum amount of time used when calculating the
+   * linear delay between retries.
+   */
+  private static final int DEFAULT_MAX_BACKOFF = 1000 * 30; // 30s
+
+  /**
+   * Represents the default minimum amount of time used when calculating the
+   * linear delay between retries.
+   */
+  private static final int DEFAULT_MIN_BACKOFF = 500 * 1; // 500ms
+
+  /**
+   * Represents the delta by which retry interval should be incremented
+   * for each retry count
+   */
+  private static final int INTERVAL_DELTA_ONE_SEND = 1000; // 1s

Review Comment:
   nit: let's name it `INTERVAL_DELTA_ONE_SECOND`.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

