You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@ozone.apache.org by GitBox <gi...@apache.org> on 2022/04/09 08:22:59 UTC

[GitHub] [ozone] symious commented on a diff in pull request #3272: HDDS-6553. Incorrect timeout when checking balancer iteration result.

symious commented on code in PR #3272:
URL: https://github.com/apache/ozone/pull/3272#discussion_r846599820


##########
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/balancer/ContainerBalancer.java:
##########
@@ -462,49 +462,80 @@ private void checkIterationResults(boolean isMoveGenerated,
   private void checkIterationMoveResults(Set<DatanodeDetails> selectedTargets) {
     this.countDatanodesInvolvedPerIteration = 0;
     this.sizeMovedPerIteration = 0;
-    for (Map.Entry<ContainerMoveSelection,
-            CompletableFuture<ReplicationManager.MoveResult>>
-        futureEntry : moveSelectionToFutureMap.entrySet()) {
-      ContainerMoveSelection moveSelection = futureEntry.getKey();
-      CompletableFuture<ReplicationManager.MoveResult> future =
-          futureEntry.getValue();
+
+    CompletableFuture<Void> allFuturesResult = CompletableFuture.allOf(
+        moveSelectionToFutureMap.values()
+            .toArray(new CompletableFuture[moveSelectionToFutureMap.size()]));
+    try {
+      allFuturesResult.get(config.getMoveTimeout().toMillis(),
+          TimeUnit.MILLISECONDS);
+    } catch (InterruptedException e) {
+      Thread.currentThread().interrupt();
+    } catch (Exception e) {
+      // For exceptions other than InterruptedException, we ignore them.
+      e.printStackTrace();
+    }
+
+    List<ContainerMoveSelection> completeKeyList =
+        moveSelectionToFutureMap.keySet().stream()
+        .filter(key -> moveSelectionToFutureMap.get(key).isDone())
+        .collect(Collectors.toList());
+
+    // Handle completed futures
+    for (ContainerMoveSelection moveSelection : completeKeyList) {
       try {
-        ReplicationManager.MoveResult result = future.get(
-            config.getMoveTimeout().toMillis(), TimeUnit.MILLISECONDS);
-        if (result == ReplicationManager.MoveResult.COMPLETED) {
+        CompletableFuture<ReplicationManager.MoveResult> future =
+            moveSelectionToFutureMap.get(moveSelection);
+        if (future.isCompletedExceptionally()) {
           try {
-            ContainerInfo container =
-                containerManager.getContainer(moveSelection.getContainerID());
-            this.sizeMovedPerIteration += container.getUsedBytes();
-            metrics.incrementNumContainerMovesInLatestIteration(1);
-            LOG.info("Container move completed for container {} to target {}",
-                container.containerID(),
-                moveSelection.getTargetNode().getUuidString());
-          } catch (ContainerNotFoundException e) {
-            LOG.warn("Could not find Container {} while " +
-                    "checking move results in ContainerBalancer",
-                moveSelection.getContainerID(), e);
+            future.join();
+          } catch (CompletionException ex) {
+            LOG.info("Container move for container {} to target {} " +
+                    "failed with exceptions",
+                moveSelection.getContainerID().toString(),
+                moveSelection.getTargetNode().getUuidString(),
+                ex.getCause());
           }
         } else {
-          if (LOG.isDebugEnabled()) {
-            LOG.debug("Container move for container {} to target {} failed: {}",
-                moveSelection.getContainerID(),
-                moveSelection.getTargetNode().getUuidString(), result);
+          ReplicationManager.MoveResult result = future.join();
+          if (result == ReplicationManager.MoveResult.COMPLETED) {
+            try {

Review Comment:
   @lokeshj1703 Thanks for the review.
   There is a problem if added the post process of futures in whenComplete. The moves after timeout can not be cancelled, so the move will keep running until finished.
   As the TestContainerBalancer.checkIterationResultTimeout shows in https://github.com/symious/ozone/tree/HDDS-6553-
   2, the metrics of numContainerMovesInLatestIteration will always be larger than 1, which are incremented by the timeouted futures.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org