Posted to issues@ozone.apache.org by GitBox <gi...@apache.org> on 2021/06/18 21:08:08 UTC

[GitHub] [ozone] JacksonYao287 opened a new pull request #2349: HDDS-4928. Support container move in Replication Manager

JacksonYao287 opened a new pull request #2349:
URL: https://github.com/apache/ozone/pull/2349


   ## What changes were proposed in this pull request?
   
   Add a move API in Replication Manager; it will be used by Container Balancer.
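   
   For context, a hedged sketch of how a caller such as Container Balancer might consume this API (signatures are taken from this PR's diff; `rm`, `cid`, `src`, and `target` are placeholder variables, and the balancer-side wiring is not part of this PR):
   
       // illustrative only: react to the eventual MoveResult
       // (checked exceptions thrown by move() are elided for brevity)
       Optional<CompletableFuture<MoveResult>> mf = rm.move(cid, src, target);
       mf.ifPresent(f -> f.thenAccept(result -> {
         if (result == MoveResult.COMPLETED) {
           // proceed to balance the next container
         } else {
           // retry later or pick another source/target pair
         }
       }));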
   
   ## What is the link to the Apache JIRA
   
   https://issues.apache.org/jira/browse/HDDS-4928
   
   ## How was this patch tested?
   
   Added unit tests.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] JacksonYao287 commented on pull request #2349: HDDS-4928. Support container move in Replication Manager

Posted by GitBox <gi...@apache.org>.
JacksonYao287 commented on pull request #2349:
URL: https://github.com/apache/ozone/pull/2349#issuecomment-881106909


   @lokeshj1703 Thanks for the review
   >I would suggest adding a UT to make sure RM does not delete replica if placement policy is not satisfied.
   
   I will add this UT.




[GitHub] [ozone] JacksonYao287 edited a comment on pull request #2349: HDDS-4928. Support container move in Replication Manager

Posted by GitBox <gi...@apache.org>.
JacksonYao287 edited a comment on pull request #2349:
URL: https://github.com/apache/ozone/pull/2349#issuecomment-874095030


   I have updated the PR, @lokeshj1703, PTAL! If there are no more logic problems, I will add unit tests for move.




[GitHub] [ozone] JacksonYao287 commented on a change in pull request #2349: HDDS-4928. Support container move in Replication Manager

Posted by GitBox <gi...@apache.org>.
JacksonYao287 commented on a change in pull request #2349:
URL: https://github.com/apache/ozone/pull/2349#discussion_r663914240



##########
File path: hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/ReplicationManager.java
##########
@@ -471,6 +549,127 @@ private void updateInflightAction(final ContainerInfo container,
     }
   }
 
+  /**
+   * add a move action for a given container.
+   *
+   * @param cid Container to move
+   * @param srcDn datanode to move from
+   * @param targetDn datanode to move to
+   */
+  public Optional<CompletableFuture<MoveResult>> move(ContainerID cid,
+      DatanodeDetails srcDn, DatanodeDetails targetDn)
+      throws ContainerNotFoundException, NodeNotFoundException {
+    LOG.info("receive a move requset about container {} , from {} to {}",
+        cid, srcDn.getUuid(), targetDn.getUuid());
+    Optional<CompletableFuture<MoveResult>> ret = Optional.empty();
+    if (!isRunning()) {
+      LOG.info("Replication Manager in not running. please start it first");
+      return ret;
+    }
+
+    /*
+     * Make sure the following conditions are met:
+     *  1 the given two datanodes are in a healthy state
+     *  2 the given container exists on the given source datanode
+     *  3 the given container does not exist on the given target datanode
+     *  4 the given container is in the closed state
+     *  5 the given container is not taking any inflight action
+     *  6 the given two datanodes are in the IN_SERVICE state
+     *
+     * Move is a combination of two steps: replication and deletion.
+     * If the conditions above are all met, we take a conservative
+     * strategy: replication can always be executed, but the execution
+     * of deletion always depends on the placement policy.
+     */
+
+    NodeStatus currentNodeStat = nodeManager.getNodeStatus(srcDn);
+    NodeState healthStat = currentNodeStat.getHealth();
+    NodeOperationalState operationalState =
+        currentNodeStat.getOperationalState();
+    if (healthStat != NodeState.HEALTHY) {
+      LOG.info("given source datanode is in {} state, " +
+          "not in HEALTHY state", healthStat);
+      return ret;
+    }
+    if (operationalState != NodeOperationalState.IN_SERVICE) {
+      LOG.info("given source datanode is in {} state, " +
+          "not in IN_SERVICE state", operationalState);
+      return ret;
+    }
+
+    currentNodeStat = nodeManager.getNodeStatus(targetDn);
+    healthStat = currentNodeStat.getHealth();
+    operationalState = currentNodeStat.getOperationalState();
+    if (healthStat != NodeState.HEALTHY) {
+      LOG.info("given target datanode is in {} state, " +
+          "not in HEALTHY state", healthStat);
+      return ret;
+    }
+    if (operationalState != NodeOperationalState.IN_SERVICE) {
+      LOG.info("given target datanode is in {} state, " +
+          "not in IN_SERVICE state", operationalState);
+      return ret;
+    }
+
+    // we need to synchronize on ContainerInfo, since it is
+    // shared by ICR/FCR handler and this.processContainer
+    // TODO: use a Read lock after introducing a RW lock into ContainerInfo
+    ContainerInfo cif = containerManager.getContainer(cid);
+    synchronized (cif) {
+      final Set<DatanodeDetails> replicas = containerManager
+            .getContainerReplicas(cid).stream()
+            .map(ContainerReplica::getDatanodeDetails)
+            .collect(Collectors.toSet());
+      if (replicas.contains(targetDn)) {
+        LOG.info("given container exists in the target Datanode");
+        return ret;
+      }
+      if (!replicas.contains(srcDn)) {
+        LOG.info("given container does not exist in the source Datanode");
+        return ret;
+      }
+
+      /*
+      * The reason why the given container should not be taking any inflight
+      * action is that, if the given container is being replicated or deleted,
+      * the number of its replicas is not deterministic, so a move operation
+      * issued by the balancer may cause a nondeterministic result; thus we
+      * should drop this operation for now.
+      */
+
+      if (inflightReplication.containsKey(cid)) {
+        LOG.info("given container is in inflight replication");
+        return ret;
+      }
+      if (inflightDeletion.containsKey(cid)) {
+        LOG.info("given container is in inflight deletion");
+        return ret;
+      }
+
+      /*
+      * Here, there is no need to check whether cid is in inflightMove,
+      * because these three maps are all synchronized on ContainerInfo. If
+      * cid is in inflightMove, it must currently be being replicated or
+      * deleted, so it must be in inflightReplication or inflightDeletion.
+      * Thus, if we cannot find cid in either of them, this cid must not
+      * be in inflightMove.
+      */
+

Review comment:
       Yeah, it makes sense. I will change this.






[GitHub] [ozone] lokeshj1703 commented on pull request #2349: HDDS-4928. Support container move in Replication Manager

Posted by GitBox <gi...@apache.org>.
lokeshj1703 commented on pull request #2349:
URL: https://github.com/apache/ozone/pull/2349#issuecomment-873843117


   > This idea seems good, but I have a small question. I think even if src and target are both present and the container is over-replicated, deletion cannot always be sent for the source. If the placementPolicy is not met after src is deleted, I think we cannot directly send a delete command to src.
   
   We can use a simple approach initially: if the placement policy is not satisfied with replicas + target - src, we fail the move future.
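   
   A rough sketch of that early check, using helpers visible in this PR's diff (illustrative only: `ret` is assumed to be the CompletableFuture backing the returned Optional, `currentReplicas` is a placeholder, and modelling the target as already holding a replica is glossed over):
   
       // replicas after the proposed move: existing + target - src
       Set<ContainerReplica> afterMove = new HashSet<>(currentReplicas);
       afterMove.removeIf(r -> r.getDatanodeDetails().equals(srcDn));
       // NOTE: an entry for targetDn would also have to be modelled here
       ContainerPlacementStatus cps = getPlacementStatus(afterMove,
           cif.getReplicationConfig().getRequiredNodes());
       if (!cps.isPolicySatisfied()) {
         // fail the move future up front instead of replicating first
         ret.complete(MoveResult.DELETE_FAIL_POLICY);
         return Optional.of(ret);
       }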




[GitHub] [ozone] JacksonYao287 edited a comment on pull request #2349: HDDS-4928. Support container move in Replication Manager

Posted by GitBox <gi...@apache.org>.
JacksonYao287 edited a comment on pull request #2349:
URL: https://github.com/apache/ozone/pull/2349#issuecomment-881107005


   @ChenSammi  @siddhantsangwan PTAL!






[GitHub] [ozone] JacksonYao287 commented on pull request #2349: HDDS-4928. Support container move in Replication Manager

Posted by GitBox <gi...@apache.org>.
JacksonYao287 commented on pull request #2349:
URL: https://github.com/apache/ozone/pull/2349#issuecomment-863887686


   @ChenSammi @GlenGeng @lokeshj1703 @siddhantsangwan  PTAL!
   
   I will add more test cases for move in additional commits of this PR.




[GitHub] [ozone] JacksonYao287 commented on a change in pull request #2349: HDDS-4928. Support container move in Replication Manager

Posted by GitBox <gi...@apache.org>.
JacksonYao287 commented on a change in pull request #2349:
URL: https://github.com/apache/ozone/pull/2349#discussion_r660265464



##########
File path: hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/ReplicationManager.java
##########
@@ -471,6 +549,127 @@ private void updateInflightAction(final ContainerInfo container,
     }
   }
 
+  /**
+   * add a move action for a given container.
+   *
+   * @param cid Container to move
+   * @param srcDn datanode to move from
+   * @param targetDn datanode to move to
+   */
+  public Optional<CompletableFuture<MoveResult>> move(ContainerID cid,

Review comment:
       It seems good. I will add more MoveResult values for the errors.






[GitHub] [ozone] siddhantsangwan commented on a change in pull request #2349: HDDS-4928. Support container move in Replication Manager

Posted by GitBox <gi...@apache.org>.
siddhantsangwan commented on a change in pull request #2349:
URL: https://github.com/apache/ozone/pull/2349#discussion_r664238672



##########
File path: hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/ReplicationManager.java
##########
@@ -862,43 +1153,128 @@ private void handleOverReplicatedContainer(final ContainerInfo container,
           break;
         }
       }
-      // After removing all unhealthy replicas, if the container is still over
-      // replicated then we need to check if it is already mis-replicated.
-      // If it is, we do no harm by removing excess replicas. However, if it is
-      // not mis-replicated, then we can only remove replicas if they don't
-      // make the container become mis-replicated.
-      if (excess > 0) {
-        eligibleReplicas.removeAll(unhealthyReplicas);
-        Set<ContainerReplica> eligibleSet = new HashSet<>(eligibleReplicas);
-        ContainerPlacementStatus ps =
-            getPlacementStatus(eligibleSet, replicationFactor);
-        for (ContainerReplica r : eligibleReplicas) {
-          if (excess <= 0) {
+
+      eligibleReplicas.removeAll(unhealthyReplicas);
+      boolean isInMove = inflightMove.containsKey(id);
+      boolean isSourceDnInReplicaSet = false;
+      boolean isTargetDnInReplicaSet = false;
+
+      if (isInMove) {
+        Pair<DatanodeDetails, DatanodeDetails> movePair =
+            inflightMove.get(id);
+        final DatanodeDetails sourceDN = movePair.getKey();
+        isSourceDnInReplicaSet = eligibleReplicas.stream()
+            .anyMatch(r -> r.getDatanodeDetails().equals(sourceDN));
+        isTargetDnInReplicaSet = eligibleReplicas.stream()
+            .anyMatch(r -> r.getDatanodeDetails()
+                .equals(movePair.getValue()));
+        int sourceDnPos = 0;
+        for (int i = 0; i < eligibleReplicas.size(); i++) {
+          if (eligibleReplicas.get(i).getDatanodeDetails()
+              .equals(sourceDN)) {
+            sourceDnPos = i;
             break;
           }
-          // First remove the replica we are working on from the set, and then
-          // check if the set is now mis-replicated.
-          eligibleSet.remove(r);
-          ContainerPlacementStatus nowPS =
-              getPlacementStatus(eligibleSet, replicationFactor);
-          if ((!ps.isPolicySatisfied()
-                && nowPS.actualPlacementCount() == ps.actualPlacementCount())
-              || (ps.isPolicySatisfied() && nowPS.isPolicySatisfied())) {
-            // Remove the replica if the container was already unsatisfied
-            // and losing this replica keep actual placement count unchanged.
-            // OR if losing this replica still keep satisfied
-            sendDeleteCommand(container, r.getDatanodeDetails(), true);
-            excess -= 1;
-            continue;
+        }
+        if (isTargetDnInReplicaSet) {
+          if (isSourceDnInReplicaSet) {
+            // if the target is present, and source disappears somehow,
+            // we can consider move is successful.
+            inflightMoveFuture.get(id).complete(MoveResult.COMPLETED);
+            inflightMove.remove(id);
+            inflightMoveFuture.remove(id);
+          } else {
+            // if the container is in inflightMove and target datanode is
+            // included in the replicas, then swap the source datanode to
+            // first of the replica list if exists, so the source datanode
+            // will be first removed if possible.
+            eligibleReplicas.add(0, eligibleReplicas.remove(sourceDnPos));
+          }

Review comment:
       Can you please explain this part? If both target and source are present in the replicas, then the source should be swapped to the first position in the replicas list so it can be deleted, right? And the move completes if the source is absent while the target is present. It seems the logic of the if and else parts is swapped.






[GitHub] [ozone] JacksonYao287 commented on a change in pull request #2349: HDDS-4928. Support container move in Replication Manager

Posted by GitBox <gi...@apache.org>.
JacksonYao287 commented on a change in pull request #2349:
URL: https://github.com/apache/ozone/pull/2349#discussion_r660265464



##########
File path: hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/ReplicationManager.java
##########
@@ -471,6 +549,127 @@ private void updateInflightAction(final ContainerInfo container,
     }
   }
 
+  /**
+   * add a move action for a given container.
+   *
+   * @param cid Container to move
+   * @param srcDn datanode to move from
+   * @param targetDn datanode to move to
+   */
+  public Optional<CompletableFuture<MoveResult>> move(ContainerID cid,

Review comment:
       It seems good. I think completing exceptionally is a better choice; I will do this.






[GitHub] [ozone] lokeshj1703 commented on a change in pull request #2349: HDDS-4928. Support container move in Replication Manager

Posted by GitBox <gi...@apache.org>.
lokeshj1703 commented on a change in pull request #2349:
URL: https://github.com/apache/ozone/pull/2349#discussion_r658648587



##########
File path: hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/ReplicationManager.java
##########
@@ -134,6 +139,44 @@
    */
   private final Map<ContainerID, List<InflightAction>> inflightDeletion;
 
+  /**
+   * This is used for tracking container move commands
+   * which are not yet complete.
+   */
+  private final Map<ContainerID,
+      Pair<DatanodeDetails, DatanodeDetails>> inflightMove;
+
+  /**
+   * This is used for indicating the result of a move operation and
+   * the corresponding reason; it is useful for tracking
+   * the outcome of a move.
+   */
+  enum MoveResult {
+    // both replication and deletion completed
+    COMPLETED,
+    // replication failed because of timeout
+    REPLICATION_FAIL_TIME_OUT,
+    // replication failed because the node is unhealthy
+    REPLICATION_FAIL_NODE_UNHEALTHY,
+    // replication succeeded, but deletion failed because of timeout
+    DELETION_FAIL_TIME_OUT,
+    // replication succeeded, but deletion failed because
+    // the node is unhealthy
+    DELETION_FAIL_NODE_UNHEALTHY,
+    // replication succeeded, but if we deleted the container from
+    // the source datanode, the policy (e.g., replica count or
+    // rack placement) would not be satisfied, so we should not delete
+    // the container
+    DELETE_FAIL_POLICY
+  }

Review comment:
       It would also be useful to have metrics for these move results.
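   
   For illustration, per-result counters could be wired up with Hadoop's metrics2 library roughly as below (a sketch under the assumption that a dedicated metrics source is added; none of these class or field names exist in the PR):
   
       import org.apache.hadoop.metrics2.annotation.Metric;
       import org.apache.hadoop.metrics2.annotation.Metrics;
       import org.apache.hadoop.metrics2.lib.DefaultMetricsSystem;
       import org.apache.hadoop.metrics2.lib.MutableCounterLong;
   
       @Metrics(about = "Container move results", context = "SCM")
       final class ContainerMoveMetrics {
         // populated by the metrics system when the source is registered
         @Metric private MutableCounterLong numMoveCompleted;
         @Metric private MutableCounterLong numMoveDeleteFailPolicy;
         @Metric private MutableCounterLong numMoveFailed;
   
         static ContainerMoveMetrics create() {
           return DefaultMetricsSystem.instance()
               .register("ContainerMoveMetrics", "Container move results",
                   new ContainerMoveMetrics());
         }
   
         void update(ReplicationManager.MoveResult result) {
           switch (result) {
           case COMPLETED:
             numMoveCompleted.incr();
             break;
           case DELETE_FAIL_POLICY:
             numMoveDeleteFailPolicy.incr();
             break;
           default:
             // the remaining failure values could get dedicated counters
             numMoveFailed.incr();
             break;
           }
         }
       }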

##########
File path: hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/ReplicationManager.java
##########
@@ -471,6 +549,127 @@ private void updateInflightAction(final ContainerInfo container,
     }
   }
 
+  /**
+   * add a move action for a given container.
+   *
+   * @param cid Container to move
+   * @param srcDn datanode to move from
+   * @param targetDn datanode to move to
+   */
+  public Optional<CompletableFuture<MoveResult>> move(ContainerID cid,
+      DatanodeDetails srcDn, DatanodeDetails targetDn)
+      throws ContainerNotFoundException, NodeNotFoundException {
+    LOG.info("receive a move requset about container {} , from {} to {}",
+        cid, srcDn.getUuid(), targetDn.getUuid());
+    Optional<CompletableFuture<MoveResult>> ret = Optional.empty();
+    if (!isRunning()) {
+      LOG.info("Replication Manager in not running. please start it first");
+      return ret;
+    }
+
+    /*
+     * Make sure the following conditions are met:
+     *  1 the given two datanodes are in a healthy state
+     *  2 the given container exists on the given source datanode
+     *  3 the given container does not exist on the given target datanode
+     *  4 the given container is in the closed state
+     *  5 the given container is not taking any inflight action
+     *  6 the given two datanodes are in the IN_SERVICE state
+     *
+     * Move is a combination of two steps: replication and deletion.
+     * If the conditions above are all met, we take a conservative
+     * strategy: replication can always be executed, but the execution
+     * of deletion always depends on the placement policy.
+     */
+
+    NodeStatus currentNodeStat = nodeManager.getNodeStatus(srcDn);
+    NodeState healthStat = currentNodeStat.getHealth();
+    NodeOperationalState operationalState =
+        currentNodeStat.getOperationalState();
+    if (healthStat != NodeState.HEALTHY) {
+      LOG.info("given source datanode is in {} state, " +
+          "not in HEALTHY state", healthStat);

Review comment:
       We can remove the log and complete future exceptionally or use a MoveResult for the different errors in this function.

##########
File path: hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/ReplicationManager.java
##########
@@ -862,16 +1061,55 @@ private void handleOverReplicatedContainer(final ContainerInfo container,
           break;
         }
       }
+
+      eligibleReplicas.removeAll(unhealthyReplicas);
+      boolean isInMove = inflightMove.containsKey(id);
+      boolean isSourceDnInReplicaSet = false;
+      boolean isTargetDnInReplicaSet = false;
+
+      if (isInMove) {
+        Pair<DatanodeDetails, DatanodeDetails> movePair =
+            inflightMove.get(id);
+        final DatanodeDetails sourceDN = movePair.getKey();
+        isSourceDnInReplicaSet = eligibleReplicas.stream()
+            .anyMatch(r -> r.getDatanodeDetails().equals(sourceDN));
+        isTargetDnInReplicaSet = eligibleReplicas.stream()
+            .anyMatch(r -> r.getDatanodeDetails()
+                .equals(movePair.getValue()));
+        int sourceDnPos = 0;
+        for (int i = 0; i < eligibleReplicas.size(); i++) {
+          if (eligibleReplicas.get(i).getDatanodeDetails()
+              .equals(sourceDN)) {
+            sourceDnPos = i;
+            break;
+          }
+        }

Review comment:
       We do not need to calculate `sourceDnPos`. We can simply call
   `eligibleReplicas.remove(sourceDN)`.

##########
File path: hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/ReplicationManager.java
##########
@@ -471,6 +549,127 @@ private void updateInflightAction(final ContainerInfo container,
     }
   }
 
+  /**
+   * add a move action for a given container.
+   *
+   * @param cid Container to move
+   * @param srcDn datanode to move from
+   * @param targetDn datanode to move to
+   */
+  public Optional<CompletableFuture<MoveResult>> move(ContainerID cid,

Review comment:
       I think it would be better to return CompletableFuture always. In the cases where move is not accepted, future can be completed exceptionally with the appropriate exception message. Or we could add more MoveResult for the errors.
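   
   For instance, the not-running early return could become something like this (a sketch of the "complete exceptionally" variant; not the committed code):
   
       CompletableFuture<MoveResult> future = new CompletableFuture<>();
       if (!isRunning()) {
         // fail the future with a message instead of returning Optional.empty()
         future.completeExceptionally(new IllegalStateException(
             "Replication Manager is not running"));
         return future;
       }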

##########
File path: hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/ReplicationManager.java
##########
@@ -471,6 +549,127 @@ private void updateInflightAction(final ContainerInfo container,
     }
   }
 
+  /**
+   * add a move action for a given container.
+   *
+   * @param cid Container to move
+   * @param srcDn datanode to move from
+   * @param targetDn datanode to move to
+   */
+  public Optional<CompletableFuture<MoveResult>> move(ContainerID cid,
+      DatanodeDetails srcDn, DatanodeDetails targetDn)
+      throws ContainerNotFoundException, NodeNotFoundException {
+    LOG.info("receive a move requset about container {} , from {} to {}",
+        cid, srcDn.getUuid(), targetDn.getUuid());
+    Optional<CompletableFuture<MoveResult>> ret = Optional.empty();
+    if (!isRunning()) {
+      LOG.info("Replication Manager in not running. please start it first");
+      return ret;
+    }
+
+    /*
+     * Make sure the following conditions are met:
+     *  1 the given two datanodes are in a healthy state
+     *  2 the given container exists on the given source datanode
+     *  3 the given container does not exist on the given target datanode
+     *  4 the given container is in the closed state
+     *  5 the given container is not taking any inflight action
+     *  6 the given two datanodes are in the IN_SERVICE state
+     *
+     * Move is a combination of two steps: replication and deletion.
+     * If the conditions above are all met, we take a conservative
+     * strategy: replication can always be executed, but the execution
+     * of deletion always depends on the placement policy.
+     */
+
+    NodeStatus currentNodeStat = nodeManager.getNodeStatus(srcDn);
+    NodeState healthStat = currentNodeStat.getHealth();
+    NodeOperationalState operationalState =
+        currentNodeStat.getOperationalState();
+    if (healthStat != NodeState.HEALTHY) {
+      LOG.info("given source datanode is in {} state, " +
+          "not in HEALTHY state", healthStat);

Review comment:
       Further I think REPLICATION_FAIL_NODE_UNHEALTHY can be used here.

##########
File path: hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/ReplicationManager.java
##########
@@ -458,6 +505,37 @@ private void updateInflightAction(final ContainerInfo container,
           if (health != NodeState.HEALTHY || a.time < deadline
               || filter.test(a)) {
             iter.remove();
+
+            if (inflightMove.containsKey(id)) {
+              boolean isInflightReplication =
+                  inflightActions.equals(inflightReplication);
+            // if replication is completed, nothing to do
+              if (!(isInflightReplication && filter.test(a))) {
+                if (isInflightReplication) {
+                  if (health != NodeState.HEALTHY) {
+                    inflightMoveFuture.get(id).complete(
+                        MoveResult.REPLICATION_FAIL_NODE_UNHEALTHY);
+                  } else {
+                    inflightMoveFuture.get(id).complete(
+                        MoveResult.REPLICATION_FAIL_TIME_OUT);
+                  }
+                } else {
+                  if (health != NodeState.HEALTHY) {
+                    inflightMoveFuture.get(id).complete(
+                        MoveResult.DELETION_FAIL_NODE_UNHEALTHY);
+                  } else if (a.time < deadline) {
+                    inflightMoveFuture.get(id).complete(
+                        MoveResult.DELETION_FAIL_TIME_OUT);
+                  } else {
+                    inflightMoveFuture.get(id).complete(
+                        MoveResult.COMPLETED);
+                  }

Review comment:
       I feel we can simplify this code and probably move it to a separate function like
   `updateMove(boolean isInflightReplication, boolean timeout, boolean success, NodeState health)`.
   We will also need to check if a.datanode matches our required datanode.
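   
   To make the suggestion concrete, one possible shape of that helper (the ContainerID parameter is added on top of the suggested signature so the future can be looked up; this is an illustration, not the committed code):
   
       private void updateMove(ContainerID id, boolean isInflightReplication,
           boolean timeout, boolean success, NodeState health) {
         final MoveResult result;
         if (isInflightReplication) {
           // the replication leg failed: unhealthy node or timeout
           result = health != NodeState.HEALTHY
               ? MoveResult.REPLICATION_FAIL_NODE_UNHEALTHY
               : MoveResult.REPLICATION_FAIL_TIME_OUT;
         } else if (success) {
           // the deletion leg finished, so the whole move is done
           result = MoveResult.COMPLETED;
         } else if (health != NodeState.HEALTHY) {
           result = MoveResult.DELETION_FAIL_NODE_UNHEALTHY;
         } else if (timeout) {
           result = MoveResult.DELETION_FAIL_TIME_OUT;
         } else {
           result = MoveResult.COMPLETED;
         }
         // the inflightMove/inflightMoveFuture entries would also need cleanup
         inflightMoveFuture.get(id).complete(result);
       }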

##########
File path: hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/ReplicationManager.java
##########
@@ -471,6 +549,127 @@ private void updateInflightAction(final ContainerInfo container,
     }
   }
 
+  /**
+   * add a move action for a given container.
+   *
+   * @param cid Container to move
+   * @param srcDn datanode to move from
+   * @param targetDn datanode to move to
+   */
+  public Optional<CompletableFuture<MoveResult>> move(ContainerID cid,
+      DatanodeDetails srcDn, DatanodeDetails targetDn)
+      throws ContainerNotFoundException, NodeNotFoundException {
+    LOG.info("receive a move requset about container {} , from {} to {}",
+        cid, srcDn.getUuid(), targetDn.getUuid());

Review comment:
       Let's only log the successful cases.

##########
File path: hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/ReplicationManager.java
##########
@@ -862,16 +1061,55 @@ private void handleOverReplicatedContainer(final ContainerInfo container,
           break;
         }
       }
+
+      eligibleReplicas.removeAll(unhealthyReplicas);
+      boolean isInMove = inflightMove.containsKey(id);
+      boolean isSourceDnInReplicaSet = false;
+      boolean isTargetDnInReplicaSet = false;
+
+      if (isInMove) {
+        Pair<DatanodeDetails, DatanodeDetails> movePair =
+            inflightMove.get(id);
+        final DatanodeDetails sourceDN = movePair.getKey();
+        isSourceDnInReplicaSet = eligibleReplicas.stream()
+            .anyMatch(r -> r.getDatanodeDetails().equals(sourceDN));
+        isTargetDnInReplicaSet = eligibleReplicas.stream()
+            .anyMatch(r -> r.getDatanodeDetails()
+                .equals(movePair.getValue()));
+        int sourceDnPos = 0;
+        for (int i = 0; i < eligibleReplicas.size(); i++) {
+          if (eligibleReplicas.get(i).getDatanodeDetails()
+              .equals(sourceDN)) {
+            sourceDnPos = i;
+            break;
+          }
+        }
+        if (isTargetDnInReplicaSet) {
+          // if the container is in inflightMove and target datanode is
+          // included in the replicas, then move the source datanode to
+          // the front of the replica list if it exists, so the source
+          // datanode will be removed first if possible.
+          eligibleReplicas.add(0, eligibleReplicas.remove(sourceDnPos));
+        } else {
+          // a container replica that is being moved should not be removed:
+          // if the container is in inflightMove and the target datanode is
+          // not included in the replicas, move the source datanode to the
+          // end of the replica list, so the source datanode will not
+          // be removed.
+          eligibleReplicas.add(eligibleReplicas.remove(sourceDnPos));

Review comment:
       I am not sure we should handle this. TargetDN would receive all the existing replicas as sources to copy from. I think that, at the end, if sourceDN is not present and targetDN is present, the move is successful.
   

##########
File path: hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/ReplicationManager.java
##########
@@ -471,6 +549,127 @@ private void updateInflightAction(final ContainerInfo container,
     }
   }
 
+  /**
+   * add a move action for a given container.
+   *
+   * @param cid Container to move
+   * @param srcDn datanode to move from
+   * @param targetDn datanode to move to
+   */
+  public Optional<CompletableFuture<MoveResult>> move(ContainerID cid,
+      DatanodeDetails srcDn, DatanodeDetails targetDn)
+      throws ContainerNotFoundException, NodeNotFoundException {
+    LOG.info("receive a move requset about container {} , from {} to {}",
+        cid, srcDn.getUuid(), targetDn.getUuid());
+    Optional<CompletableFuture<MoveResult>> ret = Optional.empty();
+    if (!isRunning()) {
+      LOG.info("Replication Manager in not running. please start it first");
+      return ret;
+    }
+
+    /*
+     * Make sure the following conditions are met:
+     *  1 the given two datanodes are in a healthy state
+     *  2 the given container exists on the given source datanode
+     *  3 the given container does not exist on the given target datanode
+     *  4 the given container is in the closed state
+     *  5 the given container is not taking any inflight action
+     *  6 the given two datanodes are in the IN_SERVICE state
+     *
+     * Move is a combination of two steps: replication and deletion.
+     * If the conditions above are all met, we take a conservative
+     * strategy: replication can always be executed, but the execution
+     * of deletion always depends on the placement policy.
+     */
+
+    NodeStatus currentNodeStat = nodeManager.getNodeStatus(srcDn);
+    NodeState healthStat = currentNodeStat.getHealth();
+    NodeOperationalState operationalState =
+        currentNodeStat.getOperationalState();
+    if (healthStat != NodeState.HEALTHY) {
+      LOG.info("given source datanode is in {} state, " +
+          "not in HEALTHY state", healthStat);
+      return ret;
+    }
+    if (operationalState != NodeOperationalState.IN_SERVICE) {
+      LOG.info("given source datanode is in {} state, " +
+          "not in IN_SERVICE state", operationalState);
+      return ret;
+    }
+
+    currentNodeStat = nodeManager.getNodeStatus(targetDn);
+    healthStat = currentNodeStat.getHealth();
+    operationalState = currentNodeStat.getOperationalState();
+    if (healthStat != NodeState.HEALTHY) {
+      LOG.info("given target datanode is in {} state, " +
+          "not in HEALTHY state", healthStat);
+      return ret;

Review comment:
       DELETION_FAIL_NODE_UNHEALTHY can be used here.

##########
File path: hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/ReplicationManager.java
##########
@@ -471,6 +549,127 @@ private void updateInflightAction(final ContainerInfo container,
     }
   }
 
+  /**
+   * add a move action for a given container.
+   *
+   * @param cid Container to move
+   * @param srcDn datanode to move from
+   * @param targetDn datanode to move to
+   */
+  public Optional<CompletableFuture<MoveResult>> move(ContainerID cid,
+      DatanodeDetails srcDn, DatanodeDetails targetDn)
+      throws ContainerNotFoundException, NodeNotFoundException {
+    LOG.info("receive a move requset about container {} , from {} to {}",
+        cid, srcDn.getUuid(), targetDn.getUuid());
+    Optional<CompletableFuture<MoveResult>> ret = Optional.empty();
+    if (!isRunning()) {
+      LOG.info("Replication Manager in not running. please start it first");
+      return ret;
+    }
+
+    /*
+     * Make sure the following conditions are met:
+     *  1 the given two datanodes are in a healthy state
+     *  2 the given container exists on the given source datanode
+     *  3 the given container does not exist on the given target datanode
+     *  4 the given container is in the closed state
+     *  5 the given container is not taking any inflight action
+     *  6 the given two datanodes are in the IN_SERVICE state
+     *
+     * Move is a combination of two steps: replication and deletion.
+     * If the conditions above are all met, we take a conservative
+     * strategy: replication can always be executed, but the execution
+     * of deletion always depends on the placement policy.
+     */
+
+    NodeStatus currentNodeStat = nodeManager.getNodeStatus(srcDn);
+    NodeState healthStat = currentNodeStat.getHealth();
+    NodeOperationalState operationalState =
+        currentNodeStat.getOperationalState();
+    if (healthStat != NodeState.HEALTHY) {
+      LOG.info("given source datanode is in {} state, " +
+          "not in HEALTHY state", healthStat);
+      return ret;
+    }
+    if (operationalState != NodeOperationalState.IN_SERVICE) {
+      LOG.info("given source datanode is in {} state, " +
+          "not in IN_SERVICE state", operationalState);
+      return ret;
+    }
+
+    currentNodeStat = nodeManager.getNodeStatus(targetDn);
+    healthStat = currentNodeStat.getHealth();
+    operationalState = currentNodeStat.getOperationalState();
+    if (healthStat != NodeState.HEALTHY) {
+      LOG.info("given target datanode is in {} state, " +
+          "not in HEALTHY state", healthStat);
+      return ret;
+    }
+    if (operationalState != NodeOperationalState.IN_SERVICE) {
+      LOG.info("given target datanode is in {} state, " +
+          "not in IN_SERVICE state", operationalState);
+      return ret;
+    }
+
+    // we need to synchronize on ContainerInfo, since it is
+    // shared by ICR/FCR handler and this.processContainer
+    // TODO: use a Read lock after introducing a RW lock into ContainerInfo
+    ContainerInfo cif = containerManager.getContainer(cid);
+    synchronized (cif) {
+      final Set<DatanodeDetails> replicas = containerManager
+            .getContainerReplicas(cid).stream()
+            .map(ContainerReplica::getDatanodeDetails)
+            .collect(Collectors.toSet());
+      if (replicas.contains(targetDn)) {
+        LOG.info("given container exists in the target Datanode");
+        return ret;
+      }
+      if (!replicas.contains(srcDn)) {
+        LOG.info("given container does not exist in the source Datanode");
+        return ret;
+      }
+
+      /*
+      * The reason why the given container should not be taking any inflight
+      * action is that, if the given container is being replicated or deleted,
+      * the number of its replicas is not deterministic, so a move operation
+      * issued by the balancer may cause a nondeterministic result; thus we
+      * should drop this operation for now.
+      */
+
+      if (inflightReplication.containsKey(cid)) {
+        LOG.info("given container is in inflight replication");
+        return ret;
+      }
+      if (inflightDeletion.containsKey(cid)) {
+        LOG.info("given container is in inflight deletion");
+        return ret;
+      }
+
+      /*
+      * Here, there is no need to check whether cid is in inflightMove,
+      * because these three maps are all synchronized on ContainerInfo. If
+      * cid is in inflightMove, it must currently be being replicated or
+      * deleted, so it must be in inflightReplication or inflightDeletion.
+      * Thus, if we cannot find cid in either of them, this cid must not
+      * be in inflightMove.
+      */
+

Review comment:
       We should also check that {Existing replicas + Target_Dn - Source_Dn} satisfies the placement policy.

##########
File path: hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/ReplicationManager.java
##########
@@ -862,16 +1061,55 @@ private void handleOverReplicatedContainer(final ContainerInfo container,
           break;
         }
       }
+
+      eligibleReplicas.removeAll(unhealthyReplicas);
+      boolean isInMove = inflightMove.containsKey(id);
+      boolean isSourceDnInReplicaSet = false;
+      boolean isTargetDnInReplicaSet = false;
+
+      if (isInMove) {
+        Pair<DatanodeDetails, DatanodeDetails> movePair =
+            inflightMove.get(id);
+        final DatanodeDetails sourceDN = movePair.getKey();
+        isSourceDnInReplicaSet = eligibleReplicas.stream()
+            .anyMatch(r -> r.getDatanodeDetails().equals(sourceDN));
+        isTargetDnInReplicaSet = eligibleReplicas.stream()
+            .anyMatch(r -> r.getDatanodeDetails()
+                .equals(movePair.getValue()));
+        int sourceDnPos = 0;
+        for (int i = 0; i < eligibleReplicas.size(); i++) {
+          if (eligibleReplicas.get(i).getDatanodeDetails()
+              .equals(sourceDN)) {
+            sourceDnPos = i;
+            break;
+          }
+        }
+        if (isTargetDnInReplicaSet) {
+          // if the container is in inflightMove and target datanode is
+          // included in the replicas, then move the source datanode to
+          // the front of the replica list if it exists, so the source
+          // datanode will be removed first if possible.
+          eligibleReplicas.add(0, eligibleReplicas.remove(sourceDnPos));

Review comment:
       We should also handle the case when sourceDN is not present in eligibleReplicas.






[GitHub] [ozone] JacksonYao287 commented on a change in pull request #2349: HDDS-4928. Support container move in Replication Manager

Posted by GitBox <gi...@apache.org>.
JacksonYao287 commented on a change in pull request #2349:
URL: https://github.com/apache/ozone/pull/2349#discussion_r660238109



##########
File path: hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/ReplicationManager.java
##########
@@ -862,16 +1061,55 @@ private void handleOverReplicatedContainer(final ContainerInfo container,
           break;
         }
       }
+
+      eligibleReplicas.removeAll(unhealthyReplicas);
+      boolean isInMove = inflightMove.containsKey(id);
+      boolean isSourceDnInReplicaSet = false;
+      boolean isTargetDnInReplicaSet = false;
+
+      if (isInMove) {
+        Pair<DatanodeDetails, DatanodeDetails> movePair =
+            inflightMove.get(id);
+        final DatanodeDetails sourceDN = movePair.getKey();
+        isSourceDnInReplicaSet = eligibleReplicas.stream()
+            .anyMatch(r -> r.getDatanodeDetails().equals(sourceDN));
+        isTargetDnInReplicaSet = eligibleReplicas.stream()
+            .anyMatch(r -> r.getDatanodeDetails()
+                .equals(movePair.getValue()));
+        int sourceDnPos = 0;
+        for (int i = 0; i < eligibleReplicas.size(); i++) {
+          if (eligibleReplicas.get(i).getDatanodeDetails()
+              .equals(sourceDN)) {
+            sourceDnPos = i;
+            break;
+          }
+        }

Review comment:
       I get your point, but sourceDN is of type DatanodeDetails while eligibleReplicas is a list of ContainerReplica, so I am not sure `eligibleReplicas.remove(sourceDN)` would work as expected.
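   
   (For reference, a predicate-based removal sidesteps the type mismatch; this mirrors what the final version of the patch does:)
   
       // eligibleReplicas holds ContainerReplica, sourceDN is DatanodeDetails,
       // so match on each replica's datanode instead of the element itself
       eligibleReplicas.removeIf(r -> r.getDatanodeDetails().equals(sourceDN));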






[GitHub] [ozone] JacksonYao287 commented on a change in pull request #2349: HDDS-4928. Support container move in Replication Manager

Posted by GitBox <gi...@apache.org>.
JacksonYao287 commented on a change in pull request #2349:
URL: https://github.com/apache/ozone/pull/2349#discussion_r671162732



##########
File path: hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/ReplicationManager.java
##########
@@ -1169,78 +1173,61 @@ private void handleOverReplicatedContainer(final ContainerInfo container,
         }
       }
       eligibleReplicas.removeAll(unhealthyReplicas);
-
-      excess -= handleMoveIfNeeded(container, eligibleReplicas);
-
       removeExcessReplicasIfNeeded(excess, container, eligibleReplicas);
     }
   }
 
   /**
-   * if the container is in inflightMove, handle move if needed.
+   * if the container is in inflightMove, handle move.
    *
    * @param cif ContainerInfo
-   * @param eligibleReplicas An list of replicas, which may have excess replicas
-   * @return minus how many replica is removed through sending delete command
+   * @param replicaSet A set of replicas, which may have excess replicas
    */
-  private int handleMoveIfNeeded(final ContainerInfo cif,
-                   final List<ContainerReplica> eligibleReplicas) {
-    int minus = 0;
+  private void handleMove(final ContainerInfo cif,

Review comment:
       Yes, I will do this. Thanks for the review!






[GitHub] [ozone] lokeshj1703 commented on pull request #2349: HDDS-4928. Support container move in Replication Manager

Posted by GitBox <gi...@apache.org>.
lokeshj1703 commented on pull request #2349:
URL: https://github.com/apache/ozone/pull/2349#issuecomment-877136121


   @JacksonYao287 There is one optimization we can use for move:
    https://github.com/apache/ozone/pull/2349/files#diff-7b38a0477ad3f17c66925825066d4f153b69efbf37288b2c71cd573313771122R1265-R1270
   
   If the placement policy was initially not satisfied, then deletion for the source can be sent provided the placement count remains the same after deletion. I think it would be a good idea to abstract out the logic here
   https://github.com/apache/ozone/pull/2349/files#diff-7b38a0477ad3f17c66925825066d4f153b69efbf37288b2c71cd573313771122R1265-R1270
   and reuse it for move and over-replication.
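   
   Spelled out, the reusable check amounts to the predicate below (essentially what the patch later names `isPlacementStatusActuallyEqual` in the diff further down; `before` and `after` are placeholder ContainerPlacementStatus values):
   
       // deletion is safe if it keeps the placement count unchanged,
       // or if the policy stays satisfied both before and after
       boolean deletionIsSafe =
           before.actualPlacementCount() == after.actualPlacementCount()
               || (before.isPolicySatisfied() && after.isPolicySatisfied());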




[GitHub] [ozone] lokeshj1703 commented on pull request #2349: HDDS-4928. Support container move in Replication Manager

Posted by GitBox <gi...@apache.org>.
lokeshj1703 commented on pull request #2349:
URL: https://github.com/apache/ozone/pull/2349#issuecomment-877131817


   @JacksonYao287 Thanks for updating the PR! The logic looks good to me.




[GitHub] [ozone] lokeshj1703 commented on a change in pull request #2349: HDDS-4928. Support container move in Replication Manager

Posted by GitBox <gi...@apache.org>.
lokeshj1703 commented on a change in pull request #2349:
URL: https://github.com/apache/ozone/pull/2349#discussion_r672818235



##########
File path: hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/ReplicationManager.java
##########
@@ -861,47 +1168,145 @@ private void handleOverReplicatedContainer(final ContainerInfo container,
           break;
         }
       }
-      // After removing all unhealthy replicas, if the container is still over
-      // replicated then we need to check if it is already mis-replicated.
-      // If it is, we do no harm by removing excess replicas. However, if it is
-      // not mis-replicated, then we can only remove replicas if they don't
-      // make the container become mis-replicated.
-      if (excess > 0) {
-        eligibleReplicas.removeAll(unhealthyReplicas);
-        Set<ContainerReplica> eligibleSet = new HashSet<>(eligibleReplicas);
-        ContainerPlacementStatus ps =
-            getPlacementStatus(eligibleSet, replicationFactor);
-        for (ContainerReplica r : eligibleReplicas) {
-          if (excess <= 0) {
-            break;
-          }
-          // First remove the replica we are working on from the set, and then
-          // check if the set is now mis-replicated.
-          eligibleSet.remove(r);
-          ContainerPlacementStatus nowPS =
-              getPlacementStatus(eligibleSet, replicationFactor);
-          if ((!ps.isPolicySatisfied()
-                && nowPS.actualPlacementCount() == ps.actualPlacementCount())
-              || (ps.isPolicySatisfied() && nowPS.isPolicySatisfied())) {
-            // Remove the replica if the container was already unsatisfied
-            // and losing this replica keep actual placement count unchanged.
-            // OR if losing this replica still keep satisfied
-            sendDeleteCommand(container, r.getDatanodeDetails(), true);
-            excess -= 1;
-            continue;
-          }
-          // If we decided not to remove this replica, put it back into the set
-          eligibleSet.add(r);
+      eligibleReplicas.removeAll(unhealthyReplicas);
+
+      excess -= handleMoveIfNeeded(container, eligibleReplicas);
+
+      removeExcessReplicasIfNeeded(excess, container, eligibleReplicas);
+    }
+  }
+
+  /**
+   * if the container is in inflightMove, handle move if needed.
+   *
+   * @param cif ContainerInfo
+   * @param eligibleReplicas A list of replicas, which may have excess replicas
+   * @return the number of replicas removed by sending delete commands
+   */
+  private int handleMoveIfNeeded(final ContainerInfo cif,
+                   final List<ContainerReplica> eligibleReplicas) {
+    int minus = 0;
+    final ContainerID cid = cif.containerID();
+    if (!inflightMove.containsKey(cid)) {
+      return minus;
+    }
+
+    Pair<DatanodeDetails, DatanodeDetails> movePair =
+        inflightMove.get(cid);
+
+    final DatanodeDetails srcDn = movePair.getKey();
+    final DatanodeDetails targetDn = movePair.getValue();
+    boolean isSourceDnInReplicaSet;
+    boolean isTargetDnInReplicaSet;
+
+    isSourceDnInReplicaSet = eligibleReplicas.stream()
+        .anyMatch(r -> r.getDatanodeDetails().equals(srcDn));
+    isTargetDnInReplicaSet = eligibleReplicas.stream()
+        .anyMatch(r -> r.getDatanodeDetails().equals(targetDn));
+
+    // if the target datanode is not in the replica set, nothing to do
+    if (isTargetDnInReplicaSet) {
+      Set<ContainerReplica> eligibleReplicaSet =
+          eligibleReplicas.stream().collect(Collectors.toSet());
+      int replicationFactor =
+          cif.getReplicationConfig().getRequiredNodes();
+      ContainerPlacementStatus currentCPS =
+          getPlacementStatus(eligibleReplicaSet, replicationFactor);
+      eligibleReplicaSet.removeIf(r -> r.getDatanodeDetails().equals(srcDn));
+      ContainerPlacementStatus afterMoveCPS =
+          getPlacementStatus(eligibleReplicaSet, replicationFactor);
+
+      if (isSourceDnInReplicaSet &&
+          isPlacementStatusActuallyEqual(currentCPS, afterMoveCPS)) {
+        sendDeleteCommand(cif, srcDn, true);
+        eligibleReplicas.removeIf(r -> r.getDatanodeDetails().equals(srcDn));
+        minus++;
+      } else {
+        if (!isSourceDnInReplicaSet) {
+          // if the target is present, and source disappears somehow,
+          // we can consider the move successful.
+          inflightMoveFuture.get(cid).complete(MoveResult.COMPLETED);
+        } else {
+          // if the source and target datanodes are both in the replica set,
+          // but we cannot delete the source datanode for now (e.g.,
+          // there are only 3 replicas, or the policy is not satisfied),
+          // we just complete the future without sending a delete command.
+          LOG.info("cannot remove source replica after successfully " +
+              "replicating to target datanode");
+          inflightMoveFuture.get(cid).complete(MoveResult.DELETE_FAIL_POLICY);
         }
-        if (excess > 0) {
-          LOG.info("The container {} is over replicated with {} excess " +
-              "replica. The excess replicas cannot be removed without " +
-              "violating the placement policy", container, excess);
+        inflightMove.remove(cid);
+        inflightMoveFuture.remove(cid);
+      }
+    }
+
+    return minus;
+  }
+
+  /**
+   * Remove excess replicas if needed; replicationFactor and placement policy
+   * will be taken into consideration.
+   *
+   * @param excess the excess number after subtracting replicationFactor
+   * @param container ContainerInfo
+   * @param eligibleReplicas A list of replicas, which may have excess replicas
+   */
+  private void removeExcessReplicasIfNeeded(int excess,
+                    final ContainerInfo container,
+                    final List<ContainerReplica> eligibleReplicas) {
+    // After removing all unhealthy replicas, if the container is still over
+    // replicated then we need to check if it is already mis-replicated.
+    // If it is, we do no harm by removing excess replicas. However, if it is
+    // not mis-replicated, then we can only remove replicas if they don't
+    // make the container become mis-replicated.
+    if (excess > 0) {
+      Set<ContainerReplica> eligibleSet = new HashSet<>(eligibleReplicas);
+      final int replicationFactor =
+          container.getReplicationConfig().getRequiredNodes();
+      ContainerPlacementStatus ps =
+          getPlacementStatus(eligibleSet, replicationFactor);
+
+      for (ContainerReplica r : eligibleReplicas) {
+        if (excess <= 0) {
+          break;
         }
+        // First remove the replica we are working on from the set, and then
+        // check if the set is now mis-replicated.
+        eligibleSet.remove(r);
+        ContainerPlacementStatus nowPS =
+            getPlacementStatus(eligibleSet, replicationFactor);
+        if (isPlacementStatusActuallyEqual(ps, nowPS)) {
+          // Remove the replica if the container was already unsatisfied
+          // and losing this replica keeps the actual placement count
+          // unchanged, OR if losing this replica still keeps it satisfied.
+          sendDeleteCommand(container, r.getDatanodeDetails(), true);
+          excess -= 1;
+          continue;
+        }
+        // If we decided not to remove this replica, put it back into the set
+        eligibleSet.add(r);
+      }
+      if (excess > 0) {
+        LOG.info("The container {} is over replicated with {} excess " +
+            "replica. The excess replicas cannot be removed without " +
+            "violating the placement policy", container, excess);
       }
     }
   }
 
+  /**
+   * whether the given two ContainerPlacementStatus are actually equal.
+   *
+   * @param cps1 ContainerPlacementStatus
+   * @param cps2 ContainerPlacementStatus
+   */
+  private boolean isPlacementStatusActuallyEqual(
+                      ContainerPlacementStatus cps1,
+                      ContainerPlacementStatus cps2) {
+    return cps1.actualPlacementCount() == cps2.actualPlacementCount() ||
+        cps1.isPolicySatisfied() && cps2.isPolicySatisfied();

Review comment:
       Addressed in #2437.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] ChenSammi merged pull request #2349: HDDS-4928. Support container move in Replication Manager

Posted by GitBox <gi...@apache.org>.
ChenSammi merged pull request #2349:
URL: https://github.com/apache/ozone/pull/2349


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] JacksonYao287 edited a comment on pull request #2349: HDDS-4928. Support container move in Replication Manager

Posted by GitBox <gi...@apache.org>.
JacksonYao287 edited a comment on pull request #2349:
URL: https://github.com/apache/ozone/pull/2349#issuecomment-877924114


   > If placement policy was initially not satisfied then deletion for source can be sent provided placement count remains same after deletion. I think it would be a good idea to abstract out logic here and reuse it for move and over-replication.
    
   yea, looks good, i will make this change and add some unit tests for move.
   
   by the way, i think the `!ps.isPolicySatisfied()` check before `nowPS.actualPlacementCount() == ps.actualPlacementCount()` is redundant; we can remove it
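   
   To make the comparison concrete, here is a minimal standalone sketch of the two predicates (a stand-in `PlacementStatus` interface replaces Ozone's `ContainerPlacementStatus`; whether the two forms are truly equivalent depends on `actualPlacementCount()` alone determining satisfaction for a fixed replication factor, which is the point being debated):
   
   // Stand-in for ContainerPlacementStatus, only for this sketch.
   interface PlacementStatus {
     boolean isPolicySatisfied();
     int actualPlacementCount();
   }
   
   final class RemovalPredicates {
     // Predicate as currently written in handleOverReplicatedContainer.
     static boolean canRemoveOriginal(PlacementStatus ps, PlacementStatus nowPS) {
       return (!ps.isPolicySatisfied()
           && nowPS.actualPlacementCount() == ps.actualPlacementCount())
           || (ps.isPolicySatisfied() && nowPS.isPolicySatisfied());
     }
   
     // Proposed simplification with the !ps.isPolicySatisfied() guard dropped.
     static boolean canRemoveSimplified(PlacementStatus ps, PlacementStatus nowPS) {
       return nowPS.actualPlacementCount() == ps.actualPlacementCount()
           || (ps.isPolicySatisfied() && nowPS.isPolicySatisfied());
     }
   }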


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] JacksonYao287 commented on a change in pull request #2349: HDDS-4928. Support container move in Replication Manager

Posted by GitBox <gi...@apache.org>.
JacksonYao287 commented on a change in pull request #2349:
URL: https://github.com/apache/ozone/pull/2349#discussion_r659435202



##########
File path: hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/ReplicationManager.java
##########
@@ -458,6 +505,37 @@ private void updateInflightAction(final ContainerInfo container,
           if (health != NodeState.HEALTHY || a.time < deadline
               || filter.test(a)) {
             iter.remove();
+
+            if (inflightMove.containsKey(id)) {
+              boolean isInflightReplication =
+                  inflightActions.equals(inflightReplication);
+              //if replication is completed , nothing to do
+              if (!(isInflightReplication && filter.test(a))) {
+                if (isInflightReplication) {
+                  if (health != NodeState.HEALTHY) {
+                    inflightMoveFuture.get(id).complete(
+                        MoveResult.REPLICATION_FAIL_NODE_UNHEALTHY);
+                  } else {
+                    inflightMoveFuture.get(id).complete(
+                        MoveResult.REPLICATION_FAIL_TIME_OUT);
+                  }
+                } else {
+                  if (health != NodeState.HEALTHY) {
+                    inflightMoveFuture.get(id).complete(
+                        MoveResult.DELETION_FAIL_NODE_UNHEALTHY);
+                  } else if (a.time < deadline) {
+                    inflightMoveFuture.get(id).complete(
+                        MoveResult.DELETION_FAIL_TIME_OUT);
+                  } else {
+                    inflightMoveFuture.get(id).complete(
+                        MoveResult.COMPLETED);
+                  }

Review comment:
       thanks for pointing this out, i will make a change




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] JacksonYao287 commented on a change in pull request #2349: HDDS-4928. Support container move in Replication Manager

Posted by GitBox <gi...@apache.org>.
JacksonYao287 commented on a change in pull request #2349:
URL: https://github.com/apache/ozone/pull/2349#discussion_r664264191



##########
File path: hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/ReplicationManager.java
##########
@@ -862,43 +1153,128 @@ private void handleOverReplicatedContainer(final ContainerInfo container,
           break;
         }
       }
-      // After removing all unhealthy replicas, if the container is still over
-      // replicated then we need to check if it is already mis-replicated.
-      // If it is, we do no harm by removing excess replicas. However, if it is
-      // not mis-replicated, then we can only remove replicas if they don't
-      // make the container become mis-replicated.
-      if (excess > 0) {
-        eligibleReplicas.removeAll(unhealthyReplicas);
-        Set<ContainerReplica> eligibleSet = new HashSet<>(eligibleReplicas);
-        ContainerPlacementStatus ps =
-            getPlacementStatus(eligibleSet, replicationFactor);
-        for (ContainerReplica r : eligibleReplicas) {
-          if (excess <= 0) {
+
+      eligibleReplicas.removeAll(unhealthyReplicas);
+      boolean isInMove = inflightMove.containsKey(id);
+      boolean isSourceDnInReplicaSet = false;
+      boolean isTargetDnInReplicaSet = false;
+
+      if (isInMove) {
+        Pair<DatanodeDetails, DatanodeDetails> movePair =
+            inflightMove.get(id);
+        final DatanodeDetails sourceDN = movePair.getKey();
+        isSourceDnInReplicaSet = eligibleReplicas.stream()
+            .anyMatch(r -> r.getDatanodeDetails().equals(sourceDN));
+        isTargetDnInReplicaSet = eligibleReplicas.stream()
+            .anyMatch(r -> r.getDatanodeDetails()
+                .equals(movePair.getValue()));
+        int sourceDnPos = 0;
+        for (int i = 0; i < eligibleReplicas.size(); i++) {
+          if (eligibleReplicas.get(i).getDatanodeDetails()
+              .equals(sourceDN)) {
+            sourceDnPos = i;
             break;
           }
-          // First remove the replica we are working on from the set, and then
-          // check if the set is now mis-replicated.
-          eligibleSet.remove(r);
-          ContainerPlacementStatus nowPS =
-              getPlacementStatus(eligibleSet, replicationFactor);
-          if ((!ps.isPolicySatisfied()
-                && nowPS.actualPlacementCount() == ps.actualPlacementCount())
-              || (ps.isPolicySatisfied() && nowPS.isPolicySatisfied())) {
-            // Remove the replica if the container was already unsatisfied
-            // and losing this replica keep actual placement count unchanged.
-            // OR if losing this replica still keep satisfied
-            sendDeleteCommand(container, r.getDatanodeDetails(), true);
-            excess -= 1;
-            continue;
+        }
+        if (isTargetDnInReplicaSet) {
+          if (isSourceDnInReplicaSet) {
+            // if the target is present, and source disappears somehow,
+            // we can consider move is successful.
+            inflightMoveFuture.get(id).complete(MoveResult.COMPLETED);
+            inflightMove.remove(id);
+            inflightMoveFuture.remove(id);
+          } else {
+            // if the container is in inflightMove and target datanode is
+            // included in the replicas, then swap the source datanode to
+            // first of the replica list if exists, so the source datanode
+            // will be first removed if possible.
+            eligibleReplicas.add(0, eligibleReplicas.remove(sourceDnPos));
+          }

Review comment:
       yea, thanks @siddhantsangwan for pointing this out! I explained the logic in the comment, which is just what you said, but i made a mistake in the code. i will fix this, thanks!




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] JacksonYao287 commented on a change in pull request #2349: HDDS-4928. Support container move in Replication Manager

Posted by GitBox <gi...@apache.org>.
JacksonYao287 commented on a change in pull request #2349:
URL: https://github.com/apache/ozone/pull/2349#discussion_r660250722



##########
File path: hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/ReplicationManager.java
##########
@@ -862,16 +1061,55 @@ private void handleOverReplicatedContainer(final ContainerInfo container,
           break;
         }
       }
+
+      eligibleReplicas.removeAll(unhealthyReplicas);
+      boolean isInMove = inflightMove.containsKey(id);
+      boolean isSourceDnInReplicaSet = false;
+      boolean isTargetDnInReplicaSet = false;
+
+      if (isInMove) {
+        Pair<DatanodeDetails, DatanodeDetails> movePair =
+            inflightMove.get(id);
+        final DatanodeDetails sourceDN = movePair.getKey();
+        isSourceDnInReplicaSet = eligibleReplicas.stream()
+            .anyMatch(r -> r.getDatanodeDetails().equals(sourceDN));
+        isTargetDnInReplicaSet = eligibleReplicas.stream()
+            .anyMatch(r -> r.getDatanodeDetails()
+                .equals(movePair.getValue()));
+        int sourceDnPos = 0;
+        for (int i = 0; i < eligibleReplicas.size(); i++) {
+          if (eligibleReplicas.get(i).getDatanodeDetails()
+              .equals(sourceDN)) {
+            sourceDnPos = i;
+            break;
+          }
+        }
+        if (isTargetDnInReplicaSet) {
+          // if the container is in inflightMove and target datanode is
+          // included in the replicas, then swap the source datanode to
+          // first of the replica list if exists, so the source datanode
+          // will be first removed if possible.
+          eligibleReplicas.add(0, eligibleReplicas.remove(sourceDnPos));

Review comment:
       yes, will handle this 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] JacksonYao287 commented on pull request #2349: HDDS-4928. Support container move in Replication Manager

Posted by GitBox <gi...@apache.org>.
JacksonYao287 commented on pull request #2349:
URL: https://github.com/apache/ozone/pull/2349#issuecomment-870174091


   Thanks @lokeshj1703 very much for this review, i will update this PR according to your comments.
   
   >I was also thinking if we could simplify the logic in RM#handleOverReplicatedContainer. How about we always try to make the container over-replicated so that move can succeed. I think in handleOverReplicatedContainer if src and target are both present and container is over-replicated, delete can be sent for source.
   
   this idea seems good, but i have a little question. i think even if src and target are both present and the container is over-replicated, deletion cannot always be sent for the source. if the placementPolicy is not met after src is deleted, i think we cannot directly send a delete command to src.
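   
   a rough sketch of the check i have in mind, reusing the helper names from the diffs in this PR (the method name `deleteSrcIfPolicyAllows` is illustrative, not the final patch):
   
   // Only delete the source replica of a move once the placement policy
   // still holds without it; otherwise leave the container over-replicated.
   private void deleteSrcIfPolicyAllows(ContainerInfo container,
       DatanodeDetails srcDn, Set<ContainerReplica> replicas,
       int replicationFactor) {
     Set<ContainerReplica> remaining = new HashSet<>(replicas);
     remaining.removeIf(r -> r.getDatanodeDetails().equals(srcDn));
     ContainerPlacementStatus afterDelete =
         getPlacementStatus(remaining, replicationFactor);
     if (afterDelete.isPolicySatisfied()) {
       sendDeleteCommand(container, srcDn, true);
     }
     // if not satisfied, keep the extra replica rather than mis-replicate.
   }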


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] JacksonYao287 commented on pull request #2349: HDDS-4928. Support container move in Replication Manager

Posted by GitBox <gi...@apache.org>.
JacksonYao287 commented on pull request #2349:
URL: https://github.com/apache/ozone/pull/2349#issuecomment-877924114


   > If placement policy was initially not satisfied then deletion for source can be sent provided placement count remains same after deletion. I think it would be a good idea to abstract out logic here and reuse it for move and over-replication.
    
   yea, looks good, i will make this change and add some unit tests for this.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] JacksonYao287 commented on a change in pull request #2349: HDDS-4928. Support container move in Replication Manager

Posted by GitBox <gi...@apache.org>.
JacksonYao287 commented on a change in pull request #2349:
URL: https://github.com/apache/ozone/pull/2349#discussion_r660238109



##########
File path: hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/ReplicationManager.java
##########
@@ -862,16 +1061,55 @@ private void handleOverReplicatedContainer(final ContainerInfo container,
           break;
         }
       }
+
+      eligibleReplicas.removeAll(unhealthyReplicas);
+      boolean isInMove = inflightMove.containsKey(id);
+      boolean isSourceDnInReplicaSet = false;
+      boolean isTargetDnInReplicaSet = false;
+
+      if (isInMove) {
+        Pair<DatanodeDetails, DatanodeDetails> movePair =
+            inflightMove.get(id);
+        final DatanodeDetails sourceDN = movePair.getKey();
+        isSourceDnInReplicaSet = eligibleReplicas.stream()
+            .anyMatch(r -> r.getDatanodeDetails().equals(sourceDN));
+        isTargetDnInReplicaSet = eligibleReplicas.stream()
+            .anyMatch(r -> r.getDatanodeDetails()
+                .equals(movePair.getValue()));
+        int sourceDnPos = 0;
+        for (int i = 0; i < eligibleReplicas.size(); i++) {
+          if (eligibleReplicas.get(i).getDatanodeDetails()
+              .equals(sourceDN)) {
+            sourceDnPos = i;
+            break;
+          }
+        }

Review comment:
       i get your point here, but sourceDN is of type DatanodeDetails and eligibleReplicas is a list of ContainerReplica, so i am not sure `eligibleReplicas.remove(sourceDN)` would work as expected
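       
       for illustration, a removal keyed on the datanode rather than the replica object would look like this (same types as in the diff):
       
       // List.remove(Object) relies on equals(), and a DatanodeDetails is
       // never equal to a ContainerReplica, so remove(sourceDN) is a no-op.
       // Matching on the datanode inside each replica does work:
       eligibleReplicas.removeIf(
           r -> r.getDatanodeDetails().equals(sourceDN));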

##########
File path: hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/ReplicationManager.java
##########
@@ -862,16 +1061,55 @@ private void handleOverReplicatedContainer(final ContainerInfo container,
           break;
         }
       }
+
+      eligibleReplicas.removeAll(unhealthyReplicas);
+      boolean isInMove = inflightMove.containsKey(id);
+      boolean isSourceDnInReplicaSet = false;
+      boolean isTargetDnInReplicaSet = false;
+
+      if (isInMove) {
+        Pair<DatanodeDetails, DatanodeDetails> movePair =
+            inflightMove.get(id);
+        final DatanodeDetails sourceDN = movePair.getKey();
+        isSourceDnInReplicaSet = eligibleReplicas.stream()
+            .anyMatch(r -> r.getDatanodeDetails().equals(sourceDN));
+        isTargetDnInReplicaSet = eligibleReplicas.stream()
+            .anyMatch(r -> r.getDatanodeDetails()
+                .equals(movePair.getValue()));
+        int sourceDnPos = 0;
+        for (int i = 0; i < eligibleReplicas.size(); i++) {
+          if (eligibleReplicas.get(i).getDatanodeDetails()
+              .equals(sourceDN)) {
+            sourceDnPos = i;
+            break;
+          }
+        }
+        if (isTargetDnInReplicaSet) {
+          // if the container is in inflightMove and target datanode is
+          // included in the replicas, then swap the source datanode to
+          // first of the replica list if exists, so the source datanode
+          // will be first removed if possible.
+          eligibleReplicas.add(0, eligibleReplicas.remove(sourceDnPos));
+        } else {
+          // a container replica that being moved should not be removed.
+          // if the container is in inflightMove and target datanode is not
+          // included in the replicas, then swap the source datanode to the
+          // last of the replica list, so the source datanode will not
+          // be removed.
+          eligibleReplicas.add(eligibleReplicas.remove(sourceDnPos));

Review comment:
       > TargetDN would receive all the existing replicas as source to copy from
   
   in the move operation, we maintain src and target in inflightMove and track the replication from source to target. if we take the other existing replicas as the source, although the final result seems the same (sourceDN is not present and targetDN is present), it is a little confusing semantically. i think it is better to make sure the source is not deleted until the target is present.
   
   > I think that at end if sourceDN is not present and targetDN is present move is successful.
   
   yes, it makes sense. if the target is present, and the source disappears somehow, we can consider the move successful.
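   
   a tiny sketch of that completion condition, with a hypothetical helper name (types as in the diff):
   
   // A move counts as complete once the target holds a replica and the
   // source no longer does, however the source replica went away.
   private boolean isMoveComplete(Set<ContainerReplica> replicas,
       DatanodeDetails srcDn, DatanodeDetails targetDn) {
     boolean targetPresent = replicas.stream()
         .anyMatch(r -> r.getDatanodeDetails().equals(targetDn));
     boolean srcPresent = replicas.stream()
         .anyMatch(r -> r.getDatanodeDetails().equals(srcDn));
     return targetPresent && !srcPresent;
   }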

##########
File path: hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/ReplicationManager.java
##########
@@ -862,16 +1061,55 @@ private void handleOverReplicatedContainer(final ContainerInfo container,
           break;
         }
       }
+
+      eligibleReplicas.removeAll(unhealthyReplicas);
+      boolean isInMove = inflightMove.containsKey(id);
+      boolean isSourceDnInReplicaSet = false;
+      boolean isTargetDnInReplicaSet = false;
+
+      if (isInMove) {
+        Pair<DatanodeDetails, DatanodeDetails> movePair =
+            inflightMove.get(id);
+        final DatanodeDetails sourceDN = movePair.getKey();
+        isSourceDnInReplicaSet = eligibleReplicas.stream()
+            .anyMatch(r -> r.getDatanodeDetails().equals(sourceDN));
+        isTargetDnInReplicaSet = eligibleReplicas.stream()
+            .anyMatch(r -> r.getDatanodeDetails()
+                .equals(movePair.getValue()));
+        int sourceDnPos = 0;
+        for (int i = 0; i < eligibleReplicas.size(); i++) {
+          if (eligibleReplicas.get(i).getDatanodeDetails()
+              .equals(sourceDN)) {
+            sourceDnPos = i;
+            break;
+          }
+        }
+        if (isTargetDnInReplicaSet) {
+          // if the container is in inflightMove and target datanode is
+          // included in the replicas, then swap the source datanode to
+          // first of the replica list if exists, so the source datanode
+          // will be first removed if possible.
+          eligibleReplicas.add(0, eligibleReplicas.remove(sourceDnPos));

Review comment:
       yes, will handle this 

##########
File path: hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/ReplicationManager.java
##########
@@ -471,6 +549,127 @@ private void updateInflightAction(final ContainerInfo container,
     }
   }
 
+  /**
+   * add a move action for a given container.
+   *
+   * @param cid Container to move
+   * @param srcDn datanode to move from
+   * @param targetDn datanode to move to
+   */
+  public Optional<CompletableFuture<MoveResult>> move(ContainerID cid,
+      DatanodeDetails srcDn, DatanodeDetails targetDn)
+      throws ContainerNotFoundException, NodeNotFoundException {
+    LOG.info("receive a move requset about container {} , from {} to {}",
+        cid, srcDn.getUuid(), targetDn.getUuid());
+    Optional<CompletableFuture<MoveResult>> ret = Optional.empty();
+    if (!isRunning()) {
+      LOG.info("Replication Manager in not running. please start it first");
+      return ret;
+    }
+
+    /*
+     * make sure the following conditions are met:
+     *  1 the given two datanodes are in healthy state
+     *  2 the given container exists on the given source datanode
+     *  3 the given container does not exist on the given target datanode
+     *  4 the given container is in closed state
+     *  5 the given container is not taking any inflight action
+     *  6 the given two datanodes are in IN_SERVICE state
+     *
+     * move is a combination of two steps : replication and deletion.
+     * if the conditions above are all met, then we take a conservative
+     * strategy here : replication can always be executed, but the execution
+     * of deletion always depends on placement policy
+     */
+
+    NodeStatus currentNodeStat = nodeManager.getNodeStatus(srcDn);
+    NodeState healthStat = currentNodeStat.getHealth();
+    NodeOperationalState operationalState =
+        currentNodeStat.getOperationalState();
+    if (healthStat != NodeState.HEALTHY) {
+      LOG.info("given source datanode is in {} state, " +
+          "not in HEALTHY state", healthStat);
+      return ret;
+    }
+    if (operationalState != NodeOperationalState.IN_SERVICE) {
+      LOG.info("given source datanode is in {} state, " +
+          "not in IN_SERVICE state", operationalState);
+      return ret;
+    }
+
+    currentNodeStat = nodeManager.getNodeStatus(targetDn);
+    healthStat = currentNodeStat.getHealth();
+    operationalState = currentNodeStat.getOperationalState();
+    if (healthStat != NodeState.HEALTHY) {
+      LOG.info("given target datanode is in {} state, " +
+          "not in HEALTHY state", healthStat);
+      return ret;
+    }
+    if (operationalState != NodeOperationalState.IN_SERVICE) {
+      LOG.info("given target datanode is in {} state, " +
+          "not in IN_SERVICE state", operationalState);
+      return ret;
+    }
+
+    // we need to synchronize on ContainerInfo, since it is
+    // shared by ICR/FCR handler and this.processContainer
+    // TODO: use a Read lock after introducing a RW lock into ContainerInfo
+    ContainerInfo cif = containerManager.getContainer(cid);
+    synchronized (cif) {
+      final Set<DatanodeDetails> replicas = containerManager
+            .getContainerReplicas(cid).stream()
+            .map(ContainerReplica::getDatanodeDetails)
+            .collect(Collectors.toSet());
+      if (replicas.contains(targetDn)) {
+        LOG.info("given container exists in the target Datanode");
+        return ret;
+      }
+      if (!replicas.contains(srcDn)) {
+        LOG.info("given container does not exist in the source Datanode");
+        return ret;
+      }
+
+      /*
+      * the reason why the given container should not be taking any inflight
+      * action is that: if the given container is being replicated or deleted,
+      * the number of its replicas is not deterministic, so a move operation
+      * issued by the balancer may cause a nondeterministic result, so we
+      * should drop this operation for now.
+      * */
+
+      if (inflightReplication.containsKey(cid)) {
+        LOG.info("given container is in inflight replication");
+        return ret;
+      }
+      if (inflightDeletion.containsKey(cid)) {
+        LOG.info("given container is in inflight deletion");
+        return ret;
+      }
+
+      /*
+      * here, no need to check whether cid is in inflightMove, because
+      * these three maps are all synchronized on ContainerInfo. if cid
+      * is in inflightMove, it must currently be being replicated or
+      * deleted, so it must be in inflightReplication or inflightDeletion.
+      * thus, if we cannot find cid in either of them, this cid must
+      * not be in inflightMove.
+      */
+

Review comment:
       to keep it simple, I take a conservative strategy here: replication can always be executed, but the execution of deletion always depends on the placement policy; i think this does no harm to data safety. the placement policy check is delayed to `handleOverReplicatedContainer`

##########
File path: hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/ReplicationManager.java
##########
@@ -471,6 +549,127 @@ private void updateInflightAction(final ContainerInfo container,
     }
   }
 
+  /**
+   * add a move action for a given container.
+   *
+   * @param cid Container to move
+   * @param srcDn datanode to move from
+   * @param targetDn datanode to move to
+   */
+  public Optional<CompletableFuture<MoveResult>> move(ContainerID cid,

Review comment:
       it seems good, i think completing exceptionally is a better choice. i will do this

##########
File path: hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/ReplicationManager.java
##########
@@ -471,6 +549,127 @@ private void updateInflightAction(final ContainerInfo container,
     }
   }
 
+  /**
+   * add a move action for a given container.
+   *
+   * @param cid Container to move
+   * @param srcDn datanode to move from
+   * @param targetDn datanode to move to
+   */
+  public Optional<CompletableFuture<MoveResult>> move(ContainerID cid,

Review comment:
       it seems good, i will add more MoveResult values for the errors




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] lokeshj1703 commented on a change in pull request #2349: HDDS-4928. Support container move in Replication Manager

Posted by GitBox <gi...@apache.org>.
lokeshj1703 commented on a change in pull request #2349:
URL: https://github.com/apache/ozone/pull/2349#discussion_r672818235



##########
File path: hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/ReplicationManager.java
##########
@@ -861,47 +1168,145 @@ private void handleOverReplicatedContainer(final ContainerInfo container,
           break;
         }
       }
-      // After removing all unhealthy replicas, if the container is still over
-      // replicated then we need to check if it is already mis-replicated.
-      // If it is, we do no harm by removing excess replicas. However, if it is
-      // not mis-replicated, then we can only remove replicas if they don't
-      // make the container become mis-replicated.
-      if (excess > 0) {
-        eligibleReplicas.removeAll(unhealthyReplicas);
-        Set<ContainerReplica> eligibleSet = new HashSet<>(eligibleReplicas);
-        ContainerPlacementStatus ps =
-            getPlacementStatus(eligibleSet, replicationFactor);
-        for (ContainerReplica r : eligibleReplicas) {
-          if (excess <= 0) {
-            break;
-          }
-          // First remove the replica we are working on from the set, and then
-          // check if the set is now mis-replicated.
-          eligibleSet.remove(r);
-          ContainerPlacementStatus nowPS =
-              getPlacementStatus(eligibleSet, replicationFactor);
-          if ((!ps.isPolicySatisfied()
-                && nowPS.actualPlacementCount() == ps.actualPlacementCount())
-              || (ps.isPolicySatisfied() && nowPS.isPolicySatisfied())) {
-            // Remove the replica if the container was already unsatisfied
-            // and losing this replica keep actual placement count unchanged.
-            // OR if losing this replica still keep satisfied
-            sendDeleteCommand(container, r.getDatanodeDetails(), true);
-            excess -= 1;
-            continue;
-          }
-          // If we decided not to remove this replica, put it back into the set
-          eligibleSet.add(r);
+      eligibleReplicas.removeAll(unhealthyReplicas);
+
+      excess -= handleMoveIfNeeded(container, eligibleReplicas);
+
+      removeExcessReplicasIfNeeded(excess, container, eligibleReplicas);
+    }
+  }
+
+  /**
+   * if the container is in inflightMove, handle move if needed.
+   *
+   * @param cif ContainerInfo
+   * @param eligibleReplicas A list of replicas, which may have excess replicas
+   * @return the number of replicas removed by sending delete commands
+   */
+  private int handleMoveIfNeeded(final ContainerInfo cif,
+                   final List<ContainerReplica> eligibleReplicas) {
+    int minus = 0;
+    final ContainerID cid = cif.containerID();
+    if (!inflightMove.containsKey(cid)) {
+      return minus;
+    }
+
+    Pair<DatanodeDetails, DatanodeDetails> movePair =
+        inflightMove.get(cid);
+
+    final DatanodeDetails srcDn = movePair.getKey();
+    final DatanodeDetails targetDn = movePair.getValue();
+    boolean isSourceDnInReplicaSet;
+    boolean isTargetDnInReplicaSet;
+
+    isSourceDnInReplicaSet = eligibleReplicas.stream()
+        .anyMatch(r -> r.getDatanodeDetails().equals(srcDn));
+    isTargetDnInReplicaSet = eligibleReplicas.stream()
+        .anyMatch(r -> r.getDatanodeDetails().equals(targetDn));
+
+    // if target datanode is not in replica set, nothing to do
+    if (isTargetDnInReplicaSet) {
+      Set<ContainerReplica> eligibleReplicaSet =
+          eligibleReplicas.stream().collect(Collectors.toSet());
+      int replicationFactor =
+          cif.getReplicationConfig().getRequiredNodes();
+      ContainerPlacementStatus currentCPS =
+          getPlacementStatus(eligibleReplicaSet, replicationFactor);
+      eligibleReplicaSet.removeIf(r -> r.getDatanodeDetails().equals(srcDn));
+      ContainerPlacementStatus afterMoveCPS =
+          getPlacementStatus(eligibleReplicaSet, replicationFactor);
+
+      if (isSourceDnInReplicaSet &&
+          isPlacementStatusActuallyEqual(currentCPS, afterMoveCPS)) {
+        sendDeleteCommand(cif, srcDn, true);
+        eligibleReplicas.removeIf(r -> r.getDatanodeDetails().equals(srcDn));
+        minus++;
+      } else {
+        if (!isSourceDnInReplicaSet) {
+          // if the target is present and the source disappears somehow,
+          // we can consider the move successful.
+          inflightMoveFuture.get(cid).complete(MoveResult.COMPLETED);
+        } else {
+          // if the source and target datanodes are both in the replica set,
+          // but we cannot delete the source datanode for now (e.g.,
+          // there are only 3 replicas or the policy is not satisfied, etc.),
+          // we just complete the future without sending a delete command.
+          LOG.info("cannot remove source replica after successfully " +
+              "replicating to target datanode");
+          inflightMoveFuture.get(cid).complete(MoveResult.DELETE_FAIL_POLICY);
         }
-        if (excess > 0) {
-          LOG.info("The container {} is over replicated with {} excess " +
-              "replica. The excess replicas cannot be removed without " +
-              "violating the placement policy", container, excess);
+        inflightMove.remove(cid);
+        inflightMoveFuture.remove(cid);
+      }
+    }
+
+    return minus;
+  }
+
+  /**
+   * remove excess replicas if needed; replicationFactor and placement policy
+   * will be taken into consideration.
+   *
+   * @param excess the excess number after subtracting replicationFactor
+   * @param container ContainerInfo
+   * @param eligibleReplicas A list of replicas, which may have excess replicas
+   */
+  private void removeExcessReplicasIfNeeded(int excess,
+                    final ContainerInfo container,
+                    final List<ContainerReplica> eligibleReplicas) {
+    // After removing all unhealthy replicas, if the container is still over
+    // replicated then we need to check if it is already mis-replicated.
+    // If it is, we do no harm by removing excess replicas. However, if it is
+    // not mis-replicated, then we can only remove replicas if they don't
+    // make the container become mis-replicated.
+    if (excess > 0) {
+      Set<ContainerReplica> eligibleSet = new HashSet<>(eligibleReplicas);
+      final int replicationFactor =
+          container.getReplicationConfig().getRequiredNodes();
+      ContainerPlacementStatus ps =
+          getPlacementStatus(eligibleSet, replicationFactor);
+
+      for (ContainerReplica r : eligibleReplicas) {
+        if (excess <= 0) {
+          break;
         }
+        // First remove the replica we are working on from the set, and then
+        // check if the set is now mis-replicated.
+        eligibleSet.remove(r);
+        ContainerPlacementStatus nowPS =
+            getPlacementStatus(eligibleSet, replicationFactor);
+        if (isPlacementStatusActuallyEqual(ps, nowPS)) {
+        // Remove the replica if the container was already unsatisfied
+        // and losing this replica keeps the actual placement count
+        // unchanged, OR if losing this replica keeps the policy satisfied.
+          sendDeleteCommand(container, r.getDatanodeDetails(), true);
+          excess -= 1;
+          continue;
+        }
+        // If we decided not to remove this replica, put it back into the set
+        eligibleSet.add(r);
+      }
+      if (excess > 0) {
+        LOG.info("The container {} is over replicated with {} excess " +
+            "replica. The excess replicas cannot be removed without " +
+            "violating the placement policy", container, excess);
       }
     }
   }
 
+  /**
+   * whether the given two ContainerPlacementStatus are actually equal.
+   *
+   * @param cps1 ContainerPlacementStatus
+   * @param cps2 ContainerPlacementStatus
+   */
+  private boolean isPlacementStatusActuallyEqual(
+                      ContainerPlacementStatus cps1,
+                      ContainerPlacementStatus cps2) {
+    return cps1.actualPlacementCount() == cps2.actualPlacementCount() ||
+        cps1.isPolicySatisfied() && cps2.isPolicySatisfied();

Review comment:
       Addressed in #2437.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] JacksonYao287 edited a comment on pull request #2349: HDDS-4928. Support container move in Replication Manager

Posted by GitBox <gi...@apache.org>.
JacksonYao287 edited a comment on pull request #2349:
URL: https://github.com/apache/ozone/pull/2349#issuecomment-877924114


   > If placement policy was initially not satisfied then deletion for source can be sent provided placement count remains same after deletion. I think it would be a good idea to abstract out logic here and reuse it for move and over-replication.
    
   yea, looks good, i will make this change and add some unit tests for move.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] JacksonYao287 commented on pull request #2349: HDDS-4928. Support container move in Replication Manager

Posted by GitBox <gi...@apache.org>.
JacksonYao287 commented on pull request #2349:
URL: https://github.com/apache/ozone/pull/2349#issuecomment-874095030


   i have updated the PR, @lokeshj1703 PTAL! if there are no more logic problems, i will add unit tests for move


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] lokeshj1703 commented on a change in pull request #2349: HDDS-4928. Support container move in Replication Manager

Posted by GitBox <gi...@apache.org>.
lokeshj1703 commented on a change in pull request #2349:
URL: https://github.com/apache/ozone/pull/2349#discussion_r670325863



##########
File path: hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/ReplicationManager.java
##########
@@ -861,47 +1168,145 @@ private void handleOverReplicatedContainer(final ContainerInfo container,
           break;
         }
       }
-      // After removing all unhealthy replicas, if the container is still over
-      // replicated then we need to check if it is already mis-replicated.
-      // If it is, we do no harm by removing excess replicas. However, if it is
-      // not mis-replicated, then we can only remove replicas if they don't
-      // make the container become mis-replicated.
-      if (excess > 0) {
-        eligibleReplicas.removeAll(unhealthyReplicas);
-        Set<ContainerReplica> eligibleSet = new HashSet<>(eligibleReplicas);
-        ContainerPlacementStatus ps =
-            getPlacementStatus(eligibleSet, replicationFactor);
-        for (ContainerReplica r : eligibleReplicas) {
-          if (excess <= 0) {
-            break;
-          }
-          // First remove the replica we are working on from the set, and then
-          // check if the set is now mis-replicated.
-          eligibleSet.remove(r);
-          ContainerPlacementStatus nowPS =
-              getPlacementStatus(eligibleSet, replicationFactor);
-          if ((!ps.isPolicySatisfied()
-                && nowPS.actualPlacementCount() == ps.actualPlacementCount())
-              || (ps.isPolicySatisfied() && nowPS.isPolicySatisfied())) {
-            // Remove the replica if the container was already unsatisfied
-            // and losing this replica keep actual placement count unchanged.
-            // OR if losing this replica still keep satisfied
-            sendDeleteCommand(container, r.getDatanodeDetails(), true);
-            excess -= 1;
-            continue;
-          }
-          // If we decided not to remove this replica, put it back into the set
-          eligibleSet.add(r);
+      eligibleReplicas.removeAll(unhealthyReplicas);
+
+      excess -= handleMoveIfNeeded(container, eligibleReplicas);
+
+      removeExcessReplicasIfNeeded(excess, container, eligibleReplicas);
+    }
+  }
+
+  /**
+   * if the container is in inflightMove, handle move if needed.
+   *
+   * @param cif ContainerInfo
+   * @param eligibleReplicas An list of replicas, which may have excess replicas
+   * @return minus how many replica is removed through sending delete command
+   */
+  private int handleMoveIfNeeded(final ContainerInfo cif,
+                   final List<ContainerReplica> eligibleReplicas) {
+    int minus = 0;
+    final ContainerID cid = cif.containerID();
+    if (!inflightMove.containsKey(cid)) {
+      return minus;
+    }
+
+    Pair<DatanodeDetails, DatanodeDetails> movePair =
+        inflightMove.get(cid);
+
+    final DatanodeDetails srcDn = movePair.getKey();
+    final DatanodeDetails targetDn = movePair.getValue();
+    boolean isSourceDnInReplicaSet;
+    boolean isTargetDnInReplicaSet;
+
+    isSourceDnInReplicaSet = eligibleReplicas.stream()
+        .anyMatch(r -> r.getDatanodeDetails().equals(srcDn));
+    isTargetDnInReplicaSet = eligibleReplicas.stream()
+        .anyMatch(r -> r.getDatanodeDetails().equals(targetDn));
+
+    // if target datanode is not in replica set, nothing to do
+    if (isTargetDnInReplicaSet) {
+      Set<ContainerReplica> eligibleReplicaSet =
+          eligibleReplicas.stream().collect(Collectors.toSet());
+      int replicationFactor =
+          cif.getReplicationConfig().getRequiredNodes();
+      ContainerPlacementStatus currentCPS =
+          getPlacementStatus(eligibleReplicaSet, replicationFactor);
+      eligibleReplicaSet.removeIf(r -> r.getDatanodeDetails().equals(srcDn));
+      ContainerPlacementStatus afterMoveCPS =
+          getPlacementStatus(eligibleReplicaSet, replicationFactor);
+
+      if (isSourceDnInReplicaSet &&
+          isPlacementStatusActuallyEqual(currentCPS, afterMoveCPS)) {
+        sendDeleteCommand(cif, srcDn, true);
+        eligibleReplicas.removeIf(r -> r.getDatanodeDetails().equals(srcDn));
+        minus++;
+      } else {
+        if (!isSourceDnInReplicaSet) {
+          // if the target is present and the source disappears somehow,
+          // we can consider the move successful.
+          inflightMoveFuture.get(cid).complete(MoveResult.COMPLETED);
+        } else {
+          // if the source and target datanodes are both in the replica set,
+          // but we cannot delete the source datanode for now (e.g.,
+          // there are only 3 replicas or the policy is not satisfied, etc.),
+          // we just complete the future without sending a delete command.
+          LOG.info("cannot remove source replica after successfully " +
+              "replicating to target datanode");
+          inflightMoveFuture.get(cid).complete(MoveResult.DELETE_FAIL_POLICY);
         }
-        if (excess > 0) {
-          LOG.info("The container {} is over replicated with {} excess " +
-              "replica. The excess replicas cannot be removed without " +
-              "violating the placement policy", container, excess);
+        inflightMove.remove(cid);
+        inflightMoveFuture.remove(cid);
+      }
+    }
+
+    return minus;
+  }
+
+  /**
+   * remove excess replicas if needed; replicationFactor and placement policy
+   * will be taken into consideration.
+   *
+   * @param excess the excess number after subtracting replicationFactor
+   * @param container ContainerInfo
+   * @param eligibleReplicas A list of replicas, which may have excess replicas
+   */
+  private void removeExcessReplicasIfNeeded(int excess,
+                    final ContainerInfo container,
+                    final List<ContainerReplica> eligibleReplicas) {
+    // After removing all unhealthy replicas, if the container is still over
+    // replicated then we need to check if it is already mis-replicated.
+    // If it is, we do no harm by removing excess replicas. However, if it is
+    // not mis-replicated, then we can only remove replicas if they don't
+    // make the container become mis-replicated.
+    if (excess > 0) {
+      Set<ContainerReplica> eligibleSet = new HashSet<>(eligibleReplicas);
+      final int replicationFactor =
+          container.getReplicationConfig().getRequiredNodes();
+      ContainerPlacementStatus ps =
+          getPlacementStatus(eligibleSet, replicationFactor);
+
+      for (ContainerReplica r : eligibleReplicas) {
+        if (excess <= 0) {
+          break;
         }
+        // First remove the replica we are working on from the set, and then
+        // check if the set is now mis-replicated.
+        eligibleSet.remove(r);
+        ContainerPlacementStatus nowPS =
+            getPlacementStatus(eligibleSet, replicationFactor);
+        if (isPlacementStatusActuallyEqual(ps, nowPS)) {
+        // Remove the replica if the container was already unsatisfied
+        // and losing this replica keeps the actual placement count
+        // unchanged, OR if losing this replica keeps the policy satisfied.
+          sendDeleteCommand(container, r.getDatanodeDetails(), true);
+          excess -= 1;
+          continue;
+        }
+        // If we decided not to remove this replica, put it back into the set
+        eligibleSet.add(r);
+      }
+      if (excess > 0) {
+        LOG.info("The container {} is over replicated with {} excess " +
+            "replica. The excess replicas cannot be removed without " +
+            "violating the placement policy", container, excess);
       }
     }
   }
 
+  /**
+   * whether the given two ContainerPlacementStatus are actually equal.
+   *
+   * @param cps1 ContainerPlacementStatus
+   * @param cps2 ContainerPlacementStatus
+   */
+  private boolean isPlacementStatusActuallyEqual(
+                      ContainerPlacementStatus cps1,
+                      ContainerPlacementStatus cps2) {
+    return cps1.actualPlacementCount() == cps2.actualPlacementCount() ||
+        cps1.isPolicySatisfied() && cps2.isPolicySatisfied();

Review comment:
       I think the check for `ps.isPolicySatisfied` is important. In case the placement policy is not satisfied initially, the count logic can be used. If it was satisfied initially, then we must still satisfy it after the move.
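       
       In other words (a sketch only, not the merged code; `ContainerPlacementStatus` as used in the diff, and the helper name is hypothetical):
       
       // Case 1: policy was already unsatisfied -> only require that the
       //         placement (rack) count does not drop further.
       // Case 2: policy was satisfied -> it must stay satisfied afterwards.
       static boolean removalKeepsPlacement(ContainerPlacementStatus before,
           ContainerPlacementStatus after) {
         if (!before.isPolicySatisfied()) {
           return after.actualPlacementCount() == before.actualPlacementCount();
         }
         return after.isPolicySatisfied();
       }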




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] lokeshj1703 commented on a change in pull request #2349: HDDS-4928. Support container move in Replication Manager

Posted by GitBox <gi...@apache.org>.
lokeshj1703 commented on a change in pull request #2349:
URL: https://github.com/apache/ozone/pull/2349#discussion_r671119224



##########
File path: hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/ReplicationManager.java
##########
@@ -1169,78 +1173,61 @@ private void handleOverReplicatedContainer(final ContainerInfo container,
         }
       }
       eligibleReplicas.removeAll(unhealthyReplicas);
-
-      excess -= handleMoveIfNeeded(container, eligibleReplicas);
-
       removeExcessReplicasIfNeeded(excess, container, eligibleReplicas);
     }
   }
 
   /**
-   * if the container is in inflightMove, handle move if needed.
+   * if the container is in inflightMove, handle move.
    *
    * @param cif ContainerInfo
-   * @param eligibleReplicas An list of replicas, which may have excess replicas
-   * @return minus how many replica is removed through sending delete command
+   * @param replicaSet A Set of replicas, which may have excess replicas
    */
-  private int handleMoveIfNeeded(final ContainerInfo cif,
-                   final List<ContainerReplica> eligibleReplicas) {
-    int minus = 0;
+  private void handleMove(final ContainerInfo cif,

Review comment:
       This function assumes replication has completed. It would be good to document that.
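       
       For example, the precondition could be spelled out in the javadoc (the wording below is only a suggestion, using the signature from the diff):
       
       /**
        * Deletes the source replica to finish an inflight move.
        * Precondition: the replication half of the move has already
        * completed, i.e. the target datanode holds a healthy replica
        * of the container.
        */
       private void handleMove(final ContainerInfo cif,
           final Set<ContainerReplica> replicaSet) {
         // ...
       }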

##########
File path: hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/ReplicationManager.java
##########
@@ -1169,78 +1173,61 @@ private void handleOverReplicatedContainer(final ContainerInfo container,
         }
       }
       eligibleReplicas.removeAll(unhealthyReplicas);
-
-      excess -= handleMoveIfNeeded(container, eligibleReplicas);
-
       removeExcessReplicasIfNeeded(excess, container, eligibleReplicas);
     }
   }
 
   /**
-   * if the container is in inflightMove, handle move if needed.
+   * if the container is in inflightMove, handle move.
    *
    * @param cif ContainerInfo
-   * @param eligibleReplicas An list of replicas, which may have excess replicas
-   * @return minus how many replica is removed through sending delete command
+   * @param replicaSet A Set of replicas, which may have excess replicas
    */
-  private int handleMoveIfNeeded(final ContainerInfo cif,
-                   final List<ContainerReplica> eligibleReplicas) {
-    int minus = 0;
+  private void handleMove(final ContainerInfo cif,

Review comment:
       It might be better to rename this function to something like deleteSrcDnForMove.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] lokeshj1703 commented on a change in pull request #2349: HDDS-4928. Support container move in Replication Manager

Posted by GitBox <gi...@apache.org>.
lokeshj1703 commented on a change in pull request #2349:
URL: https://github.com/apache/ozone/pull/2349#discussion_r663665876



##########
File path: hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/ReplicationManager.java
##########
@@ -471,6 +549,127 @@ private void updateInflightAction(final ContainerInfo container,
     }
   }
 
+  /**
+   * add a move action for a given container.
+   *
+   * @param cid Container to move
+   * @param srcDn datanode to move from
+   * @param targetDn datanode to move to
+   */
+  public Optional<CompletableFuture<MoveResult>> move(ContainerID cid,
+      DatanodeDetails srcDn, DatanodeDetails targetDn)
+      throws ContainerNotFoundException, NodeNotFoundException {
+    LOG.info("receive a move requset about container {} , from {} to {}",
+        cid, srcDn.getUuid(), targetDn.getUuid());
+    Optional<CompletableFuture<MoveResult>> ret = Optional.empty();
+    if (!isRunning()) {
+      LOG.info("Replication Manager in not running. please start it first");
+      return ret;
+    }
+
+    /*
+     * make sure the following conditions are met:
+     *  1 the given two datanodes are in healthy state
+     *  2 the given container exists on the given source datanode
+     *  3 the given container does not exist on the given target datanode
+     *  4 the given container is in closed state
+     *  5 the given container is not taking any inflight action
+     *  6 the given two datanodes are in IN_SERVICE state
+     *
+     * move is a combination of two steps : replication and deletion.
+     * if the conditions above are all met, then we take a conservative
+     * strategy here : replication can always be executed, but the execution
+     * of deletion always depends on placement policy
+     */
+
+    NodeStatus currentNodeStat = nodeManager.getNodeStatus(srcDn);
+    NodeState healthStat = currentNodeStat.getHealth();
+    NodeOperationalState operationalState =
+        currentNodeStat.getOperationalState();
+    if (healthStat != NodeState.HEALTHY) {
+      LOG.info("given source datanode is in {} state, " +
+          "not in HEALTHY state", healthStat);
+      return ret;
+    }
+    if (operationalState != NodeOperationalState.IN_SERVICE) {
+      LOG.info("given source datanode is in {} state, " +
+          "not in IN_SERVICE state", operationalState);
+      return ret;
+    }
+
+    currentNodeStat = nodeManager.getNodeStatus(targetDn);
+    healthStat = currentNodeStat.getHealth();
+    operationalState = currentNodeStat.getOperationalState();
+    if (healthStat != NodeState.HEALTHY) {
+      LOG.info("given target datanode is in {} state, " +
+          "not in HEALTHY state", healthStat);
+      return ret;
+    }
+    if (operationalState != NodeOperationalState.IN_SERVICE) {
+      LOG.info("given target datanode is in {} state, " +
+          "not in IN_SERVICE state", operationalState);
+      return ret;
+    }
+
+    // we need to synchronize on ContainerInfo, since it is
+    // shared by ICR/FCR handler and this.processContainer
+    // TODO: use a Read lock after introducing a RW lock into ContainerInfo
+    ContainerInfo cif = containerManager.getContainer(cid);
+    synchronized (cif) {
+      final Set<DatanodeDetails> replicas = containerManager
+            .getContainerReplicas(cid).stream()
+            .map(ContainerReplica::getDatanodeDetails)
+            .collect(Collectors.toSet());
+      if (replicas.contains(targetDn)) {
+        LOG.info("given container exists in the target Datanode");
+        return ret;
+      }
+      if (!replicas.contains(srcDn)) {
+        LOG.info("given container does not exist in the source Datanode");
+        return ret;
+      }
+
+      /*
+      * the reason why the given container should not be taking any inflight
+      * action is that: if the given container is being replicated or deleted,
+      * the num of its replica is not deterministic, so move operation issued
+      * by balancer may cause a nondeterministic result, so we should drop
+      * this option for this time.
+      * */
+
+      if (inflightReplication.containsKey(cid)) {
+        LOG.info("given container is in inflight replication");
+        return ret;
+      }
+      if (inflightDeletion.containsKey(cid)) {
+        LOG.info("given container is in inflight deletion");
+        return ret;
+      }
+
+      /*
+      * here, no need to check whether cid is in inflightMove, because
+      * these three maps are all synchronized on ContainerInfo. if cid
+      * is in inflightMove, it must currently be being replicated or
+      * deleted, so it must be in inflightReplication or inflightDeletion.
+      * thus, if we cannot find cid in either of them, this cid must
+      * not be in inflightMove.
+      */
+

Review comment:
       I see. But I think there can be a case where d1, d2, d3 have the replicas in racks R1, R2 and R3 respectively. If we want to replace d1 with d4, it should be ensured that d4 is from R1 itself. If d4 is from R2, for example, then after replication either d2 or d4 is deleted, which does not really help with the move.
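       
       A sketch of the kind of guard the caller (e.g. the balancer) could apply when choosing d4, assuming `DatanodeDetails#getNetworkLocation` exposes the rack as in Ozone's network topology (the helper name is hypothetical):
       
       // Only accept a move target on the same rack as the source, so the
       // post-move placement matches the pre-move placement.
       private boolean targetKeepsRack(DatanodeDetails srcDn,
           DatanodeDetails targetDn) {
         return srcDn.getNetworkLocation()
             .equals(targetDn.getNetworkLocation());
       }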




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] JacksonYao287 edited a comment on pull request #2349: HDDS-4928. Support container move in Replication Manager

Posted by GitBox <gi...@apache.org>.
JacksonYao287 edited a comment on pull request #2349:
URL: https://github.com/apache/ozone/pull/2349#issuecomment-881107005


   @ChenSammi  PTAL!




[GitHub] [ozone] JacksonYao287 commented on pull request #2349: HDDS-4928. Support container move in Replication Manager

Posted by GitBox <gi...@apache.org>.
JacksonYao287 commented on pull request #2349:
URL: https://github.com/apache/ozone/pull/2349#issuecomment-870174091


   Thanks @lokeshj1703 very much for the review, I will update this PR according to your comments.
   
   >I was also thinking if we could simplify the logic in RM#handleOverReplicatedContainer. How about we always try to make the container over-replicated so that move can succeed. I think in handleOverReplicatedContainer if src and target are both present and container is over-replicated, delete can be sent for source.
   
   This idea seems good, but I have a small question: I think that even if src and target are both present and the container is over-replicated, a delete cannot always be sent for the source. If the placement policy is not met after src is deleted, I think we cannot directly send a delete command to src.
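
   A minimal sketch of the guarded deletion I mean, assuming the PR's getPlacementStatus and sendDeleteCommand helpers (illustrative only, not the final code):

       // sketch: only delete src when the remaining replicas still
       // satisfy the placement policy; otherwise keep the extra replica
       Set<ContainerReplica> remaining = new HashSet<>(replicas);
       remaining.removeIf(r -> r.getDatanodeDetails().equals(srcDn));
       ContainerPlacementStatus afterDelete =
           getPlacementStatus(remaining, replicationFactor);
       if (afterDelete.isPolicySatisfied()) {
         sendDeleteCommand(container, srcDn, true);
       } else {
         LOG.info("keeping the extra replica: deleting the source would " +
             "violate the placement policy");
       }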




[GitHub] [ozone] siddhantsangwan commented on a change in pull request #2349: HDDS-4928. Support container move in Replication Manager

Posted by GitBox <gi...@apache.org>.
siddhantsangwan commented on a change in pull request #2349:
URL: https://github.com/apache/ozone/pull/2349#discussion_r664238672



##########
File path: hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/ReplicationManager.java
##########
@@ -862,43 +1153,128 @@ private void handleOverReplicatedContainer(final ContainerInfo container,
           break;
         }
       }
-      // After removing all unhealthy replicas, if the container is still over
-      // replicated then we need to check if it is already mis-replicated.
-      // If it is, we do no harm by removing excess replicas. However, if it is
-      // not mis-replicated, then we can only remove replicas if they don't
-      // make the container become mis-replicated.
-      if (excess > 0) {
-        eligibleReplicas.removeAll(unhealthyReplicas);
-        Set<ContainerReplica> eligibleSet = new HashSet<>(eligibleReplicas);
-        ContainerPlacementStatus ps =
-            getPlacementStatus(eligibleSet, replicationFactor);
-        for (ContainerReplica r : eligibleReplicas) {
-          if (excess <= 0) {
+
+      eligibleReplicas.removeAll(unhealthyReplicas);
+      boolean isInMove = inflightMove.containsKey(id);
+      boolean isSourceDnInReplicaSet = false;
+      boolean isTargetDnInReplicaSet = false;
+
+      if (isInMove) {
+        Pair<DatanodeDetails, DatanodeDetails> movePair =
+            inflightMove.get(id);
+        final DatanodeDetails sourceDN = movePair.getKey();
+        isSourceDnInReplicaSet = eligibleReplicas.stream()
+            .anyMatch(r -> r.getDatanodeDetails().equals(sourceDN));
+        isTargetDnInReplicaSet = eligibleReplicas.stream()
+            .anyMatch(r -> r.getDatanodeDetails()
+                .equals(movePair.getValue()));
+        int sourceDnPos = 0;
+        for (int i = 0; i < eligibleReplicas.size(); i++) {
+          if (eligibleReplicas.get(i).getDatanodeDetails()
+              .equals(sourceDN)) {
+            sourceDnPos = i;
             break;
           }
-          // First remove the replica we are working on from the set, and then
-          // check if the set is now mis-replicated.
-          eligibleSet.remove(r);
-          ContainerPlacementStatus nowPS =
-              getPlacementStatus(eligibleSet, replicationFactor);
-          if ((!ps.isPolicySatisfied()
-                && nowPS.actualPlacementCount() == ps.actualPlacementCount())
-              || (ps.isPolicySatisfied() && nowPS.isPolicySatisfied())) {
-            // Remove the replica if the container was already unsatisfied
-            // and losing this replica keep actual placement count unchanged.
-            // OR if losing this replica still keep satisfied
-            sendDeleteCommand(container, r.getDatanodeDetails(), true);
-            excess -= 1;
-            continue;
+        }
+        if (isTargetDnInReplicaSet) {
+          if (isSourceDnInReplicaSet) {
+            // if the target is present, and the source disappears somehow,
+            // we can consider the move successful.
+            inflightMoveFuture.get(id).complete(MoveResult.COMPLETED);
+            inflightMove.remove(id);
+            inflightMoveFuture.remove(id);
+          } else {
+            // if the container is in inflightMove and the target datanode is
+            // included in the replicas, then swap the source datanode to the
+            // front of the replica list if it exists, so the source datanode
+            // will be removed first if possible.
+            eligibleReplicas.add(0, eligibleReplicas.remove(sourceDnPos));
+          }

Review comment:
       Can you please explain this part? If both target and source are present in the replicas, then the source should be swapped with the first node in the replicas list so it can be deleted, right? And the move completes if the source is absent while the target is present. It seems the logic of the if and else branches is swapped.






[GitHub] [ozone] lokeshj1703 commented on a change in pull request #2349: HDDS-4928. Support container move in Replication Manager

Posted by GitBox <gi...@apache.org>.
lokeshj1703 commented on a change in pull request #2349:
URL: https://github.com/apache/ozone/pull/2349#discussion_r672818235



##########
File path: hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/ReplicationManager.java
##########
@@ -861,47 +1168,145 @@ private void handleOverReplicatedContainer(final ContainerInfo container,
           break;
         }
       }
-      // After removing all unhealthy replicas, if the container is still over
-      // replicated then we need to check if it is already mis-replicated.
-      // If it is, we do no harm by removing excess replicas. However, if it is
-      // not mis-replicated, then we can only remove replicas if they don't
-      // make the container become mis-replicated.
-      if (excess > 0) {
-        eligibleReplicas.removeAll(unhealthyReplicas);
-        Set<ContainerReplica> eligibleSet = new HashSet<>(eligibleReplicas);
-        ContainerPlacementStatus ps =
-            getPlacementStatus(eligibleSet, replicationFactor);
-        for (ContainerReplica r : eligibleReplicas) {
-          if (excess <= 0) {
-            break;
-          }
-          // First remove the replica we are working on from the set, and then
-          // check if the set is now mis-replicated.
-          eligibleSet.remove(r);
-          ContainerPlacementStatus nowPS =
-              getPlacementStatus(eligibleSet, replicationFactor);
-          if ((!ps.isPolicySatisfied()
-                && nowPS.actualPlacementCount() == ps.actualPlacementCount())
-              || (ps.isPolicySatisfied() && nowPS.isPolicySatisfied())) {
-            // Remove the replica if the container was already unsatisfied
-            // and losing this replica keep actual placement count unchanged.
-            // OR if losing this replica still keep satisfied
-            sendDeleteCommand(container, r.getDatanodeDetails(), true);
-            excess -= 1;
-            continue;
-          }
-          // If we decided not to remove this replica, put it back into the set
-          eligibleSet.add(r);
+      eligibleReplicas.removeAll(unhealthyReplicas);
+
+      excess -= handleMoveIfNeeded(container, eligibleReplicas);
+
+      removeExcessReplicasIfNeeded(excess, container, eligibleReplicas);
+    }
+  }
+
+  /**
+   * if the container is in inflightMove, handle move if needed.
+   *
+   * @param cif ContainerInfo
+   * @param eligibleReplicas A list of replicas, which may contain excess replicas
+   * @return the number of replicas removed by sending delete commands
+   */
+  private int handleMoveIfNeeded(final ContainerInfo cif,
+                   final List<ContainerReplica> eligibleReplicas) {
+    int minus = 0;
+    final ContainerID cid = cif.containerID();
+    if (!inflightMove.containsKey(cid)) {
+      return minus;
+    }
+
+    Pair<DatanodeDetails, DatanodeDetails> movePair =
+        inflightMove.get(cid);
+
+    final DatanodeDetails srcDn = movePair.getKey();
+    final DatanodeDetails targetDn = movePair.getValue();
+    boolean isSourceDnInReplicaSet;
+    boolean isTargetDnInReplicaSet;
+
+    isSourceDnInReplicaSet = eligibleReplicas.stream()
+        .anyMatch(r -> r.getDatanodeDetails().equals(srcDn));
+    isTargetDnInReplicaSet = eligibleReplicas.stream()
+        .anyMatch(r -> r.getDatanodeDetails().equals(targetDn));
+
+    // if the target datanode is not in the replica set, nothing to do
+    if (isTargetDnInReplicaSet) {
+      Set<ContainerReplica> eligibleReplicaSet =
+          eligibleReplicas.stream().collect(Collectors.toSet());
+      int replicationFactor =
+          cif.getReplicationConfig().getRequiredNodes();
+      ContainerPlacementStatus currentCPS =
+          getPlacementStatus(eligibleReplicaSet, replicationFactor);
+      eligibleReplicaSet.removeIf(r -> r.getDatanodeDetails().equals(srcDn));
+      ContainerPlacementStatus afterMoveCPS =
+          getPlacementStatus(eligibleReplicaSet, replicationFactor);
+
+      if (isSourceDnInReplicaSet &&
+          isPlacementStatusActuallyEqual(currentCPS, afterMoveCPS)) {
+        sendDeleteCommand(cif, srcDn, true);
+        eligibleReplicas.removeIf(r -> r.getDatanodeDetails().equals(srcDn));
+        minus++;
+      } else {
+        if (!isSourceDnInReplicaSet) {
+          // if the target is present, and the source disappears somehow,
+          // we can consider the move successful.
+          inflightMoveFuture.get(cid).complete(MoveResult.COMPLETED);
+        } else {
+          // if the source and target datanodes are both in the replica
+          // set, but we cannot delete the source datanode for now (e.g.,
+          // there are only 3 replicas, or the placement policy would not
+          // be satisfied, etc.), we just complete the future without
+          // sending a delete command.
+          LOG.info("cannot remove source replica after successfully " +
+              "replicating to target datanode");
+          inflightMoveFuture.get(cid).complete(MoveResult.DELETE_FAIL_POLICY);
         }
-        if (excess > 0) {
-          LOG.info("The container {} is over replicated with {} excess " +
-              "replica. The excess replicas cannot be removed without " +
-              "violating the placement policy", container, excess);
+        inflightMove.remove(cid);
+        inflightMoveFuture.remove(cid);
+      }
+    }
+
+    return minus;
+  }
+
+  /**
+   * remove excess replicas if needed; replicationFactor and placement policy
+   * will be taken into consideration.
+   *
+   * @param excess the excess number after subtracting replicationFactor
+   * @param container ContainerInfo
+   * @param eligibleReplicas A list of replicas, which may contain excess replicas
+   */
+  private void removeExcessReplicasIfNeeded(int excess,
+                    final ContainerInfo container,
+                    final List<ContainerReplica> eligibleReplicas) {
+    // After removing all unhealthy replicas, if the container is still over
+    // replicated then we need to check if it is already mis-replicated.
+    // If it is, we do no harm by removing excess replicas. However, if it is
+    // not mis-replicated, then we can only remove replicas if they don't
+    // make the container become mis-replicated.
+    if (excess > 0) {
+      Set<ContainerReplica> eligibleSet = new HashSet<>(eligibleReplicas);
+      final int replicationFactor =
+          container.getReplicationConfig().getRequiredNodes();
+      ContainerPlacementStatus ps =
+          getPlacementStatus(eligibleSet, replicationFactor);
+
+      for (ContainerReplica r : eligibleReplicas) {
+        if (excess <= 0) {
+          break;
         }
+        // First remove the replica we are working on from the set, and then
+        // check if the set is now mis-replicated.
+        eligibleSet.remove(r);
+        ContainerPlacementStatus nowPS =
+            getPlacementStatus(eligibleSet, replicationFactor);
+        if (isPlacementStatusActuallyEqual(ps, nowPS)) {
+          // Remove the replica if the container was already unsatisfied
+          // and losing this replica keeps the actual placement count
+          // unchanged, OR if losing this replica keeps it satisfied.
+          sendDeleteCommand(container, r.getDatanodeDetails(), true);
+          excess -= 1;
+          continue;
+        }
+        // If we decided not to remove this replica, put it back into the set
+        eligibleSet.add(r);
+      }
+      if (excess > 0) {
+        LOG.info("The container {} is over replicated with {} excess " +
+            "replica. The excess replicas cannot be removed without " +
+            "violating the placement policy", container, excess);
       }
     }
   }
 
+  /**
+   * whether the given two ContainerPlacementStatus are actually equal.
+   *
+   * @param cps1 ContainerPlacementStatus
+   * @param cps2 ContainerPlacementStatus
+   */
+  private boolean isPlacementStatusActuallyEqual(
+                      ContainerPlacementStatus cps1,
+                      ContainerPlacementStatus cps2) {
+    return cps1.actualPlacementCount() == cps2.actualPlacementCount() ||
+        cps1.isPolicySatisfied() && cps2.isPolicySatisfied();

Review comment:
       Addressed in #2437.






[GitHub] [ozone] JacksonYao287 commented on a change in pull request #2349: HDDS-4928. Support container move in Replication Manager

Posted by GitBox <gi...@apache.org>.
JacksonYao287 commented on a change in pull request #2349:
URL: https://github.com/apache/ozone/pull/2349#discussion_r663914240



##########
File path: hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/ReplicationManager.java
##########
@@ -471,6 +549,127 @@ private void updateInflightAction(final ContainerInfo container,
     }
   }
 
+  /**
+   * add a move action for a given container.
+   *
+   * @param cid Container to move
+   * @param srcDn datanode to move from
+   * @param targetDn datanode to move to
+   */
+  public Optional<CompletableFuture<MoveResult>> move(ContainerID cid,
+      DatanodeDetails srcDn, DatanodeDetails targetDn)
+      throws ContainerNotFoundException, NodeNotFoundException {
+    LOG.info("receive a move requset about container {} , from {} to {}",
+        cid, srcDn.getUuid(), targetDn.getUuid());
+    Optional<CompletableFuture<MoveResult>> ret = Optional.empty();
+    if (!isRunning()) {
+      LOG.info("Replication Manager in not running. please start it first");
+      return ret;
+    }
+
+    /*
+     * make sure the following conditions are met:
+     *  1 the given two datanodes are in healthy state
+     *  2 the given container exists on the given source datanode
+     *  3 the given container does not exist on the given target datanode
+     *  4 the given container is in closed state
+     *  5 the given container is not taking any inflight action
+     *  6 the given two datanodes are in IN_SERVICE state
+     *
+     * move is a combination of two steps : replication and deletion.
+     * if the conditions above are all met, then we take a conservative
+     * strategy here : replication can always be executed, but the execution
+     * of deletion always depends on placement policy
+     */
+
+    NodeStatus currentNodeStat = nodeManager.getNodeStatus(srcDn);
+    NodeState healthStat = currentNodeStat.getHealth();
+    NodeOperationalState operationalState =
+        currentNodeStat.getOperationalState();
+    if (healthStat != NodeState.HEALTHY) {
+      LOG.info("given source datanode is in {} state, " +
+          "not in HEALTHY state", healthStat);
+      return ret;
+    }
+    if (operationalState != NodeOperationalState.IN_SERVICE) {
+      LOG.info("given source datanode is in {} state, " +
+          "not in IN_SERVICE state", operationalState);
+      return ret;
+    }
+
+    currentNodeStat = nodeManager.getNodeStatus(targetDn);
+    healthStat = currentNodeStat.getHealth();
+    operationalState = currentNodeStat.getOperationalState();
+    if (healthStat != NodeState.HEALTHY) {
+      LOG.info("given target datanode is in {} state, " +
+          "not in HEALTHY state", healthStat);
+      return ret;
+    }
+    if (operationalState != NodeOperationalState.IN_SERVICE) {
+      LOG.info("given target datanode is in {} state, " +
+          "not in IN_SERVICE state", operationalState);
+      return ret;
+    }
+
+    // we need to synchronize on ContainerInfo, since it is
+    // shared by ICR/FCR handler and this.processContainer
+    // TODO: use a Read lock after introducing a RW lock into ContainerInfo
+    ContainerInfo cif = containerManager.getContainer(cid);
+    synchronized (cif) {
+      final Set<DatanodeDetails> replicas = containerManager
+            .getContainerReplicas(cid).stream()
+            .map(ContainerReplica::getDatanodeDetails)
+            .collect(Collectors.toSet());
+      if (replicas.contains(targetDn)) {
+        LOG.info("given container exists in the target Datanode");
+        return ret;
+      }
+      if (!replicas.contains(srcDn)) {
+        LOG.info("given container does not exist in the source Datanode");
+        return ret;
+      }
+
+      /*
+      * the reason why the given container should not be taking any inflight
+      * action is that: if the given container is being replicated or deleted,
+      * the number of its replicas is not deterministic, so a move operation
+      * issued by the balancer may cause a nondeterministic result; we should
+      * therefore drop the request this time.
+      */
+
+      if (inflightReplication.containsKey(cid)) {
+        LOG.info("given container is in inflight replication");
+        return ret;
+      }
+      if (inflightDeletion.containsKey(cid)) {
+        LOG.info("given container is in inflight deletion");
+        return ret;
+      }
+
+      /*
+      * here, there is no need to check whether cid is in inflightMove:
+      * these three maps are all synchronized on ContainerInfo, so if cid
+      * is in inflightMove, it must currently be being replicated or
+      * deleted, and thus it must be in inflightReplication or in
+      * inflightDeletion. therefore, if we cannot find cid in either of
+      * them, this cid must not be in inflightMove.
+      */
+

Review comment:
       Yes, that makes sense; I will change this.

##########
File path: hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/ReplicationManager.java
##########
@@ -862,43 +1153,128 @@ private void handleOverReplicatedContainer(final ContainerInfo container,
           break;
         }
       }
-      // After removing all unhealthy replicas, if the container is still over
-      // replicated then we need to check if it is already mis-replicated.
-      // If it is, we do no harm by removing excess replicas. However, if it is
-      // not mis-replicated, then we can only remove replicas if they don't
-      // make the container become mis-replicated.
-      if (excess > 0) {
-        eligibleReplicas.removeAll(unhealthyReplicas);
-        Set<ContainerReplica> eligibleSet = new HashSet<>(eligibleReplicas);
-        ContainerPlacementStatus ps =
-            getPlacementStatus(eligibleSet, replicationFactor);
-        for (ContainerReplica r : eligibleReplicas) {
-          if (excess <= 0) {
+
+      eligibleReplicas.removeAll(unhealthyReplicas);
+      boolean isInMove = inflightMove.containsKey(id);
+      boolean isSourceDnInReplicaSet = false;
+      boolean isTargetDnInReplicaSet = false;
+
+      if (isInMove) {
+        Pair<DatanodeDetails, DatanodeDetails> movePair =
+            inflightMove.get(id);
+        final DatanodeDetails sourceDN = movePair.getKey();
+        isSourceDnInReplicaSet = eligibleReplicas.stream()
+            .anyMatch(r -> r.getDatanodeDetails().equals(sourceDN));
+        isTargetDnInReplicaSet = eligibleReplicas.stream()
+            .anyMatch(r -> r.getDatanodeDetails()
+                .equals(movePair.getValue()));
+        int sourceDnPos = 0;
+        for (int i = 0; i < eligibleReplicas.size(); i++) {
+          if (eligibleReplicas.get(i).getDatanodeDetails()
+              .equals(sourceDN)) {
+            sourceDnPos = i;
             break;
           }
-          // First remove the replica we are working on from the set, and then
-          // check if the set is now mis-replicated.
-          eligibleSet.remove(r);
-          ContainerPlacementStatus nowPS =
-              getPlacementStatus(eligibleSet, replicationFactor);
-          if ((!ps.isPolicySatisfied()
-                && nowPS.actualPlacementCount() == ps.actualPlacementCount())
-              || (ps.isPolicySatisfied() && nowPS.isPolicySatisfied())) {
-            // Remove the replica if the container was already unsatisfied
-            // and losing this replica keep actual placement count unchanged.
-            // OR if losing this replica still keep satisfied
-            sendDeleteCommand(container, r.getDatanodeDetails(), true);
-            excess -= 1;
-            continue;
+        }
+        if (isTargetDnInReplicaSet) {
+          if (isSourceDnInReplicaSet) {
+            // if the target is present, and the source disappears somehow,
+            // we can consider the move successful.
+            inflightMoveFuture.get(id).complete(MoveResult.COMPLETED);
+            inflightMove.remove(id);
+            inflightMoveFuture.remove(id);
+          } else {
+            // if the container is in inflightMove and the target datanode is
+            // included in the replicas, then swap the source datanode to the
+            // front of the replica list if it exists, so the source datanode
+            // will be removed first if possible.
+            eligibleReplicas.add(0, eligibleReplicas.remove(sourceDnPos));
+          }

Review comment:
       Yes, thanks @siddhantsangwan for pointing this out! I explained the logic in the comment, which is just what you said, but I made a mistake in the code. I will fix it, thanks!
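
       For reference, a sketch of the corrected branch, based on the intent described above (not the exact final code):

           if (isTargetDnInReplicaSet) {
             if (!isSourceDnInReplicaSet) {
               // target present and source already gone: the move is done
               inflightMoveFuture.get(id).complete(MoveResult.COMPLETED);
               inflightMove.remove(id);
               inflightMoveFuture.remove(id);
             } else {
               // both present: move the source replica to the front of the
               // list so it becomes the first candidate for deletion
               eligibleReplicas.add(0, eligibleReplicas.remove(sourceDnPos));
             }
           }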






[GitHub] [ozone] JacksonYao287 commented on a change in pull request #2349: HDDS-4928. Support container move in Replication Manager

Posted by GitBox <gi...@apache.org>.
JacksonYao287 commented on a change in pull request #2349:
URL: https://github.com/apache/ozone/pull/2349#discussion_r670899224



##########
File path: hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/ReplicationManager.java
##########
@@ -861,47 +1168,145 @@ private void handleOverReplicatedContainer(final ContainerInfo container,
           break;
         }
       }
-      // After removing all unhealthy replicas, if the container is still over
-      // replicated then we need to check if it is already mis-replicated.
-      // If it is, we do no harm by removing excess replicas. However, if it is
-      // not mis-replicated, then we can only remove replicas if they don't
-      // make the container become mis-replicated.
-      if (excess > 0) {
-        eligibleReplicas.removeAll(unhealthyReplicas);
-        Set<ContainerReplica> eligibleSet = new HashSet<>(eligibleReplicas);
-        ContainerPlacementStatus ps =
-            getPlacementStatus(eligibleSet, replicationFactor);
-        for (ContainerReplica r : eligibleReplicas) {
-          if (excess <= 0) {
-            break;
-          }
-          // First remove the replica we are working on from the set, and then
-          // check if the set is now mis-replicated.
-          eligibleSet.remove(r);
-          ContainerPlacementStatus nowPS =
-              getPlacementStatus(eligibleSet, replicationFactor);
-          if ((!ps.isPolicySatisfied()
-                && nowPS.actualPlacementCount() == ps.actualPlacementCount())
-              || (ps.isPolicySatisfied() && nowPS.isPolicySatisfied())) {
-            // Remove the replica if the container was already unsatisfied
-            // and losing this replica keep actual placement count unchanged.
-            // OR if losing this replica still keep satisfied
-            sendDeleteCommand(container, r.getDatanodeDetails(), true);
-            excess -= 1;
-            continue;
-          }
-          // If we decided not to remove this replica, put it back into the set
-          eligibleSet.add(r);
+      eligibleReplicas.removeAll(unhealthyReplicas);
+
+      excess -= handleMoveIfNeeded(container, eligibleReplicas);
+
+      removeExcessReplicasIfNeeded(excess, container, eligibleReplicas);
+    }
+  }
+
+  /**
+   * if the container is in inflightMove, handle move if needed.
+   *
+   * @param cif ContainerInfo
+   * @param eligibleReplicas A list of replicas, which may contain excess replicas
+   * @return the number of replicas removed by sending delete commands
+   */
+  private int handleMoveIfNeeded(final ContainerInfo cif,
+                   final List<ContainerReplica> eligibleReplicas) {
+    int minus = 0;
+    final ContainerID cid = cif.containerID();
+    if (!inflightMove.containsKey(cid)) {
+      return minus;
+    }
+
+    Pair<DatanodeDetails, DatanodeDetails> movePair =
+        inflightMove.get(cid);
+
+    final DatanodeDetails srcDn = movePair.getKey();
+    final DatanodeDetails targetDn = movePair.getValue();
+    boolean isSourceDnInReplicaSet;
+    boolean isTargetDnInReplicaSet;
+
+    isSourceDnInReplicaSet = eligibleReplicas.stream()
+        .anyMatch(r -> r.getDatanodeDetails().equals(srcDn));
+    isTargetDnInReplicaSet = eligibleReplicas.stream()
+        .anyMatch(r -> r.getDatanodeDetails().equals(targetDn));
+
+    // if the target datanode is not in the replica set, nothing to do
+    if (isTargetDnInReplicaSet) {
+      Set<ContainerReplica> eligibleReplicaSet =
+          eligibleReplicas.stream().collect(Collectors.toSet());
+      int replicationFactor =
+          cif.getReplicationConfig().getRequiredNodes();
+      ContainerPlacementStatus currentCPS =
+          getPlacementStatus(eligibleReplicaSet, replicationFactor);
+      eligibleReplicaSet.removeIf(r -> r.getDatanodeDetails().equals(srcDn));
+      ContainerPlacementStatus afterMoveCPS =
+          getPlacementStatus(eligibleReplicaSet, replicationFactor);
+
+      if (isSourceDnInReplicaSet &&
+          isPlacementStatusActuallyEqual(currentCPS, afterMoveCPS)) {
+        sendDeleteCommand(cif, srcDn, true);
+        eligibleReplicas.removeIf(r -> r.getDatanodeDetails().equals(srcDn));
+        minus++;
+      } else {
+        if (!isSourceDnInReplicaSet) {
+          // if the target is present, and the source disappears somehow,
+          // we can consider the move successful.
+          inflightMoveFuture.get(cid).complete(MoveResult.COMPLETED);
+        } else {
+          // if the source and target datanodes are both in the replica
+          // set, but we cannot delete the source datanode for now (e.g.,
+          // there are only 3 replicas, or the placement policy would not
+          // be satisfied, etc.), we just complete the future without
+          // sending a delete command.
+          LOG.info("cannot remove source replica after successfully " +
+              "replicating to target datanode");
+          inflightMoveFuture.get(cid).complete(MoveResult.DELETE_FAIL_POLICY);
         }
-        if (excess > 0) {
-          LOG.info("The container {} is over replicated with {} excess " +
-              "replica. The excess replicas cannot be removed without " +
-              "violating the placement policy", container, excess);
+        inflightMove.remove(cid);
+        inflightMoveFuture.remove(cid);
+      }
+    }
+
+    return minus;
+  }
+
+  /**
+   * remove excess replicas if needed; replicationFactor and placement policy
+   * will be taken into consideration.
+   *
+   * @param excess the excess number after subtracting replicationFactor
+   * @param container ContainerInfo
+   * @param eligibleReplicas A list of replicas, which may contain excess replicas
+   */
+  private void removeExcessReplicasIfNeeded(int excess,
+                    final ContainerInfo container,
+                    final List<ContainerReplica> eligibleReplicas) {
+    // After removing all unhealthy replicas, if the container is still over
+    // replicated then we need to check if it is already mis-replicated.
+    // If it is, we do no harm by removing excess replicas. However, if it is
+    // not mis-replicated, then we can only remove replicas if they don't
+    // make the container become mis-replicated.
+    if (excess > 0) {
+      Set<ContainerReplica> eligibleSet = new HashSet<>(eligibleReplicas);
+      final int replicationFactor =
+          container.getReplicationConfig().getRequiredNodes();
+      ContainerPlacementStatus ps =
+          getPlacementStatus(eligibleSet, replicationFactor);
+
+      for (ContainerReplica r : eligibleReplicas) {
+        if (excess <= 0) {
+          break;
         }
+        // First remove the replica we are working on from the set, and then
+        // check if the set is now mis-replicated.
+        eligibleSet.remove(r);
+        ContainerPlacementStatus nowPS =
+            getPlacementStatus(eligibleSet, replicationFactor);
+        if (isPlacementStatusActuallyEqual(ps, nowPS)) {
+          // Remove the replica if the container was already unsatisfied
+          // and losing this replica keeps the actual placement count
+          // unchanged, OR if losing this replica keeps it satisfied.
+          sendDeleteCommand(container, r.getDatanodeDetails(), true);
+          excess -= 1;
+          continue;
+        }
+        // If we decided not to remove this replica, put it back into the set
+        eligibleSet.add(r);
+      }
+      if (excess > 0) {
+        LOG.info("The container {} is over replicated with {} excess " +
+            "replica. The excess replicas cannot be removed without " +
+            "violating the placement policy", container, excess);
       }
     }
   }
 
+  /**
+   * whether the given two ContainerPlacementStatus are actually equal.
+   *
+   * @param cps1 ContainerPlacementStatus
+   * @param cps2 ContainerPlacementStatus
+   */
+  private boolean isPlacementStatusActuallyEqual(
+                      ContainerPlacementStatus cps1,
+                      ContainerPlacementStatus cps2) {
+    return cps1.actualPlacementCount() == cps2.actualPlacementCount() ||
+        cps1.isPolicySatisfied() && cps2.isPolicySatisfied();

Review comment:
       Yes, makes sense; I will add it back.
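
       Presumably that restores the !isPolicySatisfied guard from the original inline check that this helper replaced, along the lines of this sketch:

           private boolean isPlacementStatusActuallyEqual(
               ContainerPlacementStatus cps1, ContainerPlacementStatus cps2) {
             // equal if both satisfy the policy, or if neither does but the
             // actual placement count is unchanged, as in the original
             // inline condition
             return (!cps1.isPolicySatisfied()
                 && cps1.actualPlacementCount() == cps2.actualPlacementCount())
                 || (cps1.isPolicySatisfied() && cps2.isPolicySatisfied());
           }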






[GitHub] [ozone] JacksonYao287 commented on pull request #2349: HDDS-4928. Support container move in Replication Manager

Posted by GitBox <gi...@apache.org>.
JacksonYao287 commented on pull request #2349:
URL: https://github.com/apache/ozone/pull/2349#issuecomment-881107005


   @ChenSammi PTAL!




[GitHub] [ozone] JacksonYao287 commented on pull request #2349: HDDS-4928. Support container move in Replication Manager

Posted by GitBox <gi...@apache.org>.
JacksonYao287 commented on pull request #2349:
URL: https://github.com/apache/ozone/pull/2349#issuecomment-874095030


   I have updated the PR, @lokeshj1703 PTAL! If there are no more logic problems, I will add a unit test for move.




[GitHub] [ozone] JacksonYao287 commented on a change in pull request #2349: HDDS-4928. Support container move in Replication Manager

Posted by GitBox <gi...@apache.org>.
JacksonYao287 commented on a change in pull request #2349:
URL: https://github.com/apache/ozone/pull/2349#discussion_r660256886



##########
File path: hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/ReplicationManager.java
##########
@@ -471,6 +549,127 @@ private void updateInflightAction(final ContainerInfo container,
     }
   }
 
+  /**
+   * add a move action for a given container.
+   *
+   * @param cid Container to move
+   * @param srcDn datanode to move from
+   * @param targetDn datanode to move to
+   */
+  public Optional<CompletableFuture<MoveResult>> move(ContainerID cid,
+      DatanodeDetails srcDn, DatanodeDetails targetDn)
+      throws ContainerNotFoundException, NodeNotFoundException {
+    LOG.info("receive a move requset about container {} , from {} to {}",
+        cid, srcDn.getUuid(), targetDn.getUuid());
+    Optional<CompletableFuture<MoveResult>> ret = Optional.empty();
+    if (!isRunning()) {
+      LOG.info("Replication Manager in not running. please start it first");
+      return ret;
+    }
+
+    /*
+     * make sure the following conditions are met:
+     *  1 the given two datanodes are in healthy state
+     *  2 the given container exists on the given source datanode
+     *  3 the given container does not exist on the given target datanode
+     *  4 the given container is in closed state
+     *  5 the given container is not taking any inflight action
+     *  6 the given two datanodes are in IN_SERVICE state
+     *
+     * move is a combination of two steps : replication and deletion.
+     * if the conditions above are all met, then we take a conservative
+     * strategy here : replication can always be executed, but the execution
+     * of deletion always depends on placement policy
+     */
+
+    NodeStatus currentNodeStat = nodeManager.getNodeStatus(srcDn);
+    NodeState healthStat = currentNodeStat.getHealth();
+    NodeOperationalState operationalState =
+        currentNodeStat.getOperationalState();
+    if (healthStat != NodeState.HEALTHY) {
+      LOG.info("given source datanode is in {} state, " +
+          "not in HEALTHY state", healthStat);
+      return ret;
+    }
+    if (operationalState != NodeOperationalState.IN_SERVICE) {
+      LOG.info("given source datanode is in {} state, " +
+          "not in IN_SERVICE state", operationalState);
+      return ret;
+    }
+
+    currentNodeStat = nodeManager.getNodeStatus(targetDn);
+    healthStat = currentNodeStat.getHealth();
+    operationalState = currentNodeStat.getOperationalState();
+    if (healthStat != NodeState.HEALTHY) {
+      LOG.info("given target datanode is in {} state, " +
+          "not in HEALTHY state", healthStat);
+      return ret;
+    }
+    if (operationalState != NodeOperationalState.IN_SERVICE) {
+      LOG.info("given target datanode is in {} state, " +
+          "not in IN_SERVICE state", operationalState);
+      return ret;
+    }
+
+    // we need to synchronize on ContainerInfo, since it is
+    // shared by ICR/FCR handler and this.processContainer
+    // TODO: use a Read lock after introducing a RW lock into ContainerInfo
+    ContainerInfo cif = containerManager.getContainer(cid);
+    synchronized (cif) {
+      final Set<DatanodeDetails> replicas = containerManager
+            .getContainerReplicas(cid).stream()
+            .map(ContainerReplica::getDatanodeDetails)
+            .collect(Collectors.toSet());
+      if (replicas.contains(targetDn)) {
+        LOG.info("given container exists in the target Datanode");
+        return ret;
+      }
+      if (!replicas.contains(srcDn)) {
+        LOG.info("given container does not exist in the source Datanode");
+        return ret;
+      }
+
+      /*
+      * the reason why the given container should not be taking any inflight
+      * action is that: if the given container is being replicated or deleted,
+      * the number of its replicas is not deterministic, so a move operation
+      * issued by the balancer may cause a nondeterministic result; we should
+      * therefore drop the request this time.
+      */
+
+      if (inflightReplication.containsKey(cid)) {
+        LOG.info("given container is in inflight replication");
+        return ret;
+      }
+      if (inflightDeletion.containsKey(cid)) {
+        LOG.info("given container is in inflight deletion");
+        return ret;
+      }
+
+      /*
+      * here, there is no need to check whether cid is in inflightMove:
+      * these three maps are all synchronized on ContainerInfo, so if cid
+      * is in inflightMove, it must currently be being replicated or
+      * deleted, and thus it must be in inflightReplication or in
+      * inflightDeletion. therefore, if we cannot find cid in either of
+      * them, this cid must not be in inflightMove.
+      */
+

Review comment:
       To keep it simple, I take a conservative strategy here: replication can always be executed, but the execution of deletion always depends on the placement policy; I think this does no harm to data safety. The placement policy check is deferred to `handleOverReplicatedContainer`.
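
       In other words, roughly this shape (a sketch only; sendReplicateCommand and the inflight maps are the PR's existing members, while the startMove name is made up for illustration):

           private void startMove(ContainerInfo cif, DatanodeDetails srcDn,
               DatanodeDetails targetDn) {
             inflightMove.put(cif.containerID(), Pair.of(srcDn, targetDn));
             // step 1: replication is always safe, it only adds a replica
             sendReplicateCommand(cif, targetDn,
                 Collections.singletonList(srcDn));
             // step 2: the delete of srcDn is deferred; it is issued later by
             // handleOverReplicatedContainer, and only if removing srcDn keeps
             // the placement policy satisfied
           }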






[GitHub] [ozone] JacksonYao287 commented on a change in pull request #2349: HDDS-4928. Support container move in Replication Manager

Posted by GitBox <gi...@apache.org>.
JacksonYao287 commented on a change in pull request #2349:
URL: https://github.com/apache/ozone/pull/2349#discussion_r660249416



##########
File path: hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/ReplicationManager.java
##########
@@ -862,16 +1061,55 @@ private void handleOverReplicatedContainer(final ContainerInfo container,
           break;
         }
       }
+
+      eligibleReplicas.removeAll(unhealthyReplicas);
+      boolean isInMove = inflightMove.containsKey(id);
+      boolean isSourceDnInReplicaSet = false;
+      boolean isTargetDnInReplicaSet = false;
+
+      if (isInMove) {
+        Pair<DatanodeDetails, DatanodeDetails> movePair =
+            inflightMove.get(id);
+        final DatanodeDetails sourceDN = movePair.getKey();
+        isSourceDnInReplicaSet = eligibleReplicas.stream()
+            .anyMatch(r -> r.getDatanodeDetails().equals(sourceDN));
+        isTargetDnInReplicaSet = eligibleReplicas.stream()
+            .anyMatch(r -> r.getDatanodeDetails()
+                .equals(movePair.getValue()));
+        int sourceDnPos = 0;
+        for (int i = 0; i < eligibleReplicas.size(); i++) {
+          if (eligibleReplicas.get(i).getDatanodeDetails()
+              .equals(sourceDN)) {
+            sourceDnPos = i;
+            break;
+          }
+        }
+        if (isTargetDnInReplicaSet) {
+          // if the container is in inflightMove and the target datanode is
+          // included in the replicas, then swap the source datanode to the
+          // front of the replica list if it exists, so the source datanode
+          // will be removed first if possible.
+          eligibleReplicas.add(0, eligibleReplicas.remove(sourceDnPos));
+        } else {
+          // a container replica that is being moved should not be removed.
+          // if the container is in inflightMove and the target datanode is
+          // not included in the replicas, then swap the source datanode to
+          // the end of the replica list, so the source datanode will not
+          // be removed.
+          eligibleReplicas.add(eligibleReplicas.remove(sourceDnPos));

Review comment:
       > TargetDN would receive all the existing replicas as source to copy from
   
   In the move operation, we maintain src and target in inflightMove and track the replication from source to target. If we took one of the other existing replicas as the source, although the final result would seem the same (sourceDN is not present and targetDN is present), it would be a little confusing semantically. I think we are better off making sure the source is not deleted until the target is present.
   
   > I think that at end if sourceDN is not present and targetDN is present move is successful.
   
   Yes, it makes sense. If the target is present and the source disappears somehow, we can consider the move successful.
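
   As a usage sketch, a caller such as the container balancer could consume the returned future like this (illustrative only):

       Optional<CompletableFuture<MoveResult>> result =
           replicationManager.move(cid, srcDn, targetDn);
       result.ifPresent(future -> future.thenAccept(moveResult -> {
         if (moveResult == MoveResult.COMPLETED) {
           LOG.info("move of container {} succeeded", cid);
         } else {
           LOG.warn("move of container {} ended with {}", cid, moveResult);
         }
       }));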



