Posted to issues@ozone.apache.org by GitBox <gi...@apache.org> on 2021/07/05 13:01:30 UTC

[GitHub] [ozone] JacksonYao287 commented on a change in pull request #2349: HDDS-4928. Support container move in Replication Manager

JacksonYao287 commented on a change in pull request #2349:
URL: https://github.com/apache/ozone/pull/2349#discussion_r663914240



##########
File path: hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/ReplicationManager.java
##########
@@ -471,6 +549,127 @@ private void updateInflightAction(final ContainerInfo container,
     }
   }
 
+  /**
+   * add a move action for a given container.
+   *
+   * @param cid Container to move
+   * @param srcDn datanode to move from
+   * @param targetDn datanode to move to
+   */
+  public Optional<CompletableFuture<MoveResult>> move(ContainerID cid,
+      DatanodeDetails srcDn, DatanodeDetails targetDn)
+      throws ContainerNotFoundException, NodeNotFoundException {
+    LOG.info("received a move request for container {}, from {} to {}",
+        cid, srcDn.getUuid(), targetDn.getUuid());
+    Optional<CompletableFuture<MoveResult>> ret = Optional.empty();
+    if (!isRunning()) {
+      LOG.info("Replication Manager is not running. Please start it first");
+      return ret;
+    }
+
+    /*
+     * make sure the following conditions are met:
+     *  1 the given two datanodes are in healthy state
+     *  2 the given container exists on the given source datanode
+     *  3 the given container does not exist on the given target datanode
+     *  4 the given container is in closed state
+     *  5 the given container is not taking any inflight action
+     *  6 the given two datanodes are in IN_SERVICE state
+     *
+     * move is a combination of two steps : replication and deletion.
+     * if the conditions above are all met, then we take a conservative
+     * strategy here : replication can always be executed, but the execution
+     * of deletion always depends on placement policy
+     */
+
+    NodeStatus currentNodeStat = nodeManager.getNodeStatus(srcDn);
+    NodeState healthStat = currentNodeStat.getHealth();
+    NodeOperationalState operationalState =
+        currentNodeStat.getOperationalState();
+    if (healthStat != NodeState.HEALTHY) {
+      LOG.info("given source datanode is in {} state, " +
+          "not in HEALTHY state", healthStat);
+      return ret;
+    }
+    if (operationalState != NodeOperationalState.IN_SERVICE) {
+      LOG.info("given source datanode is in {} state, " +
+          "not in IN_SERVICE state", operationalState);
+      return ret;
+    }
+
+    currentNodeStat = nodeManager.getNodeStatus(targetDn);
+    healthStat = currentNodeStat.getHealth();
+    operationalState = currentNodeStat.getOperationalState();
+    if (healthStat != NodeState.HEALTHY) {
+      LOG.info("given target datanode is in {} state, " +
+          "not in HEALTHY state", healthStat);
+      return ret;
+    }
+    if (operationalState != NodeOperationalState.IN_SERVICE) {
+      LOG.info("given target datanode is in {} state, " +
+          "not in IN_SERVICE state", operationalState);
+      return ret;
+    }
+
+    // we need to synchronize on ContainerInfo, since it is
+    // shared by ICR/FCR handler and this.processContainer
+    // TODO: use a Read lock after introducing a RW lock into ContainerInfo
+    ContainerInfo cif = containerManager.getContainer(cid);
+    synchronized (cif) {
+      final Set<DatanodeDetails> replicas = containerManager
+            .getContainerReplicas(cid).stream()
+            .map(ContainerReplica::getDatanodeDetails)
+            .collect(Collectors.toSet());
+      if (replicas.contains(targetDn)) {
+        LOG.info("given container exists in the target Datanode");
+        return ret;
+      }
+      if (!replicas.contains(srcDn)) {
+        LOG.info("given container does not exist in the source Datanode");
+        return ret;
+      }
+
+      /*
+      * the given container must not be taking any inflight action
+      * because, if it is being replicated or deleted, the number of
+      * its replicas is not deterministic, so a move operation issued
+      * by the balancer may produce a nondeterministic result. we
+      * therefore drop the move request this time.
+      * */
+
+      if (inflightReplication.containsKey(cid)) {
+        LOG.info("given container is in inflight replication");
+        return ret;
+      }
+      if (inflightDeletion.containsKey(cid)) {
+        LOG.info("given container is in inflight deletion");
+        return ret;
+      }
+
+      /*
+      * here, there is no need to check whether cid is in inflightMove,
+      * because these three maps are all synchronized on ContainerInfo.
+      * if cid is in inflightMove, it must currently be replicating or
+      * deleting, so it must be in inflightReplication or inflightDeletion.
+      * thus, if we can not find cid in either of them, this cid must
+      * not be in inflightMove.
+      */
+

Review comment:
       Yes, that makes sense. I will change this.
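
       For readers following the thread, the precondition list in the quoted diff can be sketched as a small standalone check. This is only an illustration under stated assumptions: `MovePreconditions`, `Node`, `Health`, `OpState`, `Lifecycle`, and `firstViolation` are hypothetical names, not Ozone classes, and returning a reason string is a simplification of the quoted code, which logs and returns an empty `Optional`. The position of the closed-state check is also an assumption, since the quoted hunk ends before it.

```java
import java.util.Collections;
import java.util.Optional;
import java.util.Set;

// Hypothetical, simplified model of the move() preconditions quoted above.
// None of these types are Ozone classes; they only mirror the listed checks.
final class MovePreconditions {
  enum Health { HEALTHY, STALE, DEAD }
  enum OpState { IN_SERVICE, DECOMMISSIONING, IN_MAINTENANCE }
  enum Lifecycle { OPEN, CLOSING, CLOSED }

  // Minimal stand-in for a datanode: an id plus its two status fields.
  static final class Node {
    final String id;
    final Health health;
    final OpState opState;

    Node(String id, Health health, OpState opState) {
      this.id = id;
      this.health = health;
      this.opState = opState;
    }
  }

  private MovePreconditions() { }

  // Returns the first violated precondition, or empty if the move may proceed.
  static Optional<String> firstViolation(Node src, Node target,
      Set<String> replicaNodeIds, Lifecycle state, boolean hasInflightAction) {
    // Conditions 1 and 6: source first, then target, as in the quoted code.
    for (Node n : new Node[] {src, target}) {
      if (n.health != Health.HEALTHY) {
        return Optional.of(n.id + " is not HEALTHY");
      }
      if (n.opState != OpState.IN_SERVICE) {
        return Optional.of(n.id + " is not IN_SERVICE");
      }
    }
    // Condition 3: the target must not already hold a replica.
    if (replicaNodeIds.contains(target.id)) {
      return Optional.of("container already exists on target");
    }
    // Condition 2: the source must hold a replica.
    if (!replicaNodeIds.contains(src.id)) {
      return Optional.of("container does not exist on source");
    }
    // Condition 4 (placement here is an assumption).
    if (state != Lifecycle.CLOSED) {
      return Optional.of("container is not CLOSED");
    }
    // Condition 5: no inflight replication or deletion.
    if (hasInflightAction) {
      return Optional.of("container has an inflight action");
    }
    return Optional.empty();
  }

  public static void main(String[] args) {
    Node src = new Node("dn1", Health.HEALTHY, OpState.IN_SERVICE);
    Node target = new Node("dn2", Health.HEALTHY, OpState.IN_SERVICE);
    Set<String> replicas = Collections.singleton("dn1");
    System.out.println(
        firstViolation(src, target, replicas, Lifecycle.CLOSED, false));
    System.out.println(
        firstViolation(src, target, replicas, Lifecycle.CLOSED, true));
  }
}
```

       Collecting the checks in one side-effect-free function makes the ordering of the conditions easy to unit test, which may be useful when the inflight checks are revised.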




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org
