You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@ozone.apache.org by GitBox <gi...@apache.org> on 2021/08/03 09:45:32 UTC

[GitHub] [ozone] JacksonYao287 opened a new pull request #2488: make balancer HA aware

JacksonYao287 opened a new pull request #2488:
URL: https://github.com/apache/ozone/pull/2488


   ## What changes were proposed in this pull request?
   
   make balancer HA aware
   
   ## What is the link to the Apache JIRA
   
   https://issues.apache.org/jira/browse/HDDS-5253
   
   ## How was this patch tested?
   
   https://issues.apache.org/jira/browse/HDDS-5253
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] JacksonYao287 commented on pull request #2488: make balancer HA aware

Posted by GitBox <gi...@apache.org>.
JacksonYao287 commented on pull request #2488:
URL: https://github.com/apache/ozone/pull/2488#issuecomment-891701574


   @lokeshj1703 @ChenSammi @siddhantsangwan PTAL! if the logics looks good , i will add more unit test in additional commit, thanks


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] lokeshj1703 commented on a change in pull request #2488: HDDS-5253. make balancer HA aware

Posted by GitBox <gi...@apache.org>.
lokeshj1703 commented on a change in pull request #2488:
URL: https://github.com/apache/ozone/pull/2488#discussion_r689362791



##########
File path: hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/ReplicationManager.java
##########
@@ -1712,8 +1745,244 @@ public ReplicationManagerMetrics getMetrics() {
     return inflightDeletion;
   }
 
-  public Map<ContainerID,
-      Pair<DatanodeDetails, DatanodeDetails>> getInflightMove() {
-    return inflightMove;
+  public Map<ContainerID, CompletableFuture<MoveResult>> getInflightMove() {
+    return inflightMoveFuture;
+  }
+
+  /**
+  * make move option HA aware.
+  */
+  public interface MoveScheduler {

Review comment:
       I was recommending this because the class ReplicationManager is too big right now. But I am fine either way.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] JacksonYao287 commented on a change in pull request #2488: HDDS-5253. make balancer HA aware

Posted by GitBox <gi...@apache.org>.
JacksonYao287 commented on a change in pull request #2488:
URL: https://github.com/apache/ozone/pull/2488#discussion_r684862232



##########
File path: hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/ReplicationManager.java
##########
@@ -610,50 +630,56 @@ private void updateMoveIfNeeded(final boolean isUnhealthy,
      */
 
     if (isSource && isInflightReplication) {
-      inflightMoveFuture.get(id).complete(
-          MoveResult.UNEXPECTED_REMOVE_SOURCE_AT_INFLIGHT_REPLICATION);
-      inflightMove.remove(id);
-      inflightMoveFuture.remove(id);
+      if (hasMoveFuture) {
+        inflightMoveFuture.get(id).complete(
+            MoveResult.UNEXPECTED_REMOVE_SOURCE_AT_INFLIGHT_REPLICATION);
+        inflightMoveFuture.remove(id);
+      }

Review comment:
       yea , seems good , i will do this




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] JacksonYao287 commented on a change in pull request #2488: HDDS-5253. make balancer HA aware

Posted by GitBox <gi...@apache.org>.
JacksonYao287 commented on a change in pull request #2488:
URL: https://github.com/apache/ozone/pull/2488#discussion_r686980369



##########
File path: hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/ReplicationManager.java
##########
@@ -1712,8 +1745,244 @@ public ReplicationManagerMetrics getMetrics() {
     return inflightDeletion;
   }
 
-  public Map<ContainerID,
-      Pair<DatanodeDetails, DatanodeDetails>> getInflightMove() {
-    return inflightMove;
+  public Map<ContainerID, CompletableFuture<MoveResult>> getInflightMove() {
+    return inflightMoveFuture;
+  }
+
+  /**
+  * make move option HA aware.
+  */
+  public interface MoveScheduler {
+    /**
+     * completeMove a move action for a given container.
+     *
+     * @param contianerIDProto Container to which the move option is finished
+     */
+    @Replicate
+    void completeMove(HddsProtos.ContainerID contianerIDProto);
+
+    /**
+     * start a move action for a given container.
+     *
+     * @param contianerIDProto Container to move
+     * @param mp encapsulates the source and target datanode infos
+     */
+    @Replicate
+    void startMove(HddsProtos.ContainerID contianerIDProto,
+              HddsProtos.MoveDataNodePairProto mp) throws IOException;
+
+    /**
+     * get the MoveDataNodePair of the giver container.
+     *
+     * @param cid Container to move
+     * @return null if cid is not found in MoveScheduler,
+     *          or the corresponding MoveDataNodePair
+     */
+    MoveDataNodePair getMoveDataNodePair(ContainerID cid);
+
+    /**
+     * Reinitialize the MoveScheduler with DB if become leader.
+     */
+    void reinitialize(Table<ContainerID,
+        MoveDataNodePair> moveTable) throws IOException;
+
+    /**
+     * get all the inflight move info.
+     */
+    Map<ContainerID, MoveDataNodePair> getInflightMove();
+  }
+
+  /**
+   * @return the moveScheduler of RM
+   */
+  public MoveScheduler getMoveScheduler() {
+    return moveScheduler;
+  }
+
+  /**
+   * Ratis based MoveScheduler, db operations are stored in
+   * DBTransactionBuffer until a snapshot is taken.
+   */
+  public static final class MoveSchedulerImpl implements MoveScheduler {
+    private Table<ContainerID, MoveDataNodePair> moveTable;
+    private final DBTransactionBuffer transactionBuffer;
+    /**
+     * This is used for tracking container move commands
+     * which are not yet complete.
+     */
+    private final Map<ContainerID, MoveDataNodePair> inflightMove;
+
+    private MoveSchedulerImpl(Table<ContainerID, MoveDataNodePair> moveTable,
+                DBTransactionBuffer transactionBuffer) throws IOException {
+      this.moveTable = moveTable;
+      this.transactionBuffer = transactionBuffer;
+      this.inflightMove = new ConcurrentHashMap<>();
+      initialize();
+    }
+
+    @Override
+    public void completeMove(HddsProtos.ContainerID contianerIDProto) {

Review comment:
       >For all these apis which require protobuf, we can create another function which takes ContainerID as input and redirects to these apis?
   
   yea , exactly, i think so too! but , for now , lot of HA function, such as
   ```
     @Replicate
     void removeContainer(HddsProtos.ContainerID containerInfo)
         throws IOException;
   ```
   has the same problem, so maybe i can create a new jira to solve all these problems
   
   >Even for move api I think we can support source and target as input parameters.
   yea, i will a function to wrap current move
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] lokeshj1703 commented on pull request #2488: HDDS-5253. make balancer HA aware

Posted by GitBox <gi...@apache.org>.
lokeshj1703 commented on pull request #2488:
URL: https://github.com/apache/ozone/pull/2488#issuecomment-894972112


   @JacksonYao287 Can you also add a UT with HA to test the consistency of move?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] JacksonYao287 commented on pull request #2488: HDDS-5253. make balancer HA aware

Posted by GitBox <gi...@apache.org>.
JacksonYao287 commented on pull request #2488:
URL: https://github.com/apache/ozone/pull/2488#issuecomment-895837782


   @lokeshj1703 I have added UT and integration tests for move HA, PTAL!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] JacksonYao287 commented on pull request #2488: HDDS-5253. make balancer HA aware

Posted by GitBox <gi...@apache.org>.
JacksonYao287 commented on pull request #2488:
URL: https://github.com/apache/ozone/pull/2488#issuecomment-894973789


   > Can you also add a UT with HA to test the consistency of move?
   
   @lokeshj1703 , sure, thanks, if the logic looks good to you now,  i will add UT in a new commit


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] JacksonYao287 commented on pull request #2488: HDDS-5253. Support container move HA

Posted by GitBox <gi...@apache.org>.
JacksonYao287 commented on pull request #2488:
URL: https://github.com/apache/ozone/pull/2488#issuecomment-899431263


   thanks @lokeshj1703 for the review!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] JacksonYao287 commented on pull request #2488: HDDS-5253. Support container move HA

Posted by GitBox <gi...@apache.org>.
JacksonYao287 commented on pull request #2488:
URL: https://github.com/apache/ozone/pull/2488#issuecomment-899423528


   > We will also need to add timeout for move with the leader changes. After a leader change the move will be reset so I think we will need to handle the timeouts as well. It might require some thought. Please see if you would like to address it in this PR.
   
   thanks @lokeshj1703 for pointing out this. i think it is better off addressing it in a new jira, will do it later


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] JacksonYao287 commented on pull request #2488: HDDS-5253. make balancer HA aware

Posted by GitBox <gi...@apache.org>.
JacksonYao287 commented on pull request #2488:
URL: https://github.com/apache/ozone/pull/2488#issuecomment-896512086


   CI failures seems not caused by this patch, i will merge mater branch again after they are fixed! #2420 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] lokeshj1703 commented on pull request #2488: HDDS-5253. make balancer HA aware

Posted by GitBox <gi...@apache.org>.
lokeshj1703 commented on pull request #2488:
URL: https://github.com/apache/ozone/pull/2488#issuecomment-896754323


   Also we will need to make sure that balancer is always restarted on leader with the new configuration. This would be a separate PR.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] lokeshj1703 merged pull request #2488: HDDS-5253. Support container move HA

Posted by GitBox <gi...@apache.org>.
lokeshj1703 merged pull request #2488:
URL: https://github.com/apache/ozone/pull/2488


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] JacksonYao287 commented on pull request #2488: HDDS-5253. make balancer HA aware

Posted by GitBox <gi...@apache.org>.
JacksonYao287 commented on pull request #2488:
URL: https://github.com/apache/ozone/pull/2488#issuecomment-897429704


   >Also we will need to make sure that balancer is always restarted on leader with the new configuration. This would be a separate PR.
   
   sure , will do this


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] JacksonYao287 commented on a change in pull request #2488: HDDS-5253. make balancer HA aware

Posted by GitBox <gi...@apache.org>.
JacksonYao287 commented on a change in pull request #2488:
URL: https://github.com/apache/ozone/pull/2488#discussion_r686887254



##########
File path: hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/ReplicationManager.java
##########
@@ -1712,8 +1745,244 @@ public ReplicationManagerMetrics getMetrics() {
     return inflightDeletion;
   }
 
-  public Map<ContainerID,
-      Pair<DatanodeDetails, DatanodeDetails>> getInflightMove() {
-    return inflightMove;
+  public Map<ContainerID, CompletableFuture<MoveResult>> getInflightMove() {
+    return inflightMoveFuture;
+  }
+
+  /**
+  * make move option HA aware.
+  */
+  public interface MoveScheduler {

Review comment:
       yea, sure, but considering they are small and simple, and only be used in replication manager (we also do not want it is used in other places), so i think it is better off keeping it inside replicationManager for now . refactoring is easy if needed in the future. what do you think?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] JacksonYao287 commented on pull request #2488: HDDS-5253. make balancer HA aware

Posted by GitBox <gi...@apache.org>.
JacksonYao287 commented on pull request #2488:
URL: https://github.com/apache/ozone/pull/2488#issuecomment-894915140


   i have updated the PR according the comments, @lokeshj1703 @ChenSammi @siddhantsangwan  please take a look!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] lokeshj1703 commented on pull request #2488: HDDS-5253. make balancer HA aware

Posted by GitBox <gi...@apache.org>.
lokeshj1703 commented on pull request #2488:
URL: https://github.com/apache/ozone/pull/2488#issuecomment-899353464


   @JacksonYao287 We will also need to add timeout for move with the leader changes. After a leader change the move will be reset so I think we will need to handle the timeouts as well. It might require some thought. Please see if you would like to address it in this PR.
   Let's also rename jira and PR to sth like `Support move with HA`.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] lokeshj1703 commented on pull request #2488: HDDS-5253. Support container move HA

Posted by GitBox <gi...@apache.org>.
lokeshj1703 commented on pull request #2488:
URL: https://github.com/apache/ozone/pull/2488#issuecomment-899430144


   @JacksonYao287 Thanks for the contribution! I have committed the PR to master branch.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] lokeshj1703 commented on a change in pull request #2488: HDDS-5253. make balancer HA aware

Posted by GitBox <gi...@apache.org>.
lokeshj1703 commented on a change in pull request #2488:
URL: https://github.com/apache/ozone/pull/2488#discussion_r684383691



##########
File path: hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/ReplicationManager.java
##########
@@ -664,11 +690,10 @@ private void updateMoveIfNeeded(final boolean isUnhealthy,
    * add a move action for a given container.
    *
    * @param cid Container to move
-   * @param srcDn datanode to move from
-   * @param targetDn datanode to move to
+   * @param mp MoveDataNodePair which contains source and target datanodes
    */
   public CompletableFuture<MoveResult> move(ContainerID cid,

Review comment:
       We should also fail move if ReplicationManager is not running or is not a leader?

##########
File path: hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/ReplicationManager.java
##########
@@ -610,50 +630,56 @@ private void updateMoveIfNeeded(final boolean isUnhealthy,
      */
 
     if (isSource && isInflightReplication) {
-      inflightMoveFuture.get(id).complete(
-          MoveResult.UNEXPECTED_REMOVE_SOURCE_AT_INFLIGHT_REPLICATION);
-      inflightMove.remove(id);
-      inflightMoveFuture.remove(id);
+      if (hasMoveFuture) {
+        inflightMoveFuture.get(id).complete(
+            MoveResult.UNEXPECTED_REMOVE_SOURCE_AT_INFLIGHT_REPLICATION);
+        inflightMoveFuture.remove(id);
+      }

Review comment:
       We can add a new private function for this logic since it is used in multiple places.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] lokeshj1703 commented on a change in pull request #2488: HDDS-5253. make balancer HA aware

Posted by GitBox <gi...@apache.org>.
lokeshj1703 commented on a change in pull request #2488:
URL: https://github.com/apache/ozone/pull/2488#discussion_r686731062



##########
File path: hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/ReplicationManager.java
##########
@@ -1712,8 +1745,244 @@ public ReplicationManagerMetrics getMetrics() {
     return inflightDeletion;
   }
 
-  public Map<ContainerID,
-      Pair<DatanodeDetails, DatanodeDetails>> getInflightMove() {
-    return inflightMove;
+  public Map<ContainerID, CompletableFuture<MoveResult>> getInflightMove() {
+    return inflightMoveFuture;
+  }
+
+  /**
+  * make move option HA aware.
+  */
+  public interface MoveScheduler {
+    /**
+     * completeMove a move action for a given container.
+     *
+     * @param contianerIDProto Container to which the move option is finished
+     */
+    @Replicate
+    void completeMove(HddsProtos.ContainerID contianerIDProto);
+
+    /**
+     * start a move action for a given container.
+     *
+     * @param contianerIDProto Container to move
+     * @param mp encapsulates the source and target datanode infos
+     */
+    @Replicate
+    void startMove(HddsProtos.ContainerID contianerIDProto,
+              HddsProtos.MoveDataNodePairProto mp) throws IOException;
+
+    /**
+     * get the MoveDataNodePair of the giver container.
+     *
+     * @param cid Container to move
+     * @return null if cid is not found in MoveScheduler,
+     *          or the corresponding MoveDataNodePair
+     */
+    MoveDataNodePair getMoveDataNodePair(ContainerID cid);
+
+    /**
+     * Reinitialize the MoveScheduler with DB if become leader.
+     */
+    void reinitialize(Table<ContainerID,
+        MoveDataNodePair> moveTable) throws IOException;
+
+    /**
+     * get all the inflight move info.
+     */
+    Map<ContainerID, MoveDataNodePair> getInflightMove();
+  }
+
+  /**
+   * @return the moveScheduler of RM
+   */
+  public MoveScheduler getMoveScheduler() {
+    return moveScheduler;
+  }
+
+  /**
+   * Ratis based MoveScheduler, db operations are stored in
+   * DBTransactionBuffer until a snapshot is taken.
+   */
+  public static final class MoveSchedulerImpl implements MoveScheduler {
+    private Table<ContainerID, MoveDataNodePair> moveTable;
+    private final DBTransactionBuffer transactionBuffer;
+    /**
+     * This is used for tracking container move commands
+     * which are not yet complete.
+     */
+    private final Map<ContainerID, MoveDataNodePair> inflightMove;
+
+    private MoveSchedulerImpl(Table<ContainerID, MoveDataNodePair> moveTable,
+                DBTransactionBuffer transactionBuffer) throws IOException {
+      this.moveTable = moveTable;
+      this.transactionBuffer = transactionBuffer;
+      this.inflightMove = new ConcurrentHashMap<>();
+      initialize();
+    }
+
+    @Override
+    public void completeMove(HddsProtos.ContainerID contianerIDProto) {

Review comment:
       For all these apis which require protobuf, we can create another function which takes ContainerID as input and redirects to these apis?

##########
File path: hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/ReplicationManager.java
##########
@@ -1712,8 +1745,244 @@ public ReplicationManagerMetrics getMetrics() {
     return inflightDeletion;
   }
 
-  public Map<ContainerID,
-      Pair<DatanodeDetails, DatanodeDetails>> getInflightMove() {
-    return inflightMove;
+  public Map<ContainerID, CompletableFuture<MoveResult>> getInflightMove() {
+    return inflightMoveFuture;
+  }
+
+  /**
+  * make move option HA aware.
+  */
+  public interface MoveScheduler {

Review comment:
       We can probably move MoveScheduler and Impl to separate classes?

##########
File path: hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/ReplicationManager.java
##########
@@ -1712,8 +1745,244 @@ public ReplicationManagerMetrics getMetrics() {
     return inflightDeletion;
   }
 
-  public Map<ContainerID,
-      Pair<DatanodeDetails, DatanodeDetails>> getInflightMove() {
-    return inflightMove;
+  public Map<ContainerID, CompletableFuture<MoveResult>> getInflightMove() {
+    return inflightMoveFuture;
+  }
+
+  /**
+  * make move option HA aware.
+  */
+  public interface MoveScheduler {
+    /**
+     * completeMove a move action for a given container.
+     *
+     * @param contianerIDProto Container to which the move option is finished
+     */
+    @Replicate
+    void completeMove(HddsProtos.ContainerID contianerIDProto);
+
+    /**
+     * start a move action for a given container.
+     *
+     * @param contianerIDProto Container to move
+     * @param mp encapsulates the source and target datanode infos
+     */
+    @Replicate
+    void startMove(HddsProtos.ContainerID contianerIDProto,
+              HddsProtos.MoveDataNodePairProto mp) throws IOException;
+
+    /**
+     * get the MoveDataNodePair of the giver container.
+     *
+     * @param cid Container to move
+     * @return null if cid is not found in MoveScheduler,
+     *          or the corresponding MoveDataNodePair
+     */
+    MoveDataNodePair getMoveDataNodePair(ContainerID cid);
+
+    /**
+     * Reinitialize the MoveScheduler with DB if become leader.
+     */
+    void reinitialize(Table<ContainerID,
+        MoveDataNodePair> moveTable) throws IOException;
+
+    /**
+     * get all the inflight move info.
+     */
+    Map<ContainerID, MoveDataNodePair> getInflightMove();
+  }
+
+  /**
+   * @return the moveScheduler of RM
+   */
+  public MoveScheduler getMoveScheduler() {
+    return moveScheduler;
+  }
+
+  /**
+   * Ratis based MoveScheduler, db operations are stored in
+   * DBTransactionBuffer until a snapshot is taken.
+   */
+  public static final class MoveSchedulerImpl implements MoveScheduler {
+    private Table<ContainerID, MoveDataNodePair> moveTable;
+    private final DBTransactionBuffer transactionBuffer;
+    /**
+     * This is used for tracking container move commands
+     * which are not yet complete.
+     */
+    private final Map<ContainerID, MoveDataNodePair> inflightMove;
+
+    private MoveSchedulerImpl(Table<ContainerID, MoveDataNodePair> moveTable,
+                DBTransactionBuffer transactionBuffer) throws IOException {
+      this.moveTable = moveTable;
+      this.transactionBuffer = transactionBuffer;
+      this.inflightMove = new ConcurrentHashMap<>();
+      initialize();
+    }
+
+    @Override
+    public void completeMove(HddsProtos.ContainerID contianerIDProto) {

Review comment:
       Even for move api I think we can support source and target as input parameters.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] JacksonYao287 commented on a change in pull request #2488: HDDS-5253. make balancer HA aware

Posted by GitBox <gi...@apache.org>.
JacksonYao287 commented on a change in pull request #2488:
URL: https://github.com/apache/ozone/pull/2488#discussion_r686980369



##########
File path: hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/ReplicationManager.java
##########
@@ -1712,8 +1745,244 @@ public ReplicationManagerMetrics getMetrics() {
     return inflightDeletion;
   }
 
-  public Map<ContainerID,
-      Pair<DatanodeDetails, DatanodeDetails>> getInflightMove() {
-    return inflightMove;
+  public Map<ContainerID, CompletableFuture<MoveResult>> getInflightMove() {
+    return inflightMoveFuture;
+  }
+
+  /**
+  * make move option HA aware.
+  */
+  public interface MoveScheduler {
+    /**
+     * completeMove a move action for a given container.
+     *
+     * @param contianerIDProto Container to which the move option is finished
+     */
+    @Replicate
+    void completeMove(HddsProtos.ContainerID contianerIDProto);
+
+    /**
+     * start a move action for a given container.
+     *
+     * @param contianerIDProto Container to move
+     * @param mp encapsulates the source and target datanode infos
+     */
+    @Replicate
+    void startMove(HddsProtos.ContainerID contianerIDProto,
+              HddsProtos.MoveDataNodePairProto mp) throws IOException;
+
+    /**
+     * get the MoveDataNodePair of the giver container.
+     *
+     * @param cid Container to move
+     * @return null if cid is not found in MoveScheduler,
+     *          or the corresponding MoveDataNodePair
+     */
+    MoveDataNodePair getMoveDataNodePair(ContainerID cid);
+
+    /**
+     * Reinitialize the MoveScheduler with DB if become leader.
+     */
+    void reinitialize(Table<ContainerID,
+        MoveDataNodePair> moveTable) throws IOException;
+
+    /**
+     * get all the inflight move info.
+     */
+    Map<ContainerID, MoveDataNodePair> getInflightMove();
+  }
+
+  /**
+   * @return the moveScheduler of RM
+   */
+  public MoveScheduler getMoveScheduler() {
+    return moveScheduler;
+  }
+
+  /**
+   * Ratis based MoveScheduler, db operations are stored in
+   * DBTransactionBuffer until a snapshot is taken.
+   */
+  public static final class MoveSchedulerImpl implements MoveScheduler {
+    private Table<ContainerID, MoveDataNodePair> moveTable;
+    private final DBTransactionBuffer transactionBuffer;
+    /**
+     * This is used for tracking container move commands
+     * which are not yet complete.
+     */
+    private final Map<ContainerID, MoveDataNodePair> inflightMove;
+
+    private MoveSchedulerImpl(Table<ContainerID, MoveDataNodePair> moveTable,
+                DBTransactionBuffer transactionBuffer) throws IOException {
+      this.moveTable = moveTable;
+      this.transactionBuffer = transactionBuffer;
+      this.inflightMove = new ConcurrentHashMap<>();
+      initialize();
+    }
+
+    @Override
+    public void completeMove(HddsProtos.ContainerID contianerIDProto) {

Review comment:
       >For all these apis which require protobuf, we can create another function which takes ContainerID as input and redirects to these apis?
   
   yea , exactly, i think so too! but , for now , lot of HA function, such as
   ```
     @Replicate
     void removeContainer(HddsProtos.ContainerID containerInfo)
         throws IOException;
   ```
   has the same problem, so maybe i can create a new jira to solve all these problems
   
   >Even for move api I think we can support source and target as input parameters.
   
   yea, i will a function to wrap current move
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] JacksonYao287 commented on a change in pull request #2488: HDDS-5253. make balancer HA aware

Posted by GitBox <gi...@apache.org>.
JacksonYao287 commented on a change in pull request #2488:
URL: https://github.com/apache/ozone/pull/2488#discussion_r686980369



##########
File path: hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/ReplicationManager.java
##########
@@ -1712,8 +1745,244 @@ public ReplicationManagerMetrics getMetrics() {
     return inflightDeletion;
   }
 
-  public Map<ContainerID,
-      Pair<DatanodeDetails, DatanodeDetails>> getInflightMove() {
-    return inflightMove;
+  public Map<ContainerID, CompletableFuture<MoveResult>> getInflightMove() {
+    return inflightMoveFuture;
+  }
+
+  /**
+  * make move option HA aware.
+  */
+  public interface MoveScheduler {
+    /**
+     * completeMove a move action for a given container.
+     *
+     * @param contianerIDProto Container to which the move option is finished
+     */
+    @Replicate
+    void completeMove(HddsProtos.ContainerID contianerIDProto);
+
+    /**
+     * start a move action for a given container.
+     *
+     * @param contianerIDProto Container to move
+     * @param mp encapsulates the source and target datanode infos
+     */
+    @Replicate
+    void startMove(HddsProtos.ContainerID contianerIDProto,
+              HddsProtos.MoveDataNodePairProto mp) throws IOException;
+
+    /**
+     * get the MoveDataNodePair of the giver container.
+     *
+     * @param cid Container to move
+     * @return null if cid is not found in MoveScheduler,
+     *          or the corresponding MoveDataNodePair
+     */
+    MoveDataNodePair getMoveDataNodePair(ContainerID cid);
+
+    /**
+     * Reinitialize the MoveScheduler with DB if become leader.
+     */
+    void reinitialize(Table<ContainerID,
+        MoveDataNodePair> moveTable) throws IOException;
+
+    /**
+     * get all the inflight move info.
+     */
+    Map<ContainerID, MoveDataNodePair> getInflightMove();
+  }
+
+  /**
+   * @return the moveScheduler of RM
+   */
+  public MoveScheduler getMoveScheduler() {
+    return moveScheduler;
+  }
+
+  /**
+   * Ratis based MoveScheduler, db operations are stored in
+   * DBTransactionBuffer until a snapshot is taken.
+   */
+  public static final class MoveSchedulerImpl implements MoveScheduler {
+    private Table<ContainerID, MoveDataNodePair> moveTable;
+    private final DBTransactionBuffer transactionBuffer;
+    /**
+     * This is used for tracking container move commands
+     * which are not yet complete.
+     */
+    private final Map<ContainerID, MoveDataNodePair> inflightMove;
+
+    private MoveSchedulerImpl(Table<ContainerID, MoveDataNodePair> moveTable,
+                DBTransactionBuffer transactionBuffer) throws IOException {
+      this.moveTable = moveTable;
+      this.transactionBuffer = transactionBuffer;
+      this.inflightMove = new ConcurrentHashMap<>();
+      initialize();
+    }
+
+    @Override
+    public void completeMove(HddsProtos.ContainerID contianerIDProto) {

Review comment:
       >For all these apis which require protobuf, we can create another function which takes ContainerID as input and redirects to these apis?
   yea , exactly, i think so too! but , for now , lot of HA function, such as
   ```
     @Replicate
     void removeContainer(HddsProtos.ContainerID containerInfo)
         throws IOException;
   ```
   has the same problem, so maybe i can create a new jira to solve all these problems
   
   >Even for move api I think we can support source and target as input parameters.
   yea, i will a function to wrap current move
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] JacksonYao287 commented on a change in pull request #2488: HDDS-5253. make balancer HA aware

Posted by GitBox <gi...@apache.org>.
JacksonYao287 commented on a change in pull request #2488:
URL: https://github.com/apache/ozone/pull/2488#discussion_r684862330



##########
File path: hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/ReplicationManager.java
##########
@@ -664,11 +690,10 @@ private void updateMoveIfNeeded(final boolean isUnhealthy,
    * add a move action for a given container.
    *
    * @param cid Container to move
-   * @param srcDn datanode to move from
-   * @param targetDn datanode to move to
+   * @param mp MoveDataNodePair which contains source and target datanodes
    */
   public CompletableFuture<MoveResult> move(ContainerID cid,

Review comment:
       thanks  for pointing out this, i will add this check




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org