You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@iotdb.apache.org by GitBox <gi...@apache.org> on 2022/10/27 09:00:53 UTC

[GitHub] [iotdb] xingtanzjr opened a new pull request, #7759: Add wait logic to ensure no data lost when remove a Peer from MultiLeader consensus group

xingtanzjr opened a new pull request, #7759:
URL: https://github.com/apache/iotdb/pull/7759

   ## Description
   Before this change, if the removed peer is Leader in the consensus group, the data may lost if the peer's log sync is not completed. 
   In this PR, we add the logic to wait the SyncLog to be completed before removing it.
   
   ![Uploading image.png…]()
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@iotdb.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [iotdb] SpriCoder commented on a diff in pull request #7759: Add wait logic to ensure no data lost when remove a Peer from MultiLeader consensus group

Posted by GitBox <gi...@apache.org>.
SpriCoder commented on code in PR #7759:
URL: https://github.com/apache/iotdb/pull/7759#discussion_r1007741509


##########
consensus/src/main/java/org/apache/iotdb/consensus/multileader/MultiLeaderServerImpl.java:
##########
@@ -479,6 +481,45 @@ public void notifyPeersToRemoveSyncLogChannel(Peer targetPeer)
     }
   }
 
+  public void waitTargetPeerUntilSyncLogCompleted(Peer targetPeer)
+      throws ConsensusGroupAddPeerException {
+    long checkIntervalInMs = 10_000L;

Review Comment:
   Maybe this value can be a property of MultiLeaderConfig



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@iotdb.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [iotdb] SpriCoder commented on a diff in pull request #7759: Add wait logic to ensure no data lost when remove a Peer from MultiLeader consensus group

Posted by GitBox <gi...@apache.org>.
SpriCoder commented on code in PR #7759:
URL: https://github.com/apache/iotdb/pull/7759#discussion_r1007743803


##########
consensus/src/main/java/org/apache/iotdb/consensus/multileader/MultiLeaderServerImpl.java:
##########
@@ -479,6 +481,45 @@ public void notifyPeersToRemoveSyncLogChannel(Peer targetPeer)
     }
   }
 
+  public void waitTargetPeerUntilSyncLogCompleted(Peer targetPeer)
+      throws ConsensusGroupAddPeerException {
+    long checkIntervalInMs = 10_000L;
+    try (SyncMultiLeaderServiceClient client =
+        syncClientManager.borrowClient(targetPeer.getEndpoint())) {
+      while (true) {
+        TWaitSyncLogCompleteRes res =
+            client.waitSyncLogComplete(
+                new TWaitSyncLogCompleteReq(targetPeer.getGroupId().convertToTConsensusGroupId()));
+        if (res.complete) {
+          logger.info(
+              "{} SyncLog is completed. TargetIndex: {}, CurrentSyncIndex: {}",
+              targetPeer,
+              res.searchIndex,
+              res.safeIndex);
+          return;
+        }
+        logger.info(
+            "{} SyncLog is still in progress. TargetIndex: {}, CurrentSyncIndex: {}",
+            targetPeer,
+            res.searchIndex,
+            res.safeIndex);
+        Thread.sleep(checkIntervalInMs);
+      }
+    } catch (IOException | TException e) {
+      throw new ConsensusGroupAddPeerException(
+          String.format(
+              "error when waiting %s to complete SyncLog. %s", targetPeer, e.getMessage()),
+          e);
+    } catch (InterruptedException e) {
+      Thread.currentThread().interrupt();
+      throw new ConsensusGroupAddPeerException(
+          String.format(
+              "thread interrupted when waiting %s to complete SyncLog. %s",
+              targetPeer, e.getMessage()),
+          e);

Review Comment:
   I think we shouldn't throw `ConsensusGroupAddPeerException` here which may cause confuse.



##########
consensus/src/main/java/org/apache/iotdb/consensus/multileader/MultiLeaderServerImpl.java:
##########
@@ -479,6 +481,45 @@ public void notifyPeersToRemoveSyncLogChannel(Peer targetPeer)
     }
   }
 
+  public void waitTargetPeerUntilSyncLogCompleted(Peer targetPeer)
+      throws ConsensusGroupAddPeerException {
+    long checkIntervalInMs = 10_000L;
+    try (SyncMultiLeaderServiceClient client =
+        syncClientManager.borrowClient(targetPeer.getEndpoint())) {
+      while (true) {
+        TWaitSyncLogCompleteRes res =
+            client.waitSyncLogComplete(
+                new TWaitSyncLogCompleteReq(targetPeer.getGroupId().convertToTConsensusGroupId()));
+        if (res.complete) {
+          logger.info(
+              "{} SyncLog is completed. TargetIndex: {}, CurrentSyncIndex: {}",
+              targetPeer,
+              res.searchIndex,
+              res.safeIndex);
+          return;
+        }
+        logger.info(
+            "{} SyncLog is still in progress. TargetIndex: {}, CurrentSyncIndex: {}",
+            targetPeer,
+            res.searchIndex,
+            res.safeIndex);
+        Thread.sleep(checkIntervalInMs);
+      }
+    } catch (IOException | TException e) {
+      throw new ConsensusGroupAddPeerException(
+          String.format(
+              "error when waiting %s to complete SyncLog. %s", targetPeer, e.getMessage()),
+          e);

Review Comment:
   I think we shouldn't throw `ConsensusGroupAddPeerException` here which may cause confuse.



##########
consensus/src/main/java/org/apache/iotdb/consensus/multileader/MultiLeaderConsensus.java:
##########
@@ -310,13 +310,25 @@ public ConsensusGenericResponse removePeer(ConsensusGroupId groupId, Peer peer)
           .build();
     }
     try {
+      // let other peers remove the sync channel with target peer
       impl.notifyPeersToRemoveSyncLogChannel(peer);
     } catch (ConsensusGroupAddPeerException e) {
       return ConsensusGenericResponse.newBuilder()
           .setSuccess(false)
           .setException(new ConsensusException(e.getMessage()))
           .build();
     }
+
+    try {
+      // let target peer reject new write
+      impl.inactivePeer(peer);
+      // wait its SyncLog to complete
+      impl.waitTargetPeerUntilSyncLogCompleted(peer);
+    } catch (ConsensusGroupAddPeerException e) {
+      // we only log warning here because sometimes the target peer may already be down
+      logger.warn("cannot wait {} to complete SyncLog. error message: {}", peer, e.getMessage());
+    }

Review Comment:
   Why we catch `ConsensusGroupAddPeerException` in removePeer method? I think we need to specific the exception



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@iotdb.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [iotdb] xingtanzjr commented on a diff in pull request #7759: Add wait logic to ensure no data lost when remove a Peer from MultiLeader consensus group

Posted by GitBox <gi...@apache.org>.
xingtanzjr commented on code in PR #7759:
URL: https://github.com/apache/iotdb/pull/7759#discussion_r1007776161


##########
consensus/src/main/java/org/apache/iotdb/consensus/multileader/MultiLeaderServerImpl.java:
##########
@@ -479,6 +481,45 @@ public void notifyPeersToRemoveSyncLogChannel(Peer targetPeer)
     }
   }
 
+  public void waitTargetPeerUntilSyncLogCompleted(Peer targetPeer)
+      throws ConsensusGroupAddPeerException {
+    long checkIntervalInMs = 10_000L;
+    try (SyncMultiLeaderServiceClient client =
+        syncClientManager.borrowClient(targetPeer.getEndpoint())) {
+      while (true) {
+        TWaitSyncLogCompleteRes res =
+            client.waitSyncLogComplete(
+                new TWaitSyncLogCompleteReq(targetPeer.getGroupId().convertToTConsensusGroupId()));
+        if (res.complete) {
+          logger.info(
+              "{} SyncLog is completed. TargetIndex: {}, CurrentSyncIndex: {}",
+              targetPeer,
+              res.searchIndex,
+              res.safeIndex);
+          return;
+        }
+        logger.info(
+            "{} SyncLog is still in progress. TargetIndex: {}, CurrentSyncIndex: {}",
+            targetPeer,
+            res.searchIndex,
+            res.safeIndex);
+        Thread.sleep(checkIntervalInMs);
+      }
+    } catch (IOException | TException e) {
+      throw new ConsensusGroupAddPeerException(
+          String.format(
+              "error when waiting %s to complete SyncLog. %s", targetPeer, e.getMessage()),
+          e);

Review Comment:
   Sure, will define and use another exception



##########
consensus/src/main/java/org/apache/iotdb/consensus/multileader/MultiLeaderServerImpl.java:
##########
@@ -479,6 +481,45 @@ public void notifyPeersToRemoveSyncLogChannel(Peer targetPeer)
     }
   }
 
+  public void waitTargetPeerUntilSyncLogCompleted(Peer targetPeer)
+      throws ConsensusGroupAddPeerException {
+    long checkIntervalInMs = 10_000L;
+    try (SyncMultiLeaderServiceClient client =
+        syncClientManager.borrowClient(targetPeer.getEndpoint())) {
+      while (true) {
+        TWaitSyncLogCompleteRes res =
+            client.waitSyncLogComplete(
+                new TWaitSyncLogCompleteReq(targetPeer.getGroupId().convertToTConsensusGroupId()));
+        if (res.complete) {
+          logger.info(
+              "{} SyncLog is completed. TargetIndex: {}, CurrentSyncIndex: {}",
+              targetPeer,
+              res.searchIndex,
+              res.safeIndex);
+          return;
+        }
+        logger.info(
+            "{} SyncLog is still in progress. TargetIndex: {}, CurrentSyncIndex: {}",
+            targetPeer,
+            res.searchIndex,
+            res.safeIndex);
+        Thread.sleep(checkIntervalInMs);
+      }
+    } catch (IOException | TException e) {
+      throw new ConsensusGroupAddPeerException(
+          String.format(
+              "error when waiting %s to complete SyncLog. %s", targetPeer, e.getMessage()),
+          e);
+    } catch (InterruptedException e) {
+      Thread.currentThread().interrupt();
+      throw new ConsensusGroupAddPeerException(
+          String.format(
+              "thread interrupted when waiting %s to complete SyncLog. %s",
+              targetPeer, e.getMessage()),
+          e);

Review Comment:
   Sure, will define and use another exception



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@iotdb.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [iotdb] xingtanzjr commented on a diff in pull request #7759: Add wait logic to ensure no data lost when remove a Peer from MultiLeader consensus group

Posted by GitBox <gi...@apache.org>.
xingtanzjr commented on code in PR #7759:
URL: https://github.com/apache/iotdb/pull/7759#discussion_r1007775868


##########
consensus/src/main/java/org/apache/iotdb/consensus/multileader/MultiLeaderServerImpl.java:
##########
@@ -479,6 +481,45 @@ public void notifyPeersToRemoveSyncLogChannel(Peer targetPeer)
     }
   }
 
+  public void waitTargetPeerUntilSyncLogCompleted(Peer targetPeer)
+      throws ConsensusGroupAddPeerException {
+    long checkIntervalInMs = 10_000L;

Review Comment:
   Both is fine. This parameter is used to avoid frequent useless queries and a fixed suitable interval is enough. For example, `10s` and `20s` are both ok and it is unnecessary to change it. So I think we can use a fixed value here



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@iotdb.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [iotdb] qiaojialin merged pull request #7759: Add wait logic to ensure no data lost when remove a Peer from MultiLeader consensus group

Posted by GitBox <gi...@apache.org>.
qiaojialin merged PR #7759:
URL: https://github.com/apache/iotdb/pull/7759


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@iotdb.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [iotdb] xingtanzjr commented on a diff in pull request #7759: Add wait logic to ensure no data lost when remove a Peer from MultiLeader consensus group

Posted by GitBox <gi...@apache.org>.
xingtanzjr commented on code in PR #7759:
URL: https://github.com/apache/iotdb/pull/7759#discussion_r1007773692


##########
consensus/src/main/java/org/apache/iotdb/consensus/multileader/MultiLeaderConsensus.java:
##########
@@ -310,13 +310,25 @@ public ConsensusGenericResponse removePeer(ConsensusGroupId groupId, Peer peer)
           .build();
     }
     try {
+      // let other peers remove the sync channel with target peer
       impl.notifyPeersToRemoveSyncLogChannel(peer);
     } catch (ConsensusGroupAddPeerException e) {
       return ConsensusGenericResponse.newBuilder()
           .setSuccess(false)
           .setException(new ConsensusException(e.getMessage()))
           .build();
     }
+
+    try {
+      // let target peer reject new write
+      impl.inactivePeer(peer);
+      // wait its SyncLog to complete
+      impl.waitTargetPeerUntilSyncLogCompleted(peer);
+    } catch (ConsensusGroupAddPeerException e) {
+      // we only log warning here because sometimes the target peer may already be down
+      logger.warn("cannot wait {} to complete SyncLog. error message: {}", peer, e.getMessage());
+    }

Review Comment:
   Sure, I will define another similar exception



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@iotdb.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org