You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@ratis.apache.org by GitBox <gi...@apache.org> on 2023/01/14 17:19:06 UTC

[GitHub] [ratis] kaijchen opened a new pull request, #807: RATIS-1762. Support transfer leadership between nodes with same priority

kaijchen opened a new pull request, #807:
URL: https://github.com/apache/ratis/pull/807

   ## What changes were proposed in this pull request?
   
   See Jira for details.
   
   ## What is the link to the Apache JIRA
   
   https://issues.apache.org/jira/browse/RATIS-1762
   
   ## How was this patch tested?
   
   LeaderElectionTests
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ratis.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [ratis] szetszwo merged pull request #807: RATIS-1762. Support transfer leadership between nodes with same priority

Posted by "szetszwo (via GitHub)" <gi...@apache.org>.
szetszwo merged PR #807:
URL: https://github.com/apache/ratis/pull/807


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ratis.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [ratis] szetszwo commented on a diff in pull request #807: RATIS-1762. Support transfer leadership between nodes with same priority

Posted by "szetszwo (via GitHub)" <gi...@apache.org>.
szetszwo commented on code in PR #807:
URL: https://github.com/apache/ratis/pull/807#discussion_r1093201401


##########
ratis-server/src/main/java/org/apache/ratis/server/impl/LeaderStateImpl.java:
##########
@@ -778,6 +778,7 @@ public void onFollowerSuccessAppendEntries(FollowerInfo follower) {
     } else {
       eventQueue.submit(checkStagingEvent);
     }
+    server.getTransferLeadership().onFollowerAppendEntriesReply(follower);

Review Comment:
   We should use `follower.getId()` instead of `follower.getPeer().getId()` as described in the javadoc.
   ```java
   //FollowerInfo
     /**
      * Return this follower's {@link RaftPeer}.
      * To obtain the {@link RaftPeerId}, use {@link #getId()} which is more efficient than this method.
      *
      * @return this follower's peer info.
      */
     RaftPeer getPeer();
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ratis.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [ratis] kaijchen commented on pull request #807: RATIS-1762. Support transfer leadership between nodes with same priority

Posted by GitBox <gi...@apache.org>.
kaijchen commented on PR #807:
URL: https://github.com/apache/ratis/pull/807#issuecomment-1386578914

   I have created followup tasks under [RATIS-1247](https://issues.apache.org/jira/browse/RATIS-1247).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ratis.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [ratis] kaijchen commented on pull request #807: RATIS-1762. Support transfer leadership between nodes with same priority

Posted by GitBox <gi...@apache.org>.
kaijchen commented on PR #807:
URL: https://github.com/apache/ratis/pull/807#issuecomment-1382866426

   Hi @szetszwo, this is a draft implementation of the basic TransferLeadership operation.
   I'm wondering how to deal with the client specified timeout in TransferLeadershipRequest. Because a single operation will fail once an electionTimeout is elapsed. Looks like we should retry multiple times in the given timeout? What if the some other node won the election during this timeout?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ratis.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [ratis] kaijchen commented on pull request #807: RATIS-1762. Support transfer leadership between nodes with same priority

Posted by GitBox <gi...@apache.org>.
kaijchen commented on PR #807:
URL: https://github.com/apache/ratis/pull/807#issuecomment-1385430203

   > A small question. Current transferLeadership implementation will abort if it doesn't succeed for a random election timeout. Can we add a param to let the caller decide the excepted timeout for transferLeadership? We can control the stop-write duration through this param, whether to wait definitely or indefinitely, etc.
   
   Thanks for your suggestion. The old priority based TransferLeadership allows a user specified timeout.
   I will preserve this behavior. If the timeout is not set, it will default to a random election timeout. Otherwise, the user specified timeout will be used.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ratis.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [ratis] szetszwo commented on pull request #807: RATIS-1762. Support transfer leadership between nodes with same priority

Posted by GitBox <gi...@apache.org>.
szetszwo commented on PR #807:
URL: https://github.com/apache/ratis/pull/807#issuecomment-1385128458

   > ...  a single operation will fail once an electionTimeout is elapsed. Looks like we should retry multiple times in the given timeout? What if the some other node won the election during this timeout?
   
   When timeout, the server sends the exception back to the client.  Then, the client could retry (or abort) and wait again.  This is the current logic to handle timeout.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ratis.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [ratis] SzyWilliam commented on pull request #807: RATIS-1762. Support transfer leadership between nodes with same priority

Posted by GitBox <gi...@apache.org>.
SzyWilliam commented on PR #807:
URL: https://github.com/apache/ratis/pull/807#issuecomment-1385418816

   @kaijchen Thanks a lot for working on this! It's also a feature very wanted by Apache IoTDB. 
   A small question. Current transferLeadership implementation will abort if it doesn't succeed for a random election timeout. Can we add a param to let the caller decide the excepted timeout for transferLeadership? We can control the stop-write duration through this param, whether to wait definitely or indefinitely, etc.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ratis.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [ratis] szetszwo commented on a diff in pull request #807: RATIS-1762. Support transfer leadership between nodes with same priority

Posted by "szetszwo (via GitHub)" <gi...@apache.org>.
szetszwo commented on code in PR #807:
URL: https://github.com/apache/ratis/pull/807#discussion_r1092786533


##########
ratis-server/src/main/java/org/apache/ratis/server/impl/TransferLeadership.java:
##########
@@ -81,10 +83,50 @@ public String toString() {
     this.server = server;
   }
 
+  private RaftPeerId getTransferee() {
+    return Optional.ofNullable(pending.get())
+        .map(r -> r.getRequest().getNewLeader()).orElse(null);
+  }
+
   boolean isSteppingDown() {
     return pending.get() != null;
   }
 
+  void onFollowerAppendEntriesReply(FollowerInfo follower) {
+    final RaftPeerId transferee = getTransferee();
+    // If TransferLeadership is in progress, and the transferee has just append some entries
+    if (follower.getPeer().getId().equals(transferee)) {
+      // If the transferee is up-to-date, send StartLeaderElection to it
+      server.getRole().getLeaderState().ifPresent(leaderState -> {

Review Comment:
   Pass `LeaderStateImple` as a parameter.  Then, we don't have to get it here.



##########
ratis-server/src/main/java/org/apache/ratis/server/impl/TransferLeadership.java:
##########
@@ -81,10 +83,50 @@ public String toString() {
     this.server = server;
   }
 
+  private RaftPeerId getTransferee() {
+    return Optional.ofNullable(pending.get())
+        .map(r -> r.getRequest().getNewLeader()).orElse(null);
+  }
+
   boolean isSteppingDown() {
     return pending.get() != null;
   }
 
+  void onFollowerAppendEntriesReply(FollowerInfo follower) {
+    final RaftPeerId transferee = getTransferee();
+    // If TransferLeadership is in progress, and the transferee has just append some entries
+    if (follower.getPeer().getId().equals(transferee)) {
+      // If the transferee is up-to-date, send StartLeaderElection to it
+      server.getRole().getLeaderState().ifPresent(leaderState -> {
+        if (leaderState.sendStartLeaderElection(follower)) {
+          LOG.info("{}: sent StartLeaderElection to transferee {} after received AppendEntriesResponse",
+              server.getMemberId(), transferee);
+        }
+      });
+    }
+  }
+
+  private void tryTransferLeadership(RaftPeerId transferee) {
+    server.getRole().getLeaderState().ifPresent(leaderState -> {

Review Comment:
   We should also pass `leaderState` as a parameter here.



##########
ratis-server/src/main/java/org/apache/ratis/server/impl/LeaderStateImpl.java:
##########
@@ -778,6 +778,7 @@ public void onFollowerSuccessAppendEntries(FollowerInfo follower) {
     } else {
       eventQueue.submit(checkStagingEvent);
     }
+    server.getTransferLeadership().onFollowerAppendEntriesReply(follower);

Review Comment:
   I wonder if we can "remember" the transferee follower so that we don't have to call it for all the other followers.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ratis.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [ratis] kaijchen commented on a diff in pull request #807: RATIS-1762. Support transfer leadership between nodes with same priority

Posted by "kaijchen (via GitHub)" <gi...@apache.org>.
kaijchen commented on code in PR #807:
URL: https://github.com/apache/ratis/pull/807#discussion_r1092796316


##########
ratis-server/src/main/java/org/apache/ratis/server/impl/LeaderStateImpl.java:
##########
@@ -778,6 +778,7 @@ public void onFollowerSuccessAppendEntries(FollowerInfo follower) {
     } else {
       eventQueue.submit(checkStagingEvent);
     }
+    server.getTransferLeadership().onFollowerAppendEntriesReply(follower);

Review Comment:
   We could remember transferee, but then we should notify leaderStateImpl when transferee is changed.
   
   Currently we are skipping other followers in `onFollowerAppendEntriesReply()` by this if statement:
   
   ```java
       if (follower.getPeer().getId().equals(transferee)) {
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ratis.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [ratis] szetszwo commented on a diff in pull request #807: RATIS-1762. Support transfer leadership between nodes with same priority

Posted by "szetszwo (via GitHub)" <gi...@apache.org>.
szetszwo commented on code in PR #807:
URL: https://github.com/apache/ratis/pull/807#discussion_r1093204670


##########
ratis-server/src/main/java/org/apache/ratis/server/impl/TransferLeadership.java:
##########
@@ -81,11 +83,47 @@ public String toString() {
     this.server = server;
   }
 
+  private RaftPeerId getTransferee() {
+    return Optional.ofNullable(pending.get())
+        .map(r -> r.getRequest().getNewLeader()).orElse(null);
+  }
+
   boolean isSteppingDown() {
     return pending.get() != null;
   }
 
-  CompletableFuture<RaftClientReply> start(TransferLeadershipRequest request) {
+  void onFollowerAppendEntriesReply(LeaderStateImpl leaderState, FollowerInfo follower) {
+    final RaftPeerId transferee = getTransferee();
+    // If TransferLeadership is in progress, and the transferee has just append some entries
+    if (follower.getPeer().getId().equals(transferee)) {

Review Comment:
   It is more efficient to check null first.  We may also move the `Optional` inside:
   ```java
     void onFollowerAppendEntriesReply(LeaderStateImpl leaderState, FollowerInfo follower) {
       final Optional<RaftPeerId> transferee = Optional.ofNullable(pending.get()).map(r -> r.getRequest().getNewLeader());
   
       // If TransferLeadership is in progress, and the transferee has just append some entries
       if (transferee.filter(t -> t.equals(follower.getId())).isPresent()) {
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ratis.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [ratis] szetszwo commented on a diff in pull request #807: RATIS-1762. Support transfer leadership between nodes with same priority

Posted by GitBox <gi...@apache.org>.
szetszwo commented on code in PR #807:
URL: https://github.com/apache/ratis/pull/807#discussion_r1071986684


##########
ratis-server-api/src/main/java/org/apache/ratis/server/leader/LeaderState.java:
##########
@@ -66,4 +67,13 @@ AppendEntriesRequestProto newAppendEntriesRequestProto(FollowerInfo follower,
   /** Received an {@link AppendEntriesReplyProto} */
   void onAppendEntriesReply(LogAppender appender, AppendEntriesReplyProto reply);
 
+  /** Received a TransferLeadershipRequest */
+  void onTransferLeadership(RaftPeerId transferee);
+
+  /** Abort the last TransferLeadership */
+  void abortTransferLeadership();
+
+  /** Check if TransferLeadership is in progress */
+  boolean isTransferLeadershipInProgress();
+

Review Comment:
   This are internal APIs -- all methods are used only inside the `org.apache.ratis.server.impl` package.  So, they should not be added here.



##########
ratis-server/src/main/java/org/apache/ratis/server/impl/LeaderStateImpl.java:
##########
@@ -1167,6 +1180,75 @@ public void onAppendEntriesReply(LogAppender appender, RaftProtos.AppendEntriesR
     readIndexHeartbeats.onAppendEntriesReply(appender, reply, this::hasMajority);
   }
 
+  @Override
+  public void onTransferLeadership(RaftPeerId transferee) {

Review Comment:
   We add the new logic to `TransferLeadership`.  The new logic is to sends a TimeoutNow request to the target server.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ratis.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [ratis] kaijchen commented on pull request #807: RATIS-1762. Support transfer leadership between nodes with same priority

Posted by "kaijchen (via GitHub)" <gi...@apache.org>.
kaijchen commented on PR #807:
URL: https://github.com/apache/ratis/pull/807#issuecomment-1411547519

   Thanks @szetszwo for the review. I have a question about the `TransferLeadership` class.
   
   Currently a `TransferLeadership` object belongs to a `RaftServerImpl`.
   I'm wondering if we can change its scope to a `LeaderStateImpl` (in another Jira).
   Because it's only meaningful when the RaftServerImpl is a leader.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ratis.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [ratis] szetszwo commented on pull request #807: RATIS-1762. Support transfer leadership between nodes with same priority

Posted by "szetszwo (via GitHub)" <gi...@apache.org>.
szetszwo commented on PR #807:
URL: https://github.com/apache/ratis/pull/807#issuecomment-1414335386

   > Currently a `TransferLeadership` object belongs to a `RaftServerImpl`.
   > I'm wondering if we can change its scope to a `LeaderStateImpl` (in another Jira).
   
   This sounds like a good idea.  Let's try refactoring the code in another JIRA.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ratis.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [ratis] kaijchen commented on a diff in pull request #807: RATIS-1762. Support transfer leadership between nodes with same priority

Posted by GitBox <gi...@apache.org>.
kaijchen commented on code in PR #807:
URL: https://github.com/apache/ratis/pull/807#discussion_r1073034716


##########
ratis-server/src/main/java/org/apache/ratis/server/impl/LeaderStateImpl.java:
##########
@@ -1167,6 +1180,75 @@ public void onAppendEntriesReply(LogAppender appender, RaftProtos.AppendEntriesR
     readIndexHeartbeats.onAppendEntriesReply(appender, reply, this::hasMajority);
   }
 
+  @Override
+  public void onTransferLeadership(RaftPeerId transferee) {

Review Comment:
   The current `LeaderState#sendStartLeaderElectionToHigherPriorityPeer()` is actually `sendTimeoutNow()`.
   However, I think it should be refactored in another PR.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ratis.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [ratis] kaijchen commented on pull request #807: RATIS-1762. Support transfer leadership between nodes with same priority

Posted by "kaijchen (via GitHub)" <gi...@apache.org>.
kaijchen commented on PR #807:
URL: https://github.com/apache/ratis/pull/807#issuecomment-1414670056

   Thanks @szetszwo and @SzyWilliam for the review.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ratis.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [ratis] kaijchen commented on a diff in pull request #807: RATIS-1762. Support transfer leadership between nodes with same priority

Posted by "kaijchen (via GitHub)" <gi...@apache.org>.
kaijchen commented on code in PR #807:
URL: https://github.com/apache/ratis/pull/807#discussion_r1093251388


##########
ratis-server/src/main/java/org/apache/ratis/server/impl/TransferLeadership.java:
##########
@@ -81,11 +83,47 @@ public String toString() {
     this.server = server;
   }
 
+  private RaftPeerId getTransferee() {
+    return Optional.ofNullable(pending.get())
+        .map(r -> r.getRequest().getNewLeader()).orElse(null);
+  }
+
   boolean isSteppingDown() {
     return pending.get() != null;
   }
 
-  CompletableFuture<RaftClientReply> start(TransferLeadershipRequest request) {
+  void onFollowerAppendEntriesReply(LeaderStateImpl leaderState, FollowerInfo follower) {
+    final RaftPeerId transferee = getTransferee();
+    // If TransferLeadership is in progress, and the transferee has just append some entries
+    if (follower.getPeer().getId().equals(transferee)) {

Review Comment:
   Good catch. And I forgot to change `follower.getPeer().getId()` to `follower.getId()`.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ratis.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [ratis] kaijchen commented on pull request #807: RATIS-1762. Support transfer leadership between nodes with same priority

Posted by GitBox <gi...@apache.org>.
kaijchen commented on PR #807:
URL: https://github.com/apache/ratis/pull/807#issuecomment-1385562211

   I think the current change is a good first step. @szetszwo @SzyWilliam PTAL, thanks.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ratis.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org