You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@ratis.apache.org by GitBox <gi...@apache.org> on 2021/12/09 09:20:46 UTC

[GitHub] [ratis] Xushaohong opened a new pull request #560: RATIS-1446. Avoid leader election for invalid conf

Xushaohong opened a new pull request #560:
URL: https://github.com/apache/ratis/pull/560


   ## What changes were proposed in this pull request?
   Candidate shall not start leader election in these cases in case of possible NPE caused by conf.getPeer().getPriority().
   With this patch, the candidate will become a follower again when receiving the future appendEntries request.
   
    Phenomenon:
   The bootstrapped follower becomes the leader after the election timeout somehow, with an empty raft conf. Then in the process of yieldLeaderToHigherPriorityPeer, NPE happens as (null).getPriority(). 
   Setconfiguration action takes place at the end of appendEntriesAsync. If we make it possible to raise electiontimeout before this action, we can replay the NPE case.
   
   ```
   1private void yieldLeaderToHigherPriorityPeer() {
   2  if (!server.getInfo().isLeader()) {
   3    return;
   4  }
   5  final RaftConfigurationImpl conf = server.getRaftConf();
   6  int leaderPriority = conf.getPeer(server.getId()).getPriority(); 
   ```
   
   Possible reason:
   Network delay/loss? The leader did not send entries in time or the follower did not receive entries. 
   
   ## What is the link to the Apache JIRA
   https://issues.apache.org/jira/browse/RATIS-1446
   
   ## How was this patch tested?
   Manual simulation in UT.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ratis.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [ratis] szetszwo commented on a change in pull request #560: RATIS-1446. Avoid leader election for invalid conf

Posted by GitBox <gi...@apache.org>.
szetszwo commented on a change in pull request #560:
URL: https://github.com/apache/ratis/pull/560#discussion_r765703801



##########
File path: ratis-server/src/main/java/org/apache/ratis/server/impl/RaftServerImpl.java
##########
@@ -583,6 +583,12 @@ synchronized void changeToCandidate(boolean forceStartLeaderElection) {
     if (state.shouldNotifyExtendedNoLeader()) {
       stateMachine.followerEvent().notifyExtendedNoLeader(getRoleInfoProto());
     }
+    // Candidate shall not start leader election in these cases in case of
+    // possible NPE caused by conf.getPeer().getPriority()
+    if (!getRaftConf().containsInBothConfs(getId())) {
+      LOG.warn("{} find invalid configuration {}, skip start leader election", this, getRaftConf());
+      return;
+    }

Review comment:
       The conf may be changed later on.  Therefore, let's don't check it here and check null in LeaderElection.getHigherPriorityPeers(..) as below.
   ```
     private Set<RaftPeerId> getHigherPriorityPeers(RaftConfiguration conf) {
       final Optional<Integer> priority = Optional.ofNullable(conf.getPeer(server.getId())).map(RaftPeer::getPriority);
       return conf.getAllPeers().stream()
           .filter(peer -> priority.filter(p -> peer.getPriority() > p).isPresent())
           .map(RaftPeer::getId)
           .collect(Collectors.toSet());
     }
   ```
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ratis.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [ratis] Xushaohong commented on a change in pull request #560: RATIS-1446. Avoid leader election for invalid conf

Posted by GitBox <gi...@apache.org>.
Xushaohong commented on a change in pull request #560:
URL: https://github.com/apache/ratis/pull/560#discussion_r767672306



##########
File path: ratis-server/src/main/java/org/apache/ratis/server/impl/LeaderElection.java
##########
@@ -271,6 +272,9 @@ private boolean shouldRun(long electionTerm) {
 
   private ResultAndTerm submitRequestAndWaitResult(Phase phase, RaftConfigurationImpl conf, long electionTerm)
       throws InterruptedException {
+    if (!conf.containsInConf(server.getId()) && phase == Phase.ELECTION) {

Review comment:
       Could you help approve the workflow? : )




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ratis.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [ratis] szetszwo commented on pull request #560: RATIS-1446. Avoid leader election for invalid conf

Posted by GitBox <gi...@apache.org>.
szetszwo commented on pull request #560:
URL: https://github.com/apache/ratis/pull/560#issuecomment-994685219


   > ... Shall we move up the line case NOT_IN_CONF: above case SHUTDOWN or still use my original phase check?
   
   This is a good idea.  Let's shutdown the server when it is not in conf.  Thanks.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ratis.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [ratis] szetszwo commented on a change in pull request #560: RATIS-1446. Avoid leader election for invalid conf

Posted by GitBox <gi...@apache.org>.
szetszwo commented on a change in pull request #560:
URL: https://github.com/apache/ratis/pull/560#discussion_r767458817



##########
File path: ratis-server/src/main/java/org/apache/ratis/server/impl/LeaderElection.java
##########
@@ -271,6 +272,9 @@ private boolean shouldRun(long electionTerm) {
 
   private ResultAndTerm submitRequestAndWaitResult(Phase phase, RaftConfigurationImpl conf, long electionTerm)
       throws InterruptedException {
+    if (!conf.containsInConf(server.getId()) && phase == Phase.ELECTION) {

Review comment:
       We should fix the unit tests but not adding the phase check here.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ratis.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [ratis] Xushaohong commented on pull request #560: RATIS-1446. Avoid leader election for invalid conf

Posted by GitBox <gi...@apache.org>.
Xushaohong commented on pull request #560:
URL: https://github.com/apache/ratis/pull/560#issuecomment-990766482


   Thanks for all the comments above. I have fixed all.
   
   With ```NOT_IN_CONF``` check added in ```submitRequestAndWaitResult```, RaftReconfigurationBaseTest would have some UT failed.  I checked it and found it clashes with PRE_VOTE and thus added the phase check. 
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ratis.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [ratis] szetszwo merged pull request #560: RATIS-1446. Avoid leader election for invalid conf

Posted by GitBox <gi...@apache.org>.
szetszwo merged pull request #560:
URL: https://github.com/apache/ratis/pull/560


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ratis.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [ratis] szetszwo commented on a change in pull request #560: RATIS-1446. Avoid leader election for invalid conf

Posted by GitBox <gi...@apache.org>.
szetszwo commented on a change in pull request #560:
URL: https://github.com/apache/ratis/pull/560#discussion_r768350389



##########
File path: ratis-server/src/main/java/org/apache/ratis/server/impl/LeaderElection.java
##########
@@ -271,6 +272,9 @@ private boolean shouldRun(long electionTerm) {
 
   private ResultAndTerm submitRequestAndWaitResult(Phase phase, RaftConfigurationImpl conf, long electionTerm)
       throws InterruptedException {
+    if (!conf.containsInConf(server.getId()) && phase == Phase.ELECTION) {

Review comment:
       @Xushaohong , the build finished.  Please take a look.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ratis.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [ratis] Xushaohong edited a comment on pull request #560: RATIS-1446. Avoid leader election for invalid conf

Posted by GitBox <gi...@apache.org>.
Xushaohong edited a comment on pull request #560:
URL: https://github.com/apache/ratis/pull/560#issuecomment-990766482


   Thanks for all the comments above. I have fixed all. @szetszwo 
   
   With ```NOT_IN_CONF``` check added in ```submitRequestAndWaitResult```, RaftReconfigurationBaseTest would have some UT failed.  I checked it and found it clashes with PRE_VOTE and thus added the phase check. 
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ratis.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [ratis] Xushaohong commented on pull request #560: RATIS-1446. Avoid leader election for invalid conf

Posted by GitBox <gi...@apache.org>.
Xushaohong commented on pull request #560:
URL: https://github.com/apache/ratis/pull/560#issuecomment-992163704


   > @Xushaohong , thanks for the update.
   > 
   > TestLeaderElectionWithNetty.testTransferLeader failed. I also can reproduce it locally. Could you take a look?
   
   Currently, the setConf request would not update the ```FollowerInfo``` in ```LogAppenderBase```, so we have to get the latest followerInfo from conf like the previous version.  I have reverted it back and added an NPE check.
   @szetszwo  Could you help confirm it? 
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ratis.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [ratis] szetszwo commented on pull request #560: RATIS-1446. Avoid leader election for invalid conf

Posted by GitBox <gi...@apache.org>.
szetszwo commented on pull request #560:
URL: https://github.com/apache/ratis/pull/560#issuecomment-992169519


   > With NOT_IN_CONF check added in submitRequestAndWaitResult, RaftReconfigurationBaseTest would have some UT failed. I checked it and found it clashes with PRE_VOTE and thus added the phase check.
   
   We should fix the RaftReconfigurationBaseTest but not having the phase check.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ratis.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [ratis] Xushaohong commented on pull request #560: RATIS-1446. Avoid leader election for invalid conf

Posted by GitBox <gi...@apache.org>.
Xushaohong commented on pull request #560:
URL: https://github.com/apache/ratis/pull/560#issuecomment-989665984


   @szetszwo Actually, I am not sure whether we should make a specific check for each getPriority() use. Right now, I just prevent the NPE from the source. Currently, conf.getPeer().getPriority() only exists in leader election and leader state.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ratis.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [ratis] Xushaohong commented on a change in pull request #560: RATIS-1446. Avoid leader election for invalid conf

Posted by GitBox <gi...@apache.org>.
Xushaohong commented on a change in pull request #560:
URL: https://github.com/apache/ratis/pull/560#discussion_r767493994



##########
File path: ratis-server/src/main/java/org/apache/ratis/server/impl/LeaderElection.java
##########
@@ -271,6 +272,9 @@ private boolean shouldRun(long electionTerm) {
 
   private ResultAndTerm submitRequestAndWaitResult(Phase phase, RaftConfigurationImpl conf, long electionTerm)
       throws InterruptedException {
+    if (!conf.containsInConf(server.getId()) && phase == Phase.ELECTION) {

Review comment:
       > We should fix the unit tests but not adding the phase check here.
   
   OK, let's find out what's wrong with UT. @szetszwo 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ratis.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [ratis] Xushaohong commented on a change in pull request #560: RATIS-1446. Avoid leader election for invalid conf

Posted by GitBox <gi...@apache.org>.
Xushaohong commented on a change in pull request #560:
URL: https://github.com/apache/ratis/pull/560#discussion_r767493994



##########
File path: ratis-server/src/main/java/org/apache/ratis/server/impl/LeaderElection.java
##########
@@ -271,6 +272,9 @@ private boolean shouldRun(long electionTerm) {
 
   private ResultAndTerm submitRequestAndWaitResult(Phase phase, RaftConfigurationImpl conf, long electionTerm)
       throws InterruptedException {
+    if (!conf.containsInConf(server.getId()) && phase == Phase.ELECTION) {

Review comment:
       > We should fix the unit tests but not adding the phase check here.
   
   OK, let's find out what's wrong with UT




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ratis.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [ratis] Xushaohong commented on pull request #560: RATIS-1446. Avoid leader election for invalid conf

Posted by GitBox <gi...@apache.org>.
Xushaohong commented on pull request #560:
URL: https://github.com/apache/ratis/pull/560#issuecomment-994595809


   I reviewed the failure UT of testRemovePeers. The fault is due to after removing the peers out of the conf, the division which should quit the raft group will become a candidate and start the election. During the prevote process, originally it will send out the request and then the other peers should send back ```should shutdown```  to turn this server down. But for our case, we just return ```new ResultAndTerm(Result.NOT_IN_CONF)```, and won't go to the SHUTDOWN branch.
   @szetszwo Shall we move up the line ```case NOT_IN_CONF:``` above       ``` case SHUTDOWN ``` or still use my original phase check? 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ratis.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [ratis] szetszwo commented on pull request #560: RATIS-1446. Avoid leader election for invalid conf

Posted by GitBox <gi...@apache.org>.
szetszwo commented on pull request #560:
URL: https://github.com/apache/ratis/pull/560#issuecomment-989779930


   > ..., I am not sure whether we should make a specific check for each getPriority() use. 
   
   That's a good point.  We should check.  I just have walked through the code.  There are two other cases needed to be fixed:
   ```
   diff --git a/ratis-server/src/main/java/org/apache/ratis/server/impl/LeaderStateImpl.java b/ratis-server/src/main/java/org/apache/ratis/server/impl/LeaderStateImpl.java
   index 03850114..d1413040 100644
   --- a/ratis-server/src/main/java/org/apache/ratis/server/impl/LeaderStateImpl.java
   +++ b/ratis-server/src/main/java/org/apache/ratis/server/impl/LeaderStateImpl.java
   @@ -916,13 +916,13 @@ class LeaderStateImpl implements LeaderState {
    
        for (LogAppender logAppender : senders.getSenders()) {
          FollowerInfo followerInfo = logAppender.getFollower();
   -      RaftPeerId followerID = followerInfo.getPeer().getId();
   -      int followerPriority = conf.getPeer(followerID).getPriority();
   -
   +      final RaftPeer follower = followerInfo.getPeer();
   +      final int followerPriority = follower.getPriority();
          if (followerPriority <= leaderPriority) {
            continue;
          }
    
   +      final RaftPeerId followerID = follower.getId();
          final TermIndex leaderLastEntry = server.getState().getLastEntry();
          if (leaderLastEntry == null) {
            LOG.info("{} send StartLeaderElectionRequest to follower:{} on term:{} because follower's priority:{} " +
   diff --git a/ratis-server/src/main/java/org/apache/ratis/server/impl/VoteContext.java b/ratis-server/src/main/java/org/apache/ratis/server/impl/VoteContext.java
   index 1ef721aa..fde198fe 100644
   --- a/ratis-server/src/main/java/org/apache/ratis/server/impl/VoteContext.java
   +++ b/ratis-server/src/main/java/org/apache/ratis/server/impl/VoteContext.java
   @@ -144,7 +144,11 @@ class VoteContext {
        }
    
        // Check priority
   -    final int priority = impl.getRaftConf().getPeer(impl.getId()).getPriority();
   +    final RaftPeer peer = conf.getPeer(impl.getId());
   +    if (peer == null) {
   +      return reject("our server " + impl.getId() + " is not in the conf");
   +    }
   +    final int priority = peer.getPriority();
        if (priority <= candidate.getPriority()) {
          return log(true, "our priority " + priority + " <= candidate's priority " + candidate.getPriority());
        } else {
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ratis.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [ratis] szetszwo commented on pull request #560: RATIS-1446. Avoid leader election for invalid conf

Posted by GitBox <gi...@apache.org>.
szetszwo commented on pull request #560:
URL: https://github.com/apache/ratis/pull/560#issuecomment-991006870


   @Xushaohong , thanks for the update.  
   
   TestLeaderElectionWithNetty.testTransferLeader failed.  I also can reproduce it locally.  Could you take a look?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ratis.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org