You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@ratis.apache.org by GitBox <gi...@apache.org> on 2020/12/17 07:37:11 UTC

[GitHub] [incubator-ratis] avijayanhwx opened a new pull request #360: RATIS-1241. Leader unable to append logs to a recovering follower when its logs have been purged.

avijayanhwx opened a new pull request #360:
URL: https://github.com/apache/incubator-ratis/pull/360


   ## What changes were proposed in this pull request?
   **Steps to reproduce issue**
   
   - Setup a 3 node ratis group.
   - Write some transactions into the quorum.
   - Bring 1 peer down.
   - Write more transactions into the quorum (other 2 nodes), take a snapshot at the last txn and purge logs from the remaining 2 nodes.
   - Start the node that was brought down.
   - Leader falls into a possible irrecoverable state with respect to appending log entries to the follower.
   
   Stack trace
   
   > 2020-12-15 15:42:01,572 [omNode-3@group-523986131536->omNode-2-GrpcLogAppender-LogAppenderDaemon] > ERROR leader.LogAppenderDaemon (LogAppenderDaemon.java:run(86)) - omNode-3@group-523986131536->omNode-2-> GrpcLogAppender-LogAppenderDaemon failed
   >org.apache.ratis.server.raftlog.RaftLogIOException: Log entry not found: index = 205
   >        at org.apache.ratis.server.raftlog.segmented.SegmentedRaftLog.getEntryWithData(SegmentedRaftLog.java:283)
   >      at org.apache.ratis.server.leader.LogAppenderBase.newAppendEntriesRequest(LogAppenderBase.java:143)
   >        at org.apache.ratis.grpc.server.GrpcLogAppender.appendLog(GrpcLogAppender.java:210)
   >        at org.apache.ratis.grpc.server.GrpcLogAppender.run(GrpcLogAppender.java:144)
   >        at org.apache.ratis.server.leader.LogAppenderDaemon.run(LogAppenderDaemon.java:77)
   >        at java.lang.Thread.run(Thread.java:748)
   >2020-12-15 15:42:01,572 [omNode-3@group-523986131536->omNode-2-GrpcLogAppender-LogAppenderDaemon] INFO  >server.RaftServer$Division (LeaderStateImpl.java:restart(497)) - omNode-3@group-523986131536-LeaderStateImpl: >Restarting GrpcLogAppender for omNode-3@group-523986131536->omNode-2
   
   **Fix**
   Add the appropriate check in GrpcLogAppender#shouldNotifyToInstallSnapshot to cover the case when the leader has no logs.
   
   ## What is the link to the Apache JIRA
   Manually tested on Ozone cluster.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-ratis] avijayanhwx commented on a change in pull request #360: RATIS-1241. Leader unable to append logs to a recovering follower when its logs have been purged.

Posted by GitBox <gi...@apache.org>.
avijayanhwx commented on a change in pull request #360:
URL: https://github.com/apache/incubator-ratis/pull/360#discussion_r545409263



##########
File path: ratis-grpc/src/main/java/org/apache/ratis/grpc/server/GrpcLogAppender.java
##########
@@ -568,6 +569,10 @@ private TermIndex shouldNotifyToInstallSnapshot() {
       // should be notified to install the latest snapshot through its
       // State Machine.
       return getRaftLog().getTermIndex(leaderStartIndex);
+    } else if (leaderStartIndex == RaftLog.INVALID_LOG_INDEX) {

Review comment:
       Yes, that is correct.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-ratis] avijayanhwx commented on a change in pull request #360: RATIS-1241. Leader unable to append logs to a recovering follower when its logs have been purged.

Posted by GitBox <gi...@apache.org>.
avijayanhwx commented on a change in pull request #360:
URL: https://github.com/apache/incubator-ratis/pull/360#discussion_r545332107



##########
File path: ratis-grpc/src/main/java/org/apache/ratis/grpc/server/GrpcLogAppender.java
##########
@@ -568,6 +572,13 @@ private TermIndex shouldNotifyToInstallSnapshot() {
       // should be notified to install the latest snapshot through its
       // State Machine.
       return getRaftLog().getTermIndex(leaderStartIndex);
+    } else if (leaderStartIndex == RaftLog.INVALID_LOG_INDEX) {
+      // Leader has no logs to check from.
+      TermIndex snapshotTermIndex =
+          stateMachine.getLatestSnapshot().getTermIndex();
+      if (getFollower().getNextIndex() <= snapshotTermIndex.getIndex()) {
+        return snapshotTermIndex;
+      }

Review comment:
       That makes sense. Will test it out and update the patch.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-ratis] avijayanhwx commented on a change in pull request #360: RATIS-1241. Leader unable to append logs to a recovering follower when its logs have been purged.

Posted by GitBox <gi...@apache.org>.
avijayanhwx commented on a change in pull request #360:
URL: https://github.com/apache/incubator-ratis/pull/360#discussion_r545332291



##########
File path: ratis-grpc/src/main/java/org/apache/ratis/grpc/server/GrpcLogAppender.java
##########
@@ -64,6 +66,7 @@
   private volatile StreamObserver<AppendEntriesRequestProto> appendLogRequestObserver;
 
   private final GrpcServerMetrics grpcServerMetrics;
+  private final StateMachine stateMachine;

Review comment:
       Thanks for the review. Yes, will do.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-ratis] szetszwo commented on a change in pull request #360: RATIS-1241. Leader unable to append logs to a recovering follower when its logs have been purged.

Posted by GitBox <gi...@apache.org>.
szetszwo commented on a change in pull request #360:
URL: https://github.com/apache/incubator-ratis/pull/360#discussion_r544909361



##########
File path: ratis-grpc/src/main/java/org/apache/ratis/grpc/server/GrpcLogAppender.java
##########
@@ -568,6 +572,13 @@ private TermIndex shouldNotifyToInstallSnapshot() {
       // should be notified to install the latest snapshot through its
       // State Machine.
       return getRaftLog().getTermIndex(leaderStartIndex);
+    } else if (leaderStartIndex == RaftLog.INVALID_LOG_INDEX) {
+      // Leader has no logs to check from.
+      TermIndex snapshotTermIndex =
+          stateMachine.getLatestSnapshot().getTermIndex();
+      if (getFollower().getNextIndex() <= snapshotTermIndex.getIndex()) {
+        return snapshotTermIndex;
+      }

Review comment:
       Since this is **Notify**ToInstallSnapshot, it does not matter if the leader has a snapshot.  Just return current term and next index.
   ```
       } else if (leaderStartIndex == RaftLog.INVALID_LOG_INDEX) {
         return TermIndex.valueOf(getServer().getInfo().getCurrentTerm(), getRaftLog().getNextIndex());
       }
   ```

##########
File path: ratis-grpc/src/main/java/org/apache/ratis/grpc/server/GrpcLogAppender.java
##########
@@ -64,6 +66,7 @@
   private volatile StreamObserver<AppendEntriesRequestProto> appendLogRequestObserver;
 
   private final GrpcServerMetrics grpcServerMetrics;
+  private final StateMachine stateMachine;

Review comment:
       We may use getServer().getStateMachine().  Indeed StateMachine is not needed; see the other comment.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-ratis] amaliujia commented on a change in pull request #360: RATIS-1241. Leader unable to append logs to a recovering follower when its logs have been purged.

Posted by GitBox <gi...@apache.org>.
amaliujia commented on a change in pull request #360:
URL: https://github.com/apache/incubator-ratis/pull/360#discussion_r545355909



##########
File path: ratis-grpc/src/main/java/org/apache/ratis/grpc/server/GrpcLogAppender.java
##########
@@ -568,6 +569,10 @@ private TermIndex shouldNotifyToInstallSnapshot() {
       // should be notified to install the latest snapshot through its
       // State Machine.
       return getRaftLog().getTermIndex(leaderStartIndex);
+    } else if (leaderStartIndex == RaftLog.INVALID_LOG_INDEX) {

Review comment:
       Just out of curiosity: 
   
   so leaderStartIndex == RaftLog.INVALID_LOG_INDEX means the leader has no log at all (all have been purged) thus there is on valid start index?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-ratis] amaliujia commented on a change in pull request #360: RATIS-1241. Leader unable to append logs to a recovering follower when its logs have been purged.

Posted by GitBox <gi...@apache.org>.
amaliujia commented on a change in pull request #360:
URL: https://github.com/apache/incubator-ratis/pull/360#discussion_r545355909



##########
File path: ratis-grpc/src/main/java/org/apache/ratis/grpc/server/GrpcLogAppender.java
##########
@@ -568,6 +569,10 @@ private TermIndex shouldNotifyToInstallSnapshot() {
       // should be notified to install the latest snapshot through its
       // State Machine.
       return getRaftLog().getTermIndex(leaderStartIndex);
+    } else if (leaderStartIndex == RaftLog.INVALID_LOG_INDEX) {

Review comment:
       Just out of curiosity: 
   
   so leaderStartIndex == RaftLog.INVALID_LOG_INDEX means the leader has no log at all (all have been purged) thus there is no valid start index?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-ratis] avijayanhwx commented on pull request #360: RATIS-1241. Leader unable to append logs to a recovering follower when its logs have been purged.

Posted by GitBox <gi...@apache.org>.
avijayanhwx commented on pull request #360:
URL: https://github.com/apache/incubator-ratis/pull/360#issuecomment-747269085


   cc @szetszwo / @hanishakoneru / @lokeshj1703 for review.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-ratis] amaliujia commented on a change in pull request #360: RATIS-1241. Leader unable to append logs to a recovering follower when its logs have been purged.

Posted by GitBox <gi...@apache.org>.
amaliujia commented on a change in pull request #360:
URL: https://github.com/apache/incubator-ratis/pull/360#discussion_r545428318



##########
File path: ratis-grpc/src/main/java/org/apache/ratis/grpc/server/GrpcLogAppender.java
##########
@@ -568,6 +569,10 @@ private TermIndex shouldNotifyToInstallSnapshot() {
       // should be notified to install the latest snapshot through its
       // State Machine.
       return getRaftLog().getTermIndex(leaderStartIndex);
+    } else if (leaderStartIndex == RaftLog.INVALID_LOG_INDEX) {

Review comment:
       Thanks for catching this case!




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-ratis] szetszwo merged pull request #360: RATIS-1241. Leader unable to append logs to a recovering follower when its logs have been purged.

Posted by GitBox <gi...@apache.org>.
szetszwo merged pull request #360:
URL: https://github.com/apache/incubator-ratis/pull/360


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-ratis] szetszwo closed pull request #360: RATIS-1241. Leader unable to append logs to a recovering follower when its logs have been purged.

Posted by GitBox <gi...@apache.org>.
szetszwo closed pull request #360:
URL: https://github.com/apache/incubator-ratis/pull/360


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org