You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@ratis.apache.org by GitBox <gi...@apache.org> on 2020/12/04 07:11:36 UTC

[GitHub] [incubator-ratis] avijayanhwx opened a new pull request #322: RATIS-1195. Log Entry Validation in appendEntry can fail after log purge.

avijayanhwx opened a new pull request #322:
URL: https://github.com/apache/incubator-ratis/pull/322


   ## What changes were proposed in this pull request?
   After a log purge is done to the last log index, RaftLog#getLastEntryTermIndex will return a null. In RaftLog#validateLogEntry, when the last term index is null, the expectation is that the new entry to be appended is lastSnapshotIndex + 1. However, the 'lastSnapshotIndex' in RaftLog is updated only through installSnapshot calls. Hence, the validation fails and all further appends will fail until a restart.
   
   This patch uses a simple fallback in the RaftLog to ask the state machine for the snapshot index, when there is no logs available.
   
   ## What is the link to the Apache JIRA
   https://issues.apache.org/jira/browse/RATIS-1195
   
   ## How was this patch tested?
   Manually tested.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-ratis] avijayanhwx commented on a change in pull request #322: RATIS-1195. Log Entry Validation in appendEntry can fail after log purge.

Posted by GitBox <gi...@apache.org>.
avijayanhwx commented on a change in pull request #322:
URL: https://github.com/apache/incubator-ratis/pull/322#discussion_r536830266



##########
File path: ratis-server/src/main/java/org/apache/ratis/server/raftlog/RaftLog.java
##########
@@ -359,6 +365,9 @@ protected void validateLogEntry(LogEntryProto entry) {
               "is greater than 1, entry: %s",
           lastTermIndex.getIndex(), latestSnapshotIndex, entry);
     } else {
+      // No logs are present. Check state machine's snapshot index.
+      updateSnapshotIndex(snapshotIndexSupplier.getAsLong());
+      latestSnapshotIndex = getSnapshotIndex();

Review comment:
       A good point. But we don't expect it to be called for every validate call since it is in updated only in the case when there are no logs. That can happen only once after every purge. But, I agree we can move this to purge.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-ratis] szetszwo commented on a change in pull request #322: RATIS-1195. Log Entry Validation in appendEntry can fail after log purge.

Posted by GitBox <gi...@apache.org>.
szetszwo commented on a change in pull request #322:
URL: https://github.com/apache/incubator-ratis/pull/322#discussion_r536504800



##########
File path: ratis-server/src/main/java/org/apache/ratis/server/impl/ServerState.java
##########
@@ -110,22 +111,31 @@
         RaftServerConfigKeys.Log.corruptionPolicy(prop));
     snapshotManager = new SnapshotManager(storage, id);
 
-    long lastApplied = initStatemachine(stateMachine, group.getGroupId());
+    initStatemachine(stateMachine, group.getGroupId());
 
     // On start the leader is null, start the clock now
     leaderId = null;
     this.lastNoLeaderTime = Timestamp.currentTime();
     this.noLeaderTimeout = RaftServerConfigKeys.Notification.noLeaderTimeout(prop);
 
+    LongSupplier snapshotIndexSupplier = () -> {
+      SnapshotInfo snapshot = stateMachine.getLatestSnapshot();
+      if (snapshot == null || snapshot.getTermIndex().getIndex() < 0) {
+        return RaftLog.INVALID_LOG_INDEX;
+      }
+      return snapshot.getIndex();
+    };

Review comment:
       We may use Optional
   ```
       final LongSupplier getSnapshotIndexFromStateMachine = () -> Optional.ofNullable(stateMachine.getLatestSnapshot())
           .map(SnapshotInfo::getIndex)
           .filter(i -> i >= 0)
           .orElse(RaftLog.INVALID_LOG_INDEX);
   ```

##########
File path: ratis-server/src/main/java/org/apache/ratis/server/impl/ServerState.java
##########
@@ -182,17 +185,19 @@ void start() {
   }
 
   private static RaftLog initRaftLog(RaftGroupMemberId memberId, RaftServerImpl server, RaftStorage storage,
-      Consumer<LogEntryProto> logConsumer, long lastIndexInSnapshot, RaftProperties prop) throws IOException {
+      Consumer<LogEntryProto> logConsumer, LongSupplier snapshotIndexSupplier,
+      RaftProperties prop) throws IOException {
     final RaftLog log;
     if (RaftServerConfigKeys.Log.useMemory(prop)) {
-      log = new MemoryRaftLog(memberId, lastIndexInSnapshot, prop);
+      log = new MemoryRaftLog(memberId, snapshotIndexSupplier, prop);
     } else {
       log = new SegmentedRaftLog(memberId, server,
           server.getStateMachine(),
           server::notifyTruncatedLogEntry,
           server::submitUpdateCommitEvent,
-          storage, lastIndexInSnapshot, prop);
+          storage, snapshotIndexSupplier, prop);
     }
+    long lastIndexInSnapshot = snapshotIndexSupplier.getAsLong();
     log.open(lastIndexInSnapshot, logConsumer);

Review comment:
       snapshotIndexSupplier.getAsLong() is implemented by StateMachine so that it could be expensive.  Let's use log.getSnapshotIndex().
   ```
       log.open(log.getSnapshotIndex(), logConsumer);
   ```
   

##########
File path: ratis-server/src/main/java/org/apache/ratis/server/raftlog/RaftLog.java
##########
@@ -84,19 +85,24 @@
   private final Runner runner = new Runner(this::getName);
   private final OpenCloseState state;
   private final RaftLogMetrics raftLogMetrics;
+  private final LongSupplier snapshotIndexSupplier;

Review comment:
       Since we already have snapshotIndex, let's rename this to
   ```
     private final LongSupplier getSnapshotIndexFromStateMachine;
   ```

##########
File path: ratis-server/src/main/java/org/apache/ratis/server/impl/ServerState.java
##########
@@ -110,22 +111,31 @@
         RaftServerConfigKeys.Log.corruptionPolicy(prop));
     snapshotManager = new SnapshotManager(storage, id);
 
-    long lastApplied = initStatemachine(stateMachine, group.getGroupId());
+    initStatemachine(stateMachine, group.getGroupId());
 
     // On start the leader is null, start the clock now
     leaderId = null;
     this.lastNoLeaderTime = Timestamp.currentTime();
     this.noLeaderTimeout = RaftServerConfigKeys.Notification.noLeaderTimeout(prop);
 
+    LongSupplier snapshotIndexSupplier = () -> {
+      SnapshotInfo snapshot = stateMachine.getLatestSnapshot();
+      if (snapshot == null || snapshot.getTermIndex().getIndex() < 0) {
+        return RaftLog.INVALID_LOG_INDEX;
+      }
+      return snapshot.getIndex();
+    };
+
     // we cannot apply log entries to the state machine in this step, since we
     // do not know whether the local log entries have been committed.
-    this.log = initRaftLog(getMemberId(), server, storage, this::setRaftConf, lastApplied, prop);
+    this.log = initRaftLog(getMemberId(), server, storage, this::setRaftConf, snapshotIndexSupplier, prop);
 
     RaftLog.Metadata metadata = log.loadMetadata();
     currentTerm.set(metadata.getTerm());
     votedFor = metadata.getVotedFor();
 
-    stateMachineUpdater = new StateMachineUpdater(stateMachine, server, this, lastApplied, prop);
+    stateMachineUpdater = new StateMachineUpdater(stateMachine, server, this,
+        snapshotIndexSupplier.getAsLong(), prop);

Review comment:
       snapshotIndexSupplier.getAsLong() is implemented by StateMachine so that it could be expensive.  Let's use log.getSnapshotIndex().
   ```
       stateMachineUpdater = new StateMachineUpdater(stateMachine, server, this, log.getSnapshotIndex(), prop);
   ```
   

##########
File path: ratis-server/src/main/java/org/apache/ratis/server/raftlog/segmented/SegmentedRaftLog.java
##########
@@ -186,8 +187,9 @@ public void notifyTruncatedLogEntry(TermIndex ti) {
   @SuppressWarnings("parameternumber")
   public SegmentedRaftLog(RaftGroupMemberId memberId, RaftServer.Division server,
       StateMachine stateMachine, Consumer<LogEntryProto> notifyTruncatedLogEntry, Runnable submitUpdateCommitEvent,
-      RaftStorage storage, long lastIndexInSnapshot, RaftProperties properties) {
-    super(memberId, lastIndexInSnapshot, properties);
+      RaftStorage storage, LongSupplier snapshotIndexSupplier,
+                          RaftProperties properties) {

Review comment:
       Let's put this a single line?

##########
File path: ratis-server/src/main/java/org/apache/ratis/server/raftlog/RaftLog.java
##########
@@ -359,6 +365,9 @@ protected void validateLogEntry(LogEntryProto entry) {
               "is greater than 1, entry: %s",
           lastTermIndex.getIndex(), latestSnapshotIndex, entry);
     } else {
+      // No logs are present. Check state machine's snapshot index.
+      updateSnapshotIndex(snapshotIndexSupplier.getAsLong());
+      latestSnapshotIndex = getSnapshotIndex();

Review comment:
       Updating in validateLogEntry is expensive since it is called for every entry.  How about updating it in purge?
   ```
   +++ b/ratis-server/src/main/java/org/apache/ratis/server/raftlog/segmented/SegmentedRaftLog.java
   @@ -357,6 +357,7 @@ public class SegmentedRaftLog extends RaftLog {
      protected CompletableFuture<Long> purgeImpl(long index) {
        try (AutoCloseableLock writeLock = writeLock()) {
          SegmentedRaftLogCache.TruncationSegments ts = cache.purge(index);
   +      updateSnapshotIndexFromStateMachine();
          LOG.debug("purging segments:{}", ts);
          if (ts != null) {
            Task task = fileLogWorker.purge(ts);
   ```
   where updateSnapshotIndexFromStateMachine() is a new method in RaftLog.java
   ```
   +++ b/ratis-server/src/main/java/org/apache/ratis/server/raftlog/RaftLog.java
   @@ -153,6 +153,10 @@ public abstract class RaftLog implements RaftLogSequentialOps, Closeable {
        return false;
      }
    
   +  protected void updateSnapshotIndexFromStateMachine() {
   +    updateSnapshotIndex(getSnapshotIndexFromStateMachine.getAsLong());
   +  }
   +
   ```
   




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-ratis] avijayanhwx commented on pull request #322: RATIS-1195. Log Entry Validation in appendEntry can fail after log purge.

Posted by GitBox <gi...@apache.org>.
avijayanhwx commented on pull request #322:
URL: https://github.com/apache/incubator-ratis/pull/322#issuecomment-740870618


   Thank you for the reviews @szetszwo & @mukul1987.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-ratis] avijayanhwx commented on a change in pull request #322: RATIS-1195. Log Entry Validation in appendEntry can fail after log purge.

Posted by GitBox <gi...@apache.org>.
avijayanhwx commented on a change in pull request #322:
URL: https://github.com/apache/incubator-ratis/pull/322#discussion_r536830032



##########
File path: ratis-server/src/main/java/org/apache/ratis/server/raftlog/RaftLog.java
##########
@@ -84,19 +85,24 @@
   private final Runner runner = new Runner(this::getName);
   private final OpenCloseState state;
   private final RaftLogMetrics raftLogMetrics;
+  private final LongSupplier snapshotIndexSupplier;

Review comment:
       Sure, will do.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-ratis] szetszwo merged pull request #322: RATIS-1195. Log Entry Validation in appendEntry can fail after log purge.

Posted by GitBox <gi...@apache.org>.
szetszwo merged pull request #322:
URL: https://github.com/apache/incubator-ratis/pull/322


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-ratis] mukul1987 commented on pull request #322: RATIS-1195. Log Entry Validation in appendEntry can fail after log purge.

Posted by GitBox <gi...@apache.org>.
mukul1987 commented on pull request #322:
URL: https://github.com/apache/incubator-ratis/pull/322#issuecomment-738796993


   The approach looks good to me. Also tagging @szetszwo here as well.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-ratis] avijayanhwx commented on pull request #322: RATIS-1195. Log Entry Validation in appendEntry can fail after log purge.

Posted by GitBox <gi...@apache.org>.
avijayanhwx commented on pull request #322:
URL: https://github.com/apache/incubator-ratis/pull/322#issuecomment-738611787


   @lokeshj1703, @hanishakoneru , @mukul1987 I have attached a preliminary patch to solve this problem. If the approach looks OK, I can polish it and add tests.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org