You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@ratis.apache.org by GitBox <gi...@apache.org> on 2022/03/21 11:46:30 UTC

[GitHub] [ratis] JacksonYao287 commented on a change in pull request #573: RATIS-1481. make state upgradate in notifyStateMachineToInstallSnapshot serialized

JacksonYao287 commented on a change in pull request #573:
URL: https://github.com/apache/ratis/pull/573#discussion_r830981515



##########
File path: ratis-server/src/main/java/org/apache/ratis/server/impl/RaftServerImpl.java
##########
@@ -1650,7 +1649,7 @@ private InstallSnapshotReplyProto notifyStateMachineToInstallSnapshot(
       }
       changeToFollowerAndPersistMetadata(leaderTerm, "installSnapshot");
       state.setLeader(leaderId, "installSnapshot");
-      long snapshotIndex = state.getSnapshotIndex();
+      long snapshotIndex = state.getLog().getSnapshotIndex();

Review comment:
       maybe we can put this line before `if (snapshotIndex + 1 >= firstAvailableLogIndex && firstAvailableLogIndex > 0)`

##########
File path: ratis-server/src/main/java/org/apache/ratis/server/impl/RaftServerImpl.java
##########
@@ -1651,7 +1651,7 @@ private InstallSnapshotReplyProto notifyStateMachineToInstallSnapshot(
       }
       changeToFollowerAndPersistMetadata(leaderTerm, "installSnapshot");
       state.setLeader(leaderId, "installSnapshot");
-      long snapshotIndex = state.getSnapshotIndex();
+      long snapshotIndex = state.getLog().getSnapshotIndex();

Review comment:
       > There is the possibility that after statemachine just installed the snapshot and before the main thread return the SNAPSHOT_INSTALLED, the snapshotIndex here will get from the statemachine instead of memory and return ALREADY_INSTALLED, which will cause the follower not to update the index and reload statemachine
   
   before the main thread return the SNAPSHOT_INSTALLED, `inProgressInstallSnapshotRequest` will not be set back to 0 if no exception is thrown. so the following code will not be executed until `inProgressInstallSnapshotRequest` is set back to 0. 
   ```
           if (snapshotIndex + 1 >= firstAvailableLogIndex && firstAvailableLogIndex > 0) {
             // State Machine has already installed the snapshot. Return the
             // latest snapshot index to the Leader.
   
             inProgressInstallSnapshotRequest.compareAndSet(firstAvailableLogIndex, 0);
             LOG.info("{}: InstallSnapshot notification result: {}, current snapshot index: {}", getMemberId(),
                 InstallSnapshotResult.ALREADY_INSTALLED, snapshotIndex);
             return ServerProtoUtils.toInstallSnapshotReplyProto(leaderId, getMemberId(), currentTerm,
                 InstallSnapshotResult.ALREADY_INSTALLED, snapshotIndex);
           }
   ```
   if `inProgressInstallSnapshotRequest` is set back to 0, it means the snapshot has been installed successfully,  so now ServerState snapshotIndex should be the same as RaftLog snapshotIndex because the statemachine has been reloaded. 
   so seems the changes here is unnecessary , please correct me if i miss something here!

##########
File path: ratis-server/src/main/java/org/apache/ratis/server/impl/RaftServerImpl.java
##########
@@ -1687,44 +1686,38 @@ private InstallSnapshotReplyProto notifyStateMachineToInstallSnapshot(
         // index. Notify the state machine to install the snapshot.
         LOG.info("{}: notifyInstallSnapshot: nextIndex is {} but the leader's first available index is {}.",
             getMemberId(), state.getLog().getNextIndex(), firstAvailableLogIndex);
-        try {
-          stateMachine.followerEvent().notifyInstallSnapshotFromLeader(roleInfoProto, firstAvailableLogTermIndex)
-              .whenComplete((reply, exception) -> {
-                if (exception != null) {
-                  LOG.warn("{}: Failed to notify StateMachine to InstallSnapshot. Exception: {}",
-                      getMemberId(), exception.getMessage());
-                  inProgressInstallSnapshotRequest.compareAndSet(firstAvailableLogIndex, 0);
-                  return;
-                }
 
-                if (reply != null) {
-                  LOG.info("{}: StateMachine successfully installed snapshot index {}. Reloading the StateMachine.",
-                      getMemberId(), reply.getIndex());
-                  stateMachine.pause();
-                  state.updateInstalledSnapshotIndex(reply);
-                  state.reloadStateMachine(reply.getIndex());
-                  installedSnapshotIndex.set(reply.getIndex());
-                } else {
-                  isSnapshotNull.set(true);
-                  if (LOG.isDebugEnabled()) {
-                    LOG.debug("{}: StateMachine could not install snapshot as it is not available", this);
-                  }
+        // If notifyInstallSnapshot future is done asynchronously, the main thread will go through the downside part.
+        // As the time consumed by statemachine is uncontrollable(e.g. the RocksDB checkpoint could be constantly
+        // increasing, the transmission will always exceed one boundary), we expect that once snapshot installed,
+        // follower could work ASAP. For the rest of time, server can respond snapshot installation in progress.
+
+        // There is another appendLog thread trying appending entries, which returns inconsistency entries with
+        // nextIndex and commitIndex to the leader when install snapshot in progress. The nextIndex is updated when
+        // state.reloadStateMachine. We shall keep this index upgrade synchronized with main thread, or else leader
+        // could get this follower's latest nextIndex from appendEntries, it will not send any notify install snapshot
+        // request, not receive SNAPSHOT_INSTALLED answer and then get stuck in the loop of sending appendEntries and
+        // receiving inconsistency entries.
+        stateMachine.followerEvent().notifyInstallSnapshotFromLeader(roleInfoProto, firstAvailableLogTermIndex)
+            .whenComplete((reply, exception) -> {
+              if (exception != null) {
+                LOG.error("{}: Failed to notify StateMachine to InstallSnapshot. Exception: {}",
+                    getMemberId(), exception.getMessage());
+                inProgressInstallSnapshotRequest.compareAndSet(firstAvailableLogIndex, 0);

Review comment:
       seems it is better to use `inProgressInstallSnapshotRequest.set(0)` if an exception is thrown, no need to compare!




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ratis.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org