Posted to dev@ratis.apache.org by GitBox <gi...@apache.org> on 2020/06/24 08:15:25 UTC

[GitHub] [incubator-ratis] runzhiwang opened a new pull request #137: RATIS-987. Infinite install snapshot

runzhiwang opened a new pull request #137:
URL: https://github.com/apache/incubator-ratis/pull/137


   ## What changes were proposed in this pull request?
   
   **What's the problem?**
   1. This happens in Ozone production with ratis-0.5.0.
   2. The leader notifies the follower to install snapshot (t:3, i:999697) infinitely.
   ![image](https://user-images.githubusercontent.com/51938049/85519402-06fcb200-b634-11ea-9d18-37037e4a7403.png)
   
   3. The follower receives the install snapshot notification but logs `StateMachine installSnapshot is in progress` infinitely.
   ![image](https://user-images.githubusercontent.com/51938049/85519573-47f4c680-b634-11ea-8cb8-70ddb2c66b40.png)
   
   **What's the reason?**
   1. The code that logs [StateMachine installSnapshot is in progress](https://github.com/apache/incubator-ratis/blob/ratis-0.5.0-rc0/ratis-server/src/main/java/org/apache/ratis/server/impl/RaftServerImpl.java#L1198) in ratis-0.5.0 is shown below.
   This message is printed when [inProgressInstallSnapshotRequest.compareAndSet(null, firstAvailableLogTermIndex)](https://github.com/apache/incubator-ratis/blob/ratis-0.5.0-rc0/ratis-server/src/main/java/org/apache/ratis/server/impl/RaftServerImpl.java#L1157) returns false, i.e. inProgressInstallSnapshotRequest != null. inProgressInstallSnapshotRequest stays non-null because [stateMachine.notifyInstallSnapshotFromLeader](https://github.com/apache/incubator-ratis/blob/ratis-0.5.0-rc0/ratis-server/src/main/java/org/apache/ratis/server/impl/RaftServerImpl.java#L1178) throws an exception in [Ozone](https://github.com/apache/hadoop-ozone/blob/master/hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/transport/server/ratis/XceiverServerRatis.java#L569).
   The whenComplete callback does reset the flag when the returned future completes exceptionally, but when notifyInstallSnapshotFromLeader throws before returning a future, that callback is never registered. inProgressInstallSnapshotRequest is therefore never reset to null, every later notification from the leader only logs the message above, and the install snapshot loop never ends.
   ```java
   if (inProgressInstallSnapshotRequest.compareAndSet(null, firstAvailableLogTermIndex)) {

     ...

     // If this call throws instead of returning a future, whenComplete below is
     // never registered and inProgressInstallSnapshotRequest is never reset.
     stateMachine.notifyInstallSnapshotFromLeader(getRoleInfoProto(), firstAvailableLogTermIndex)
         .whenComplete((reply, exception) -> {
           if (exception != null) {
             LOG.error("{}: State Machine failed to install snapshot", getMemberId(), exception);
             inProgressInstallSnapshotRequest.compareAndSet(firstAvailableLogTermIndex, null);
             return;
           }

           if (reply != null) {
             stateMachine.pause();
             state.reloadStateMachine(reply.getIndex(), leaderTerm);
             state.updateInstalledSnapshotIndex(reply);
           }
           inProgressInstallSnapshotRequest.compareAndSet(firstAvailableLogTermIndex, null);
         });

     return ServerProtoUtils.toInstallSnapshotReplyProto(leaderId, getMemberId(),
         currentTerm, InstallSnapshotResult.SUCCESS, -1);
   }

   // Reached whenever an earlier request still holds inProgressInstallSnapshotRequest.
   LOG.info("{}: StateMachine installSnapshot is in progress: {}",
       getMemberId(), inProgressInstallSnapshotRequest.get());
   ```
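
   The failure mode can be reproduced outside Ratis with a minimal, self-contained Java sketch. It assumes a simplified model: an `AtomicReference<Long>` stands in for inProgressInstallSnapshotRequest (a TermIndex in Ratis), and `throwingNotify`, `notifyUnguarded` and `notifyGuarded` are hypothetical names, not Ratis or Ozone APIs. `notifyGuarded` shows one possible guard (turning a synchronous exception into a failed future); it is an illustration only and not necessarily what the merged patch does.

   ```java
   import java.util.concurrent.CompletableFuture;
   import java.util.concurrent.atomic.AtomicReference;
   import java.util.function.LongFunction;

   public class InstallSnapshotGuardSketch {
     // Hypothetical stand-in for inProgressInstallSnapshotRequest.
     static final AtomicReference<Long> inProgress = new AtomicReference<>();

     // Hypothetical state machine hook that fails synchronously instead of
     // returning an exceptionally completed future.
     static CompletableFuture<Long> throwingNotify(long index) {
       throw new IllegalStateException("failed before returning a future");
     }

     // Same shape as the ratis-0.5.0 code: the flag is cleared only inside whenComplete.
     static void notifyUnguarded(long index, LongFunction<CompletableFuture<Long>> notify) {
       // Box once: AtomicReference compares by identity, like Ratis reusing one TermIndex.
       final Long request = index;
       if (!inProgress.compareAndSet(null, request)) {
         System.out.println("installSnapshot is in progress: " + inProgress.get());
         return;
       }
       // If notify.apply throws, whenComplete is never attached and the flag stays set.
       notify.apply(index).whenComplete((reply, ex) -> inProgress.compareAndSet(request, null));
     }

     // Guarded variant: a synchronous exception becomes a failed future,
     // so the same cleanup callback runs on every path.
     static void notifyGuarded(long index, LongFunction<CompletableFuture<Long>> notify) {
       final Long request = index;
       if (!inProgress.compareAndSet(null, request)) {
         System.out.println("installSnapshot is in progress: " + inProgress.get());
         return;
       }
       CompletableFuture<Long> future;
       try {
         future = notify.apply(index);
       } catch (RuntimeException e) {
         future = new CompletableFuture<>();
         future.completeExceptionally(e);
       }
       // On an already-completed future, this callback runs immediately in this thread.
       future.whenComplete((reply, ex) -> inProgress.compareAndSet(request, null));
     }

     public static void main(String[] args) {
       try {
         notifyUnguarded(999697L, InstallSnapshotGuardSketch::throwingNotify);
       } catch (IllegalStateException e) {
         System.out.println("unguarded call threw: " + e.getMessage());
       }
       System.out.println("flag after unguarded failure: " + inProgress.get());
       // Every later notification now only logs "in progress".
       notifyUnguarded(999697L, InstallSnapshotGuardSketch::throwingNotify);

       inProgress.set(null); // reset the demo state
       notifyGuarded(999697L, InstallSnapshotGuardSketch::throwingNotify);
       System.out.println("flag after guarded failure: " + inProgress.get());
     }
   }
   ```

   Running `main` shows the flag still holding 999697 after the unguarded failure, the follow-up notification only logging "in progress", and the flag back to null after the guarded failure. Converting the exception into a failed future keeps a single cleanup path in whenComplete; a try/catch that resets the flag directly would be an equally reasonable choice.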
   
   ## What is the link to the Apache JIRA
   
   https://issues.apache.org/jira/browse/RATIS-987
   
   ## How was this patch tested?
   
   Existing tests.
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-ratis] runzhiwang commented on pull request #137: RATIS-987. Infinite install snapshot

runzhiwang commented on pull request #137:
URL: https://github.com/apache/incubator-ratis/pull/137#issuecomment-648670922


   @bshashikant @lokeshj1703 Could you help review it? Thank you very much.





[GitHub] [incubator-ratis] bshashikant merged pull request #137: RATIS-987. Fix Infinite install snapshot

bshashikant merged pull request #137:
URL: https://github.com/apache/incubator-ratis/pull/137


   

