Posted to issues@ratis.apache.org by GitBox <gi...@apache.org> on 2022/03/09 03:45:27 UTC

[GitHub] [ratis] Xushaohong edited a comment on pull request #573: RATIS-1481. make state upgradate in notifyStateMachineToInstallSnapshot serialized

Xushaohong edited a comment on pull request #573:
URL: https://github.com/apache/ratis/pull/573#issuecomment-1062522977


   > What if the follower sends SNAPSHOT_INSTALLATION_IN_PROGRESS status instead of INCONSISTENCY status in reply to the appendEntries request?
   
   Sorry, I do not follow this, since I have not modified the follower to send back this status.
   
   > IIUC, the issue is occurring because of this inconsistency reply to appendEntries request from Leader.
   
   Yep, this is nearly the root cause. 
   
   Precondition:
   There are two communication threads between the leader and the follower. One sends the install-snapshot notification, and the other is the appendEntries thread (sending heartbeats and entries).
   On the follower, the state machine installing the snapshot (let's say thread A) and the reporting of installation progress (thread B) are asynchronous. The index updates (commit index and next index) are executed serially by thread A.
   In practice, the state machine's snapshot installation takes an increasingly long time to finish. The main thread (thread B) responds with `IN_PROGRESS` during the installation, but once the installation is done, the follower needs a new request before it can return `SNAPSHOT_INSTALLED`, since the last request has already been answered with `IN_PROGRESS`.
   
   
   If we keep the non-blocking style, in which the follower responds to the leader with its current status, there are two problems.
   
   First, once the snapshot is done, the updated index may be seen first by the appendEntries thread (let's say thread C), and C will return the updated nextIndex to the leader. The leader then misjudges the follower's status and does not send another notification, yet the follower needs that new notification in order to send back SNAPSHOT_INSTALLED. As a consequence, thread B gets stuck, and thread C keeps waiting for B to finish and always returns INCONSISTENCY to the leader.

   My solution: let the index update be executed by the main thread B, so the follower is guaranteed to send out SNAPSHOT_INSTALLED after the index has been updated.
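To illustrate the idea of moving the index update onto thread B, here is a minimal Java sketch. It is not the Ratis code: `FollowerSketch`, `onNotify`, `Reply`, and the `Supplier` used to start the install are all hypothetical names, and error handling is omitted. It only shows that when thread B performs the update, thread C cannot observe the new index before SNAPSHOT_INSTALLED is about to be returned.

```java
import java.util.concurrent.CompletableFuture;
import java.util.function.Supplier;

// Hypothetical sketch; class and method names are illustrative, not the Ratis API.
final class FollowerSketch {
  enum Reply { IN_PROGRESS, SNAPSHOT_INSTALLED }

  // Index visible to the appendEntries thread (thread C).
  private volatile long snapshotIndex = -1;
  // The asynchronous state machine install (thread A), if one has been started.
  private CompletableFuture<Long> install;

  /** Thread B: handles each install-snapshot notification from the leader. */
  synchronized Reply onNotify(Supplier<CompletableFuture<Long>> startInstall) {
    if (install == null) {
      install = startInstall.get();   // kick off the install on thread A
    }
    if (!install.isDone()) {
      return Reply.IN_PROGRESS;       // reply immediately, without blocking
    }
    snapshotIndex = install.join();   // the index update happens here, on thread B
    install = null;
    return Reply.SNAPSHOT_INSTALLED;  // sent only after the index was updated
  }

  /** Thread C: next index reported in appendEntries replies. */
  long nextIndex(long logNextIndex) {
    return Math.max(logNextIndex, snapshotIndex + 1);
  }
}
```

In this sketch, even after the install future completes, `nextIndex` keeps returning the old value until the next notification reaches `onNotify`, so the leader still sees a follower that needs the notification.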
   
   Second, the appendEntries thread C currently returns `state.getNextIndex()` while snapshot installation is in progress. The nextIndex is calculated as `Math.max(logNextIndex, snapshotNextIndex)`, where snapshotNextIndex is obtained through `server.getStateMachine().getLatestSnapshot()`. We should return the index updated by thread B instead of the state machine's.

   My solution: replace `snapshotNextIndex = getSnapshotIndex() + 1;` with `snapshotNextIndex = log.getSnapshotIndex() + 1;`
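The effect of switching the source of the snapshot index can be seen with a small worked example. This is a sketch under assumptions: `NextIndexSketch` and its parameters are hypothetical, the numbers are made up, and only the `Math.max` formula comes from the description above.

```java
// Hypothetical sketch; names and values are illustrative, not the Ratis API.
final class NextIndexSketch {
  /** The larger of the log's next index and (snapshot index + 1). */
  static long nextIndex(long logNextIndex, long snapshotIndex) {
    long snapshotNextIndex = snapshotIndex + 1;
    return Math.max(logNextIndex, snapshotNextIndex);
  }

  public static void main(String[] args) {
    long logNextIndex = 100;               // follower's log ends at index 99
    long stateMachineSnapshotIndex = 500;  // thread A has just finished installing
    long logSnapshotIndex = 90;            // thread B has not yet updated the log's view

    // Reading the state machine's snapshot exposes the new index too early.
    System.out.println(nextIndex(logNextIndex, stateMachineSnapshotIndex)); // prints 501
    // Reading the log's snapshot index keeps the old value until thread B updates it.
    System.out.println(nextIndex(logNextIndex, logSnapshotIndex));          // prints 100
  }
}
```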
   
   
   

