Posted to issues@ratis.apache.org by GitBox <gi...@apache.org> on 2023/01/02 14:03:26 UTC

[GitHub] [ratis] SzyWilliam opened a new pull request, #801: RATIS-1763. Purging logs in an ordered manner

SzyWilliam opened a new pull request, #801:
URL: https://github.com/apache/ratis/pull/801

   ## What changes were proposed in this pull request?
   
   We encountered an `IllegalStateException` indicating a missing log segment:
   ```java
   java.lang.IllegalStateException Found a gap between logs: the last log segment log-88826_88927 ended at 88927 but the next log segment log-89130_89071 started at 89130.
   	at org.apache.ratis.util.Preconditions.assertTrue(Preconditions.java:72)
   	at org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogCache.validateAdding(SegmentedRaftLogCache.java:421)
   	at org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogCache.addSegment(SegmentedRaftLogCache.java:428)
   	at org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogCache.loadSegment(SegmentedRaftLogCache.java:381)
   	at org.apache.ratis.server.raftlog.segmented.SegmentedRaftLog.loadLogSegments(SegmentedRaftLog.java:241)
   	at org.apache.ratis.server.raftlog.segmented.SegmentedRaftLog.openImpl(SegmentedRaftLog.java:214)
   	at org.apache.ratis.server.raftlog.RaftLogBase.open(RaftLogBase.java:251)
   	at org.apache.ratis.server.impl.ServerState.initRaftLog(ServerState.java:236)
   	at org.apache.ratis.server.impl.ServerState.initRaftLog(ServerState.java:217)
   	at org.apache.ratis.server.impl.ServerState.lambda$new$5(ServerState.java:160)
   	at org.apache.ratis.util.MemoizedSupplier.get(MemoizedSupplier.java:62)
   	at org.apache.ratis.server.impl.ServerState.initialize(ServerState.java:174)
   	at org.apache.ratis.server.impl.RaftServerImpl.start(RaftServerImpl.java:330)
   ```
   No manual operations on the RaftLog were involved; it was manipulated purely by Ratis code.
   After careful investigation, we suspect that this bug is triggered by an **inappropriate log purging order**.
   
   ### Cause
   1. The current implementation picks the log segments to purge in reverse order, see [here](https://github.com/apache/ratis/blob/2b1b1b57f01dd629147ea1c956721520761e9126/ratis-server/src/main/java/org/apache/ratis/server/raftlog/segmented/SegmentedRaftLogCache.java#L321).
   2. The purge task runs asynchronously and deletes the segments one at a time in that reverse order, see [here](https://github.com/apache/ratis/blob/2b1b1b57f01dd629147ea1c956721520761e9126/ratis-server/src/main/java/org/apache/ratis/server/raftlog/segmented/SegmentedRaftLogWorker.java#L472).
   3. Suppose logs 0-8000 are selected for purging by `StateMachineUpdater` while `RaftLogWorker` is appending logs 8001-8010 to the RaftLog. If the RaftServer is closed in the middle of the `PurgeLog` task, some of the selected logs are left unpurged, for example 0-4000. Because the newer segments were deleted first, this leaves a gap between the remaining logs (4001-8000 in this case), exactly as the exception above indicates.
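
   To make step 3 concrete, here is a minimal, self-contained sketch of an interrupted reverse-order purge. It is not Ratis code: the class `ReversedPurgeSketch`, its `Segment` type, the `maxPurged` cut-off that stands in for the server being closed mid-task, and the segment boundaries are all assumptions made up for illustration.

   ```java
   import java.util.ArrayList;
   import java.util.List;

   /** Hypothetical illustration only; none of these names exist in Ratis. */
   final class ReversedPurgeSketch {
     /** A closed log segment covering the indices [start, end]. */
     static final class Segment {
       final long start;
       final long end;

       Segment(long start, long end) {
         this.start = start;
         this.end = end;
       }
     }

     /**
      * Purges segments whose end index is at most purgeUpTo, newest first.
      * Stops after maxPurged deletions to simulate a shutdown in the middle
      * of the purge task. Returns the segments that are still on disk.
      */
     static List<Segment> purgeReversed(List<Segment> segments, long purgeUpTo, int maxPurged) {
       List<Segment> remaining = new ArrayList<>(segments);
       int purged = 0;
       for (int i = segments.size() - 1; i >= 0 && purged < maxPurged; i--) {
         Segment s = segments.get(i);
         if (s.end <= purgeUpTo) {
           remaining.remove(s);
           purged++;
         }
       }
       return remaining;
     }

     /** Returns true if some segment does not start right after the previous one ends. */
     static boolean hasGap(List<Segment> segments) {
       for (int i = 1; i < segments.size(); i++) {
         if (segments.get(i).start != segments.get(i - 1).end + 1) {
           return true;
         }
       }
       return false;
     }

     public static void main(String[] args) {
       List<Segment> segments = new ArrayList<>();
       segments.add(new Segment(0, 4000));
       segments.add(new Segment(4001, 8000));
       segments.add(new Segment(8001, 8010)); // appended while the purge was pending

       // Only one deletion completes before the simulated shutdown.
       List<Segment> afterShutdown = purgeReversed(segments, 8000, 1);
       System.out.println("gap after interrupted reversed purge: " + hasGap(afterShutdown));
     }
   }
   ```

   In this sketch the segment 4001-8000 is deleted first, so after the simulated shutdown only 0-4000 and 8001-8010 remain and the contiguity check fails on the next load.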
   
   ### Proposed solution
   There seems to be no requirement to purge logs in reverse order, so I changed the purge to run in ascending (oldest-first) order.
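
   A rough sketch of why oldest-first purging tolerates an interruption follows. It is not the actual patch: `OrderedPurgeSketch`, `Segment`, `purgeInOrder`, and the `maxPurged` cut-off are hypothetical names and parameters used only to illustrate the idea of always removing the first (oldest) segment.

   ```java
   import java.util.ArrayDeque;
   import java.util.ArrayList;
   import java.util.Deque;
   import java.util.List;

   /** Hypothetical illustration only; none of these names exist in Ratis. */
   final class OrderedPurgeSketch {
     /** A closed log segment covering the indices [start, end]. */
     static final class Segment {
       final long start;
       final long end;

       Segment(long start, long end) {
         this.start = start;
         this.end = end;
       }
     }

     /**
      * Removes segments from the head of the deque (oldest first) while they
      * end at or before purgeUpTo. Stops after maxPurged removals to simulate
      * a shutdown in the middle of the purge task. Returns the purged segments.
      */
     static List<Segment> purgeInOrder(Deque<Segment> segments, long purgeUpTo, int maxPurged) {
       List<Segment> purged = new ArrayList<>();
       while (purged.size() < maxPurged
           && !segments.isEmpty()
           && segments.peekFirst().end <= purgeUpTo) {
         purged.add(segments.pollFirst());
       }
       return purged;
     }

     /** Returns true if every segment starts right after the previous one ends. */
     static boolean isContiguous(Iterable<Segment> segments) {
       Segment prev = null;
       for (Segment s : segments) {
         if (prev != null && s.start != prev.end + 1) {
           return false;
         }
         prev = s;
       }
       return true;
     }

     public static void main(String[] args) {
       Deque<Segment> segments = new ArrayDeque<>();
       segments.add(new Segment(0, 4000));
       segments.add(new Segment(4001, 8000));
       segments.add(new Segment(8001, 8010));

       // Only one removal completes before the simulated shutdown.
       purgeInOrder(segments, 8000, 1);
       System.out.println("contiguous after interrupted ordered purge: " + isContiguous(segments));
     }
   }
   ```

   Because only the head is ever removed, whatever remains after an interruption is a contiguous suffix of the log; the leftover segments are simply purged on a later attempt.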
   
   ## What is the link to the Apache JIRA?
   
   https://issues.apache.org/jira/browse/RATIS-1763
   
   ## How was this patch tested?
   
   unit tests
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ratis.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [ratis] SzyWilliam commented on pull request #801: RATIS-1763. Purging logs in an ordered manner

Posted by GitBox <gi...@apache.org>.
SzyWilliam commented on PR #801:
URL: https://github.com/apache/ratis/pull/801#issuecomment-1369358194

   @szetszwo Thanks for reviewing this! Removing the first element is much more precise. I have made the corresponding changes to the code.




[GitHub] [ratis] szetszwo merged pull request #801: RATIS-1763. Purging logs in an ordered manner

Posted by GitBox <gi...@apache.org>.
szetszwo merged PR #801:
URL: https://github.com/apache/ratis/pull/801

