You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@ratis.apache.org by "Tsz-wo Sze (Jira)" <ji...@apache.org> on 2021/04/26 09:21:00 UTC

[jira] [Resolved] (RATIS-1369) When there is no snapshot, the return snapshotIndex should be -1, not 0.

     [ https://issues.apache.org/jira/browse/RATIS-1369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tsz-wo Sze resolved RATIS-1369.
-------------------------------
    Fix Version/s: 2.1.0
       Resolution: Fixed

I have merged the pull request.  Thanks, [~glengeng]!

> When there is no snapshot, the return snapshotIndex should be -1, not 0.
> ------------------------------------------------------------------------
>
>                 Key: RATIS-1369
>                 URL: https://issues.apache.org/jira/browse/RATIS-1369
>             Project: Ratis
>          Issue Type: Bug
>          Components: server
>            Reporter: Glen Geng
>            Assignee: Glen Geng
>            Priority: Major
>             Fix For: 2.1.0
>
>         Attachments: ozone-root-scm-9.134.51.215.log - SCM2, follower.log
>
>          Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> For now, if there is no snapshot existed, the returned snapshotIndex is 0.
> It is not correctly, since the raft log starts from index 0 (BTW, in raft paper, the first valid log index is 1, but in ratis, it is 0), a snapshot taken at snapshotIndex 0 should contain the log entry 0. The snapshotIndex indicating an empty/fake snapshot should be -1.
>  
> The fix is as follows: 
> {code:java}
> diff --git a/ratis-server/src/main/java/org/apache/ratis/server/impl/ServerState.java b/ratis-server/src/main/java/org/apache/ratis/server/impl/ServerState.java
> index 728f7e9c..6307ca79 100644
> --- a/ratis-server/src/main/java/org/apache/ratis/server/impl/ServerState.java
> +++ b/ratis-server/src/main/java/org/apache/ratis/server/impl/ServerState.java
> @@ -431,7 +431,7 @@ class ServerState implements Closeable {
>  
>    long getLatestInstalledSnapshotIndex() {
>      final TermIndex ti = latestInstalledSnapshot.get();
> -    return ti != null? ti.getIndex(): 0L;
> +    return ti != null? ti.getIndex(): -1L;
>    }
>  
>    /**
> @@ -440,7 +440,7 @@ class ServerState implements Closeable {
>     */
>    long getSnapshotIndex() {
>      final SnapshotInfo s = getLatestSnapshot();
> -    final long latestSnapshotIndex = s != null ? s.getIndex() : 0;
> +    final long latestSnapshotIndex = s != null ? s.getIndex() : -1;
>      return Math.max(latestSnapshotIndex, getLatestInstalledSnapshotIndex());
>    }
> {code}
>  
> This issue is found in the test of bootstrap SCM in SCM HA. 
>  
> *TL; DR*
> *The issue can be reproduced by follow steps:*
> 1, use follow configuration for SCM HA
> {code:java}
>   <!-- scm snapshot -->
>   <property>
>     <name>ozone.scm.ha.ratis.snapshot.threshold</name>
>     <value>1</value>
>   </property>
>   <property>
>     <name>ozone.scm.ha.ratis.log.purge.gap</name>
>     <value>5</value>
>   </property>
>   <property>
>     <name>ozone.scm.ha.ratis.log.purge.enabled</name>
>     <value>true</value>
>   </property>
> {code}
> 2, int and start SCM1
> 3, start DN1, DN2, DN3
> 4, wait for a while, until there is dozens of raft log entries.
> 5, bootstrap SCM2
> 6, start SCM2, SCM1 will be stuck in appendEntries Timeout.
>  
>  *The RCA is:*
> after init SCM1, the raft log is 
> {code:java}
> [root@9 ~/glengeng/ozone-1.2.0-SNAPSHOT]# ls /tmp/metadata/scm-ha/b7a11f50-85b8-4b93-8a6b-c00705b4e3d4/current/
> log_inprogress_0  raft-meta  raft-meta.conf
> [root@9 ~/glengeng/ozone-1.2.0-SNAPSHOT]# 
> {code}
>  
> after SCM1 is started, and there is dozens of log entries generated, the raft log is
> {code:java}
> [root@9 ~/glengeng/ozone-1.2.0-SNAPSHOT]# ls /tmp/metadata/scm-ha/b7a11f50-85b8-4b93-8a6b-c00705b4e3d4/current/
> log_inprogress_1  raft-meta  raft-meta.conf
> [root@9 ~/glengeng/ozone-1.2.0-SNAPSHOT]# 
> {code}
> Since purge is enable and purge gap is 5, the file log_0_0 is removed. The leader does not have entry 0 any more.
>  
> Then SCM2 is bootstrapped and started, we expect SCM2 to download a snapshot from leader, since leader miss log entry 0, and SCM2 is empty.
> The follow SCM do receive an installSnapshot request,
> {code:java}
> 021-04-25 19:25:55,933 [grpc-default-executor-0] INFO org.apache.ratis.server.RaftServer$Division: 55560a65-e7f7-46f0-b463-f511187fd358@group-7587E602AEC2: receive installSnapshot: 630b3fbd-861f-4676-8e16-8ae548ed6658->55560a65-e7f7-46f0-b463-f511187fd358#0-t2,notify:(t:2, i:1)
> {code}
> But due to the default snapshot index is 0, 
> {code:java}
> if (inProgressInstallSnapshotRequest.compareAndSet(null, firstAvailableLogTermIndex)) {
>   // Check if snapshot index is already at par or ahead of the first
>   // available log index of the Leader.
>   long snapshotIndex = state.getSnapshotIndex();
>   if (snapshotIndex + 1 >= firstAvailableLogIndex) {
>     // State Machine has already installed the snapshot. Return the
>     // latest snapshot index to the Leader.
>     inProgressInstallSnapshotRequest.compareAndSet(firstAvailableLogTermIndex, null);
>     final InstallSnapshotReplyProto reply = ServerProtoUtils.toInstallSnapshotReplyProto(
>         leaderId, getMemberId(), currentTerm, InstallSnapshotResult.ALREADY_INSTALLED, snapshotIndex);
>     LOG.info("{}: StateMachine snapshotIndex is {}", getMemberId(), snapshotIndex);
>     return reply;
>   }
> {code}
> 0(snapshotIndex) + 1 == 1(firstAvailableLogIndex), thus follower SCM ignore the  downloading.
> {code:java}
> 2021-04-25 19:25:55,933 [grpc-default-executor-0] INFO org.apache.ratis.server.RaftServer$Division: 55560a65-e7f7-46f0-b463-f511187fd358@group-7587E602AEC2: receive installSnapshot: 630b3fbd-861f-4676-8e16-8ae548ed6658->55560a65-e7f7-46f0-b463-f511187fd358#0-t2,notify:(t:2, i:1)
> 2021-04-25 19:25:55,940 [grpc-default-executor-0] INFO org.apache.hadoop.hdds.scm.ha.SCMStateMachine: leader changed, yet current SCM is still follower.
> 2021-04-25 19:25:55,940 [grpc-default-executor-0] INFO org.apache.ratis.server.RaftServer$Division: 55560a65-e7f7-46f0-b463-f511187fd358@group-7587E602AEC2: change Leader from null to 630b3fbd-861f-4676-8e16-8ae548ed6658 at term 2 for installSnapshot, leader elected after 1416ms
> 2021-04-25 19:25:55,954 [grpc-default-executor-0] INFO org.apache.ratis.server.RaftServer$Division: 55560a65-e7f7-46f0-b463-f511187fd358@group-7587E602AEC2: StateMachine snapshotIndex is 0
> 2021-04-25 19:25:56,019 [grpc-default-executor-0] INFO org.apache.ratis.server.RaftServer$Division: 55560a65-e7f7-46f0-b463-f511187fd358@group-7587E602AEC2: set new configuration index: 1
> configurationEntry {
>   peers {
>     id: "630b3fbd-861f-4676-8e16-8ae548ed6658"
>     address: "9.134.50.210:9865"
>   }
> }
>  from snapshot
> 2021-04-25 19:25:56,022 [grpc-default-executor-0] DEBUG org.apache.ratis.util.LifeCycle: 630b3fbd-861f-4676-8e16-8ae548ed6658|rpc:9.134.50.210:9865|admin:|client:|dataStream:|priority:0: NEW
> 2021-04-25 19:25:56,022 [grpc-default-executor-0] INFO org.apache.ratis.server.RaftServer$Division: 55560a65-e7f7-46f0-b463-f511187fd358@group-7587E602AEC2: set configuration 1: [630b3fbd-861f-4676-8e16-8ae548ed6658|rpc:9.134.50.210:9865|admin:|client:|dataStream:|priority:0], old=null
> 2021-04-25 19:25:56,026 [grpc-default-executor-0] INFO org.apache.ratis.server.RaftServer$Division: 55560a65-e7f7-46f0-b463-f511187fd358@group-7587E602AEC2: reply installSnapshot: 630b3fbd-861f-4676-8e16-8ae548ed6658<-55560a65-e7f7-46f0-b463-f511187fd358#0:FAIL-t0,ALREADY_INSTALLED
> 2021-04-25 19:25:56,026 [grpc-default-executor-0] DEBUG org.apache.ratis.grpc.server.GrpcServerProtocolService: 55560a65-e7f7-46f0-b463-f511187fd358: reply 630b3fbd-861f-4676-8e16-8ae548ed6658<-55560a65-e7f7-46f0-b463-f511187fd358#0:FAIL-t0,ALREADY_INSTALLED
> {code}
>  
> In the leader side, the matchIndex is updated to 0, and the nextIndex is updated to 1.
> {code:java}
> @Override
> public void setSnapshotIndex(long newSnapshotIndex) {
>   snapshotIndex.setUnconditionally(newSnapshotIndex, infoIndexChange);
>   matchIndex.setUnconditionally(newSnapshotIndex, infoIndexChange);
>   nextIndex.setUnconditionally(newSnapshotIndex + 1, infoIndexChange);
> }
> {code}
> {code:java}
> 2021-04-25 19:23:58,738 [grpc-default-executor-0] INFO org.apache.ratis.grpc.server.GrpcLogAppender: 630b3fbd-861f-4676-8e16-8ae548ed6658@group-7587E602AEC2->55560a65-e7f7-46f0-b463-f511187fd358-InstallSnapshotResponseHandler: received the first reply 630b3fbd-861f-4676-8e16-8ae548ed6658<-55560a65-e7f7-46f0-b463-f511187fd358#0:FAIL-t0,ALREADY_INSTALLED
> 2021-04-25 19:23:58,740 [grpc-default-executor-0] INFO org.apache.ratis.grpc.server.GrpcLogAppender: 630b3fbd-861f-4676-8e16-8ae548ed6658@group-7587E602AEC2->55560a65-e7f7-46f0-b463-f511187fd358-InstallSnapshotResponseHandler: Already Installed Snapshot Index 0.
> 2021-04-25 19:23:58,741 [grpc-default-executor-0] INFO org.apache.ratis.server.leader.FollowerInfo: 630b3fbd-861f-4676-8e16-8ae548ed6658@group-7587E602AEC2->55560a65-e7f7-46f0-b463-f511187fd358: snapshotIndex: setUnconditionally 0 -> 0
> 2021-04-25 19:23:58,741 [grpc-default-executor-0] INFO org.apache.ratis.server.leader.FollowerInfo: 630b3fbd-861f-4676-8e16-8ae548ed6658@group-7587E602AEC2->55560a65-e7f7-46f0-b463-f511187fd358: matchIndex: setUnconditionally 0 -> 0
> 2021-04-25 19:23:58,741 [grpc-default-executor-0] INFO org.apache.ratis.server.leader.FollowerInfo: 630b3fbd-861f-4676-8e16-8ae548ed6658@group-7587E602AEC2->55560a65-e7f7-46f0-b463-f511187fd358: nextIndex: setUnconditionally 0 -> 1
> 2021-04-25 19:23:58,741 [grpc-default-executor-0] DEBUG org.apache.ratis.server.leader.FollowerInfo: 630b3fbd-861f-4676-8e16-8ae548ed6658@group-7587E602AEC2->55560a65-e7f7-46f0-b463-f511187fd358: commitIndex: updateToMax old=-1, new=0, updated? true
> {code}
>  
> Since matchIndex is monotonically increased, and the nextIndex will at least be matchIndex + 1, 
> {code:java}
> final long nextIndex = 1 + Optional.ofNullable(request)
>     .map(AppendEntriesRequest::getPreviousLog)
>     .map(TermIndex::getIndex)
>     .orElseGet(getFollower()::getMatchIndex);
> {code}
> Leader keep sending entries from index 1, but follower is empty, missing entry 0, the cluster stuck in inconsistency append entries.
>  
> With the fix, the download of checkpoint happened, and the system works well.
> {code:java}
> 2021-04-26 11:23:10,761 [grpc-default-executor-0] INFO org.apache.ratis.server.RaftServer$Division: a0ba8aa4-5cff-4452-9adf-638135a9ebaf@group-C00705B4E3D4: receive installSnapshot: 6e7bb411-69c5-40ea-8238-bc65cc4e9b55->a0ba8aa4-5cff-4452-9adf-638135a9ebaf#0-t2,notify:(t:2, i:1)
> 2021-04-26 11:23:10,768 [grpc-default-executor-0] INFO org.apache.hadoop.hdds.scm.ha.SCMStateMachine: leader changed, yet current SCM is still follower.
> 2021-04-26 11:23:10,768 [grpc-default-executor-0] INFO org.apache.ratis.server.RaftServer$Division: a0ba8aa4-5cff-4452-9adf-638135a9ebaf@group-C00705B4E3D4: change Leader from null to 6e7bb411-69c5-40ea-8238-bc65cc4e9b55 at term 2 for installSnapshot, leader elected after 1391ms
> 2021-04-26 11:23:10,787 [grpc-default-executor-0] INFO org.apache.ratis.server.RaftServer$Division: a0ba8aa4-5cff-4452-9adf-638135a9ebaf@group-C00705B4E3D4: notifyInstallSnapshot: nextIndex is 0 but the leader's first available index is 1.
> 2021-04-26 11:23:11,090 [grpc-default-executor-0] INFO org.apache.hadoop.hdds.scm.ha.SCMStateMachine: Received install snapshot notification from SCM leader: 9.134.50.210:9865 with term index: (t:2, i:1)
> 2021-04-26 11:23:11,091 [pool-13-thread-1] INFO org.apache.hadoop.hdds.scm.ha.SCMHAManagerImpl: Downloading checkpoint from leader SCM scm1 and reloading state from the checkpoint.
> 2021-04-26 11:23:11,092 [grpc-default-executor-0] DEBUG org.apache.ratis.server.RaftServer$Division: a0ba8aa4-5cff-4452-9adf-638135a9ebaf@group-C00705B4E3D4: Snapshot Installation Request received and is in progress
> 2021-04-26 11:23:11,164 [grpc-default-executor-0] INFO org.apache.ratis.server.RaftServer$Division: a0ba8aa4-5cff-4452-9adf-638135a9ebaf@group-C00705B4E3D4: set new configuration index: 1
> configurationEntry {
>   peers {
>     id: "6e7bb411-69c5-40ea-8238-bc65cc4e9b55"
>     address: "9.134.50.210:9865"
>   }
> }
>  from snapshot
> 2021-04-26 11:23:11,166 [grpc-default-executor-0] DEBUG org.apache.ratis.util.LifeCycle: 6e7bb411-69c5-40ea-8238-bc65cc4e9b55|rpc:9.134.50.210:9865|admin:|client:|dataStream:|priority:0: NEW
> 2021-04-26 11:23:11,167 [grpc-default-executor-0] INFO org.apache.ratis.server.RaftServer$Division: a0ba8aa4-5cff-4452-9adf-638135a9ebaf@group-C00705B4E3D4: set configuration 1: [6e7bb411-69c5-40ea-8238-bc65cc4e9b55|rpc:9.134.50.210:9865|admin:|client:|dataStream:|priority:0], old=null
> 2021-04-26 11:23:11,171 [grpc-default-executor-0] INFO org.apache.ratis.server.RaftServer$Division: a0ba8aa4-5cff-4452-9adf-638135a9ebaf@group-C00705B4E3D4: reply installSnapshot: 6e7bb411-69c5-40ea-8238-bc65cc4e9b55<-a0ba8aa4-5cff-4452-9adf-638135a9ebaf#0:FAIL-t0,IN_PROGRESS
> 2021-04-26 11:23:11,171 [grpc-default-executor-0] DEBUG org.apache.ratis.grpc.server.GrpcServerProtocolService: a0ba8aa4-5cff-4452-9adf-638135a9ebaf: reply 6e7bb411-69c5-40ea-8238-bc65cc4e9b55<-a0ba8aa4-5cff-4452-9adf-638135a9ebaf#0:FAIL-t0,IN_PROGRESS
> 2021-04-26 11:23:11,175 [grpc-default-executor-0] INFO org.apache.ratis.grpc.server.GrpcServerProtocolService: a0ba8aa4-5cff-4452-9adf-638135a9ebaf: Completed INSTALL_SNAPSHOT, lastRequest: 6e7bb411-69c5-40ea-8238-bc65cc4e9b55->a0ba8aa4-5cff-4452-9adf-638135a9ebaf#0-t2,notify:(t:2, i:1)
> {code}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)