You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@ratis.apache.org by "Tsz-wo Sze (Jira)" <ji...@apache.org> on 2021/04/26 09:21:00 UTC
[jira] [Resolved] (RATIS-1369) When there is no snapshot, the
return snapshotIndex should be -1, not 0.
[ https://issues.apache.org/jira/browse/RATIS-1369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tsz-wo Sze resolved RATIS-1369.
-------------------------------
Fix Version/s: 2.1.0
Resolution: Fixed
I have merged the pull request. Thanks, [~glengeng]!
> When there is no snapshot, the return snapshotIndex should be -1, not 0.
> ------------------------------------------------------------------------
>
> Key: RATIS-1369
> URL: https://issues.apache.org/jira/browse/RATIS-1369
> Project: Ratis
> Issue Type: Bug
> Components: server
> Reporter: Glen Geng
> Assignee: Glen Geng
> Priority: Major
> Fix For: 2.1.0
>
> Attachments: ozone-root-scm-9.134.51.215.log - SCM2, follower.log
>
> Time Spent: 1h 10m
> Remaining Estimate: 0h
>
> For now, if there is no snapshot existed, the returned snapshotIndex is 0.
> It is not correctly, since the raft log starts from index 0 (BTW, in raft paper, the first valid log index is 1, but in ratis, it is 0), a snapshot taken at snapshotIndex 0 should contain the log entry 0. The snapshotIndex indicating an empty/fake snapshot should be -1.
>
> The fix is as follows:
> {code:java}
> diff --git a/ratis-server/src/main/java/org/apache/ratis/server/impl/ServerState.java b/ratis-server/src/main/java/org/apache/ratis/server/impl/ServerState.java
> index 728f7e9c..6307ca79 100644
> --- a/ratis-server/src/main/java/org/apache/ratis/server/impl/ServerState.java
> +++ b/ratis-server/src/main/java/org/apache/ratis/server/impl/ServerState.java
> @@ -431,7 +431,7 @@ class ServerState implements Closeable {
>
> long getLatestInstalledSnapshotIndex() {
> final TermIndex ti = latestInstalledSnapshot.get();
> - return ti != null? ti.getIndex(): 0L;
> + return ti != null? ti.getIndex(): -1L;
> }
>
> /**
> @@ -440,7 +440,7 @@ class ServerState implements Closeable {
> */
> long getSnapshotIndex() {
> final SnapshotInfo s = getLatestSnapshot();
> - final long latestSnapshotIndex = s != null ? s.getIndex() : 0;
> + final long latestSnapshotIndex = s != null ? s.getIndex() : -1;
> return Math.max(latestSnapshotIndex, getLatestInstalledSnapshotIndex());
> }
> {code}
>
> This issue is found in the test of bootstrap SCM in SCM HA.
>
> *TL; DR*
> *The issue can be reproduced by follow steps:*
> 1, use follow configuration for SCM HA
> {code:java}
> <!-- scm snapshot -->
> <property>
> <name>ozone.scm.ha.ratis.snapshot.threshold</name>
> <value>1</value>
> </property>
> <property>
> <name>ozone.scm.ha.ratis.log.purge.gap</name>
> <value>5</value>
> </property>
> <property>
> <name>ozone.scm.ha.ratis.log.purge.enabled</name>
> <value>true</value>
> </property>
> {code}
> 2, int and start SCM1
> 3, start DN1, DN2, DN3
> 4, wait for a while, until there is dozens of raft log entries.
> 5, bootstrap SCM2
> 6, start SCM2, SCM1 will be stuck in appendEntries Timeout.
>
> *The RCA is:*
> after init SCM1, the raft log is
> {code:java}
> [root@9 ~/glengeng/ozone-1.2.0-SNAPSHOT]# ls /tmp/metadata/scm-ha/b7a11f50-85b8-4b93-8a6b-c00705b4e3d4/current/
> log_inprogress_0 raft-meta raft-meta.conf
> [root@9 ~/glengeng/ozone-1.2.0-SNAPSHOT]#
> {code}
>
> after SCM1 is started, and there is dozens of log entries generated, the raft log is
> {code:java}
> [root@9 ~/glengeng/ozone-1.2.0-SNAPSHOT]# ls /tmp/metadata/scm-ha/b7a11f50-85b8-4b93-8a6b-c00705b4e3d4/current/
> log_inprogress_1 raft-meta raft-meta.conf
> [root@9 ~/glengeng/ozone-1.2.0-SNAPSHOT]#
> {code}
> Since purge is enable and purge gap is 5, the file log_0_0 is removed. The leader does not have entry 0 any more.
>
> Then SCM2 is bootstrapped and started, we expect SCM2 to download a snapshot from leader, since leader miss log entry 0, and SCM2 is empty.
> The follow SCM do receive an installSnapshot request,
> {code:java}
> 021-04-25 19:25:55,933 [grpc-default-executor-0] INFO org.apache.ratis.server.RaftServer$Division: 55560a65-e7f7-46f0-b463-f511187fd358@group-7587E602AEC2: receive installSnapshot: 630b3fbd-861f-4676-8e16-8ae548ed6658->55560a65-e7f7-46f0-b463-f511187fd358#0-t2,notify:(t:2, i:1)
> {code}
> But due to the default snapshot index is 0,
> {code:java}
> if (inProgressInstallSnapshotRequest.compareAndSet(null, firstAvailableLogTermIndex)) {
> // Check if snapshot index is already at par or ahead of the first
> // available log index of the Leader.
> long snapshotIndex = state.getSnapshotIndex();
> if (snapshotIndex + 1 >= firstAvailableLogIndex) {
> // State Machine has already installed the snapshot. Return the
> // latest snapshot index to the Leader.
> inProgressInstallSnapshotRequest.compareAndSet(firstAvailableLogTermIndex, null);
> final InstallSnapshotReplyProto reply = ServerProtoUtils.toInstallSnapshotReplyProto(
> leaderId, getMemberId(), currentTerm, InstallSnapshotResult.ALREADY_INSTALLED, snapshotIndex);
> LOG.info("{}: StateMachine snapshotIndex is {}", getMemberId(), snapshotIndex);
> return reply;
> }
> {code}
> 0(snapshotIndex) + 1 == 1(firstAvailableLogIndex), thus follower SCM ignore the downloading.
> {code:java}
> 2021-04-25 19:25:55,933 [grpc-default-executor-0] INFO org.apache.ratis.server.RaftServer$Division: 55560a65-e7f7-46f0-b463-f511187fd358@group-7587E602AEC2: receive installSnapshot: 630b3fbd-861f-4676-8e16-8ae548ed6658->55560a65-e7f7-46f0-b463-f511187fd358#0-t2,notify:(t:2, i:1)
> 2021-04-25 19:25:55,940 [grpc-default-executor-0] INFO org.apache.hadoop.hdds.scm.ha.SCMStateMachine: leader changed, yet current SCM is still follower.
> 2021-04-25 19:25:55,940 [grpc-default-executor-0] INFO org.apache.ratis.server.RaftServer$Division: 55560a65-e7f7-46f0-b463-f511187fd358@group-7587E602AEC2: change Leader from null to 630b3fbd-861f-4676-8e16-8ae548ed6658 at term 2 for installSnapshot, leader elected after 1416ms
> 2021-04-25 19:25:55,954 [grpc-default-executor-0] INFO org.apache.ratis.server.RaftServer$Division: 55560a65-e7f7-46f0-b463-f511187fd358@group-7587E602AEC2: StateMachine snapshotIndex is 0
> 2021-04-25 19:25:56,019 [grpc-default-executor-0] INFO org.apache.ratis.server.RaftServer$Division: 55560a65-e7f7-46f0-b463-f511187fd358@group-7587E602AEC2: set new configuration index: 1
> configurationEntry {
> peers {
> id: "630b3fbd-861f-4676-8e16-8ae548ed6658"
> address: "9.134.50.210:9865"
> }
> }
> from snapshot
> 2021-04-25 19:25:56,022 [grpc-default-executor-0] DEBUG org.apache.ratis.util.LifeCycle: 630b3fbd-861f-4676-8e16-8ae548ed6658|rpc:9.134.50.210:9865|admin:|client:|dataStream:|priority:0: NEW
> 2021-04-25 19:25:56,022 [grpc-default-executor-0] INFO org.apache.ratis.server.RaftServer$Division: 55560a65-e7f7-46f0-b463-f511187fd358@group-7587E602AEC2: set configuration 1: [630b3fbd-861f-4676-8e16-8ae548ed6658|rpc:9.134.50.210:9865|admin:|client:|dataStream:|priority:0], old=null
> 2021-04-25 19:25:56,026 [grpc-default-executor-0] INFO org.apache.ratis.server.RaftServer$Division: 55560a65-e7f7-46f0-b463-f511187fd358@group-7587E602AEC2: reply installSnapshot: 630b3fbd-861f-4676-8e16-8ae548ed6658<-55560a65-e7f7-46f0-b463-f511187fd358#0:FAIL-t0,ALREADY_INSTALLED
> 2021-04-25 19:25:56,026 [grpc-default-executor-0] DEBUG org.apache.ratis.grpc.server.GrpcServerProtocolService: 55560a65-e7f7-46f0-b463-f511187fd358: reply 630b3fbd-861f-4676-8e16-8ae548ed6658<-55560a65-e7f7-46f0-b463-f511187fd358#0:FAIL-t0,ALREADY_INSTALLED
> {code}
>
> In the leader side, the matchIndex is updated to 0, and the nextIndex is updated to 1.
> {code:java}
> @Override
> public void setSnapshotIndex(long newSnapshotIndex) {
> snapshotIndex.setUnconditionally(newSnapshotIndex, infoIndexChange);
> matchIndex.setUnconditionally(newSnapshotIndex, infoIndexChange);
> nextIndex.setUnconditionally(newSnapshotIndex + 1, infoIndexChange);
> }
> {code}
> {code:java}
> 2021-04-25 19:23:58,738 [grpc-default-executor-0] INFO org.apache.ratis.grpc.server.GrpcLogAppender: 630b3fbd-861f-4676-8e16-8ae548ed6658@group-7587E602AEC2->55560a65-e7f7-46f0-b463-f511187fd358-InstallSnapshotResponseHandler: received the first reply 630b3fbd-861f-4676-8e16-8ae548ed6658<-55560a65-e7f7-46f0-b463-f511187fd358#0:FAIL-t0,ALREADY_INSTALLED
> 2021-04-25 19:23:58,740 [grpc-default-executor-0] INFO org.apache.ratis.grpc.server.GrpcLogAppender: 630b3fbd-861f-4676-8e16-8ae548ed6658@group-7587E602AEC2->55560a65-e7f7-46f0-b463-f511187fd358-InstallSnapshotResponseHandler: Already Installed Snapshot Index 0.
> 2021-04-25 19:23:58,741 [grpc-default-executor-0] INFO org.apache.ratis.server.leader.FollowerInfo: 630b3fbd-861f-4676-8e16-8ae548ed6658@group-7587E602AEC2->55560a65-e7f7-46f0-b463-f511187fd358: snapshotIndex: setUnconditionally 0 -> 0
> 2021-04-25 19:23:58,741 [grpc-default-executor-0] INFO org.apache.ratis.server.leader.FollowerInfo: 630b3fbd-861f-4676-8e16-8ae548ed6658@group-7587E602AEC2->55560a65-e7f7-46f0-b463-f511187fd358: matchIndex: setUnconditionally 0 -> 0
> 2021-04-25 19:23:58,741 [grpc-default-executor-0] INFO org.apache.ratis.server.leader.FollowerInfo: 630b3fbd-861f-4676-8e16-8ae548ed6658@group-7587E602AEC2->55560a65-e7f7-46f0-b463-f511187fd358: nextIndex: setUnconditionally 0 -> 1
> 2021-04-25 19:23:58,741 [grpc-default-executor-0] DEBUG org.apache.ratis.server.leader.FollowerInfo: 630b3fbd-861f-4676-8e16-8ae548ed6658@group-7587E602AEC2->55560a65-e7f7-46f0-b463-f511187fd358: commitIndex: updateToMax old=-1, new=0, updated? true
> {code}
>
> Since matchIndex is monotonically increased, and the nextIndex will at least be matchIndex + 1,
> {code:java}
> final long nextIndex = 1 + Optional.ofNullable(request)
> .map(AppendEntriesRequest::getPreviousLog)
> .map(TermIndex::getIndex)
> .orElseGet(getFollower()::getMatchIndex);
> {code}
> Leader keep sending entries from index 1, but follower is empty, missing entry 0, the cluster stuck in inconsistency append entries.
>
> With the fix, the download of checkpoint happened, and the system works well.
> {code:java}
> 2021-04-26 11:23:10,761 [grpc-default-executor-0] INFO org.apache.ratis.server.RaftServer$Division: a0ba8aa4-5cff-4452-9adf-638135a9ebaf@group-C00705B4E3D4: receive installSnapshot: 6e7bb411-69c5-40ea-8238-bc65cc4e9b55->a0ba8aa4-5cff-4452-9adf-638135a9ebaf#0-t2,notify:(t:2, i:1)
> 2021-04-26 11:23:10,768 [grpc-default-executor-0] INFO org.apache.hadoop.hdds.scm.ha.SCMStateMachine: leader changed, yet current SCM is still follower.
> 2021-04-26 11:23:10,768 [grpc-default-executor-0] INFO org.apache.ratis.server.RaftServer$Division: a0ba8aa4-5cff-4452-9adf-638135a9ebaf@group-C00705B4E3D4: change Leader from null to 6e7bb411-69c5-40ea-8238-bc65cc4e9b55 at term 2 for installSnapshot, leader elected after 1391ms
> 2021-04-26 11:23:10,787 [grpc-default-executor-0] INFO org.apache.ratis.server.RaftServer$Division: a0ba8aa4-5cff-4452-9adf-638135a9ebaf@group-C00705B4E3D4: notifyInstallSnapshot: nextIndex is 0 but the leader's first available index is 1.
> 2021-04-26 11:23:11,090 [grpc-default-executor-0] INFO org.apache.hadoop.hdds.scm.ha.SCMStateMachine: Received install snapshot notification from SCM leader: 9.134.50.210:9865 with term index: (t:2, i:1)
> 2021-04-26 11:23:11,091 [pool-13-thread-1] INFO org.apache.hadoop.hdds.scm.ha.SCMHAManagerImpl: Downloading checkpoint from leader SCM scm1 and reloading state from the checkpoint.
> 2021-04-26 11:23:11,092 [grpc-default-executor-0] DEBUG org.apache.ratis.server.RaftServer$Division: a0ba8aa4-5cff-4452-9adf-638135a9ebaf@group-C00705B4E3D4: Snapshot Installation Request received and is in progress
> 2021-04-26 11:23:11,164 [grpc-default-executor-0] INFO org.apache.ratis.server.RaftServer$Division: a0ba8aa4-5cff-4452-9adf-638135a9ebaf@group-C00705B4E3D4: set new configuration index: 1
> configurationEntry {
> peers {
> id: "6e7bb411-69c5-40ea-8238-bc65cc4e9b55"
> address: "9.134.50.210:9865"
> }
> }
> from snapshot
> 2021-04-26 11:23:11,166 [grpc-default-executor-0] DEBUG org.apache.ratis.util.LifeCycle: 6e7bb411-69c5-40ea-8238-bc65cc4e9b55|rpc:9.134.50.210:9865|admin:|client:|dataStream:|priority:0: NEW
> 2021-04-26 11:23:11,167 [grpc-default-executor-0] INFO org.apache.ratis.server.RaftServer$Division: a0ba8aa4-5cff-4452-9adf-638135a9ebaf@group-C00705B4E3D4: set configuration 1: [6e7bb411-69c5-40ea-8238-bc65cc4e9b55|rpc:9.134.50.210:9865|admin:|client:|dataStream:|priority:0], old=null
> 2021-04-26 11:23:11,171 [grpc-default-executor-0] INFO org.apache.ratis.server.RaftServer$Division: a0ba8aa4-5cff-4452-9adf-638135a9ebaf@group-C00705B4E3D4: reply installSnapshot: 6e7bb411-69c5-40ea-8238-bc65cc4e9b55<-a0ba8aa4-5cff-4452-9adf-638135a9ebaf#0:FAIL-t0,IN_PROGRESS
> 2021-04-26 11:23:11,171 [grpc-default-executor-0] DEBUG org.apache.ratis.grpc.server.GrpcServerProtocolService: a0ba8aa4-5cff-4452-9adf-638135a9ebaf: reply 6e7bb411-69c5-40ea-8238-bc65cc4e9b55<-a0ba8aa4-5cff-4452-9adf-638135a9ebaf#0:FAIL-t0,IN_PROGRESS
> 2021-04-26 11:23:11,175 [grpc-default-executor-0] INFO org.apache.ratis.grpc.server.GrpcServerProtocolService: a0ba8aa4-5cff-4452-9adf-638135a9ebaf: Completed INSTALL_SNAPSHOT, lastRequest: 6e7bb411-69c5-40ea-8238-bc65cc4e9b55->a0ba8aa4-5cff-4452-9adf-638135a9ebaf#0-t2,notify:(t:2, i:1)
> {code}
>
--
This message was sent by Atlassian Jira
(v8.3.4#803005)