You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@ratis.apache.org by "Shashikant Banerjee (Jira)" <ji...@apache.org> on 2020/01/15 16:43:00 UTC

[jira] [Created] (RATIS-794) Ratils leader should retry append requests based on follower commit info in case of intermittent append failures

Shashikant Banerjee created RATIS-794:
-----------------------------------------

             Summary: Ratils leader should retry append requests based on follower commit info in case of intermittent append failures
                 Key: RATIS-794
                 URL: https://issues.apache.org/jira/browse/RATIS-794
             Project: Ratis
          Issue Type: Bug
          Components: server
            Reporter: Shashikant Banerjee
            Assignee: Tsz-wo Sze
             Fix For: 0.5.0


During Ozone testing, it was observed that a leader election happens in between the test , where a follower has caught to a certain index 313. The new leader starts sends an append request to the follower which fails with grpc Exception. This leads to leader reset the connection and start from the beginning (index 1). 
 
 
{code:java}
2020-01-13 14:56:32,995 INFO org.apache.ratis.server.impl.RaftServerImpl: 0.0.0.0:9858@group-4F125BF42C14: changes role from CANDIDATE to LEADER at term 7 for changeToLeader
2020-01-13 14:56:32,995 INFO org.apache.hadoop.ozone.container.common.transport.server.ratis.XceiverServerRatis: Leader change notification received for group: group-4F125BF42C14 with new leaderId: ed90869c-317e-4303-8922-9fa83a3983cb
2020-01-13 14:56:33,042 WARN org.apache.ratis.grpc.server.GrpcLogAppender: 0.0.0.0:9858@group-4F125BF42C14->10.120.139.111:9858-AppendLogResponseHandler: Failed appendEntries: org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: UNAVAILABLE: io exception
2020-01-13 14:56:33,043 DEBUG org.apache.ratis.util.PeerProxyMap: ed90869c-317e-4303-8922-9fa83a3983cb: reset proxy for b65b0b6c-b0bb-429f-a23d-467c72d4b85c
2020-01-13 14:56:33,044 DEBUG org.apache.ratis.util.LifeCycle: b65b0b6c-b0bb-429f-a23d-467c72d4b85c:10.120.139.111:9858: RUNNING -> CLOSING
2020-01-13 14:56:33,044 DEBUG org.apache.ratis.util.LifeCycle: b65b0b6c-b0bb-429f-a23d-467c72d4b85c:10.120.139.111:9858: CLOSING -> CLOSED
2020-01-13 14:56:33,044 DEBUG org.apache.ratis.util.LifeCycle: b65b0b6c-b0bb-429f-a23d-467c72d4b85c:10.120.139.111:9858: NEW
2020-01-13 14:56:33,044 DEBUG org.apache.ratis.util.TimeoutScheduler: new ScheduledThreadPoolExecutor
2020-01-13 14:56:33,044 DEBUG org.apache.ratis.util.PeerProxyMap: ed90869c-317e-4303-8922-9fa83a3983cb: Closing proxy for peer b65b0b6c-b0bb-429f-a23d-467c72d4b85c:10.120.139.111:9858
2020-01-13 14:56:33,045 DEBUG org.apache.ratis.util.TimeoutScheduler: schedule a task: timeout 6000ms, sid 1 
2020-01-13 14:56:33,047 INFO org.apache.ratis.server.impl.FollowerInfo: 0.0.0.0:9858@group-4F125BF42C14->10.120.139.111:9858: nextIndex: updateUnconditionally 314 -> 1 ---------------------> set the next index for the follower back to 1 and  starts from 1)
2020-01-13 14:56:35,840 DEBUG org.apache.ratis.grpc.server.GrpcLogAppender: 0.0.0.0:9858@group-4F125BF42C14->10.120.139.111:9858-AppendLogResponseHandler: received the first reply ed90869c-317e-4303-8922-9fa83a3983cb<-b65b0b6c-b0bb-429f-a23d-467c72d4b85c#2:OK,SUCCESS,nextIndex:314,term:5,followerCommit:313, request=AppendEntriesRequest:cid=2,entriesCount=0,lastEntry=null .  -------------------> (Receives the response from follower indficating follower is at 312)
Although the follower is at 313, the leader keeps on sending the appendRequests from index 1. 
2020-01-13 14:56:35,841 DEBUG org.apache.ratis.server.impl.FollowerInfo: 0.0.0.0:9858@group-4F125BF42C14->10.120.139.111:9858: nextIndex: updateIncreasingly 1 -> 2
2020-01-13 14:56:35,841 DEBUG org.apache.ratis.util.TimeoutScheduler: schedule a task: timeout 6000ms, sid 7
2020-01-13 14:56:35,843 DEBUG org.apache.ratis.server.impl.FollowerInfo: 0.0.0.0:9858@group-4F125BF42C14->10.120.139.111:9858: nextIndex: updateIncreasingly 2 -> 3
2020-01-13 14:56:35,843 DEBUG org.apache.ratis.util.TimeoutScheduler: schedule a task: timeout 6000ms, sid 8
{code}
 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)