You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@ratis.apache.org by "Aravindan Vijayan (Jira)" <ji...@apache.org> on 2020/12/15 23:52:00 UTC

[jira] [Updated] (RATIS-1241) Leader unable to append logs to a recovering follower when its logs have been purged after a snapshot.

     [ https://issues.apache.org/jira/browse/RATIS-1241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aravindan Vijayan updated RATIS-1241:
-------------------------------------
    Summary: Leader unable to append logs to a recovering follower when its logs have been purged after a snapshot.  (was: Leader unable to append logs to a follower when its logs have been purged after a snapshot.)

> Leader unable to append logs to a recovering follower when its logs have been purged after a snapshot.
> ------------------------------------------------------------------------------------------------------
>
>                 Key: RATIS-1241
>                 URL: https://issues.apache.org/jira/browse/RATIS-1241
>             Project: Ratis
>          Issue Type: Bug
>            Reporter: Aravindan Vijayan
>            Assignee: Aravindan Vijayan
>            Priority: Critical
>
> *Steps to reproduce*
> * Setup a 3 node ratis group.
> * Write some transactions into the quorum.
> * Bring 1 peer down. 
> * Write more transactions into the quorum (other 2 nodes), take a snapshot at the last txn and purge logs from the remaining 2 nodes.
> * Start the node that was brought down.
> Leader falls into a possible irrecoverable state with respect to appending log entries to the follower.
> First time the follower comes back
> {code}
> 2020-12-15 15:42:01,069 [grpc-default-executor-5] WARN  server.GrpcLogAppender (GrpcLogAppender.java:resetClient(110)) - omNode-3@group-523986131536->omNode-2-GrpcLogAppender: Leader has not got in touch with Follower omNode-3@group-523986131536->omNode-2(c-1,m0,n409, attendVote=true, lastRpcSendTime=0, lastRpcResponseTime=8398) yet, just keep nextIndex unchanged and retry.
> 2020-12-15 15:42:01,570 [grpc-default-executor-5] INFO  server.RaftServer$Division (ServerState.java:setLeader(260)) - omNode-2@group-523986131536: change Leader from null to omNode-3 at term 3 for appendEntries, leader elected after 954ms
> 2020-12-15 15:42:01,570 [grpc-default-executor-5] INFO  server.RaftServer$Division (RaftServerImpl.java:checkInconsistentAppendEntries(1249)) - omNode-2@group-523986131536: Failed appendEntries as previous log entry ((t:3, i:408)) is not found
> 2020-12-15 15:42:01,571 [grpc-default-executor-5] INFO  server.RaftServer$Division (RaftServerImpl.java:checkInconsistentAppendEntries(1215)) - omNode-2@group-523986131536: inconsistency entries. Reply:omNode-3<-omNode-2#11187658:FAIL,INCONSISTENCY,nextIndex:205,term:2,followerCommit:203
> 2020-12-15 15:42:01,572 [grpc-default-executor-5] INFO  leader.FollowerInfo (FollowerInfoImpl.java:lambda$new$0(46)) - omNode-3@group-523986131536->omNode-2: nextIndex: updateUnconditionally 409 -> 205
> 2020-12-15 15:42:01,572 [omNode-3@group-523986131536->omNode-2-GrpcLogAppender-LogAppenderDaemon] ERROR leader.LogAppenderDaemon (LogAppenderDaemon.java:run(86)) - omNode-3@group-523986131536->omNode-2-GrpcLogAppender-LogAppenderDaemon failed
> org.apache.ratis.server.raftlog.RaftLogIOException: Log entry not found: index = 205
>         at org.apache.ratis.server.raftlog.segmented.SegmentedRaftLog.getEntryWithData(SegmentedRaftLog.java:283)
>         at org.apache.ratis.server.leader.LogAppenderBase.newAppendEntriesRequest(LogAppenderBase.java:143)
>         at org.apache.ratis.grpc.server.GrpcLogAppender.appendLog(GrpcLogAppender.java:210)
>         at org.apache.ratis.grpc.server.GrpcLogAppender.run(GrpcLogAppender.java:144)
>         at org.apache.ratis.server.leader.LogAppenderDaemon.run(LogAppenderDaemon.java:77)
>         at java.lang.Thread.run(Thread.java:748)
> 2020-12-15 15:42:01,572 [omNode-3@group-523986131536->omNode-2-GrpcLogAppender-LogAppenderDaemon] INFO  server.RaftServer$Division (LeaderStateImpl.java:restart(497)) - omNode-3@group-523986131536-LeaderStateImpl: Restarting GrpcLogAppender for omNode-3@group-523986131536->omNode-2
> {code}
> After the above, the leader falls into this error loop.
> {code}
> 2020-12-15 15:42:01,574 [omNode-3@group-523986131536->omNode-2-GrpcLogAppender-LogAppenderDaemon] ERROR leader.LogAppenderDaemon (LogAppenderDaemon.java:run(86)) - omNode-3@group-523986131536->omNode-2-GrpcLogAp
> pender-LogAppenderDaemon failed
> org.apache.ratis.server.raftlog.RaftLogIOException: Log entry not found: index = 0
>         at org.apache.ratis.server.raftlog.segmented.SegmentedRaftLog.getEntryWithData(SegmentedRaftLog.java:283)
>         at org.apache.ratis.server.leader.LogAppenderBase.newAppendEntriesRequest(LogAppenderBase.java:143)
>         at org.apache.ratis.grpc.server.GrpcLogAppender.appendLog(GrpcLogAppender.java:210)
>         at org.apache.ratis.grpc.server.GrpcLogAppender.run(GrpcLogAppender.java:144)
>         at org.apache.ratis.server.leader.LogAppenderDaemon.run(LogAppenderDaemon.java:77)
>         at java.lang.Thread.run(Thread.java:748)
> 2020-12-15 15:42:01,575 [omNode-3@group-523986131536->omNode-2-GrpcLogAppender-LogAppenderDaemon] INFO  server.RaftServer$Division (LeaderStateImpl.java:restart(497)) - omNode-3@group-523986131536-LeaderStateImp
> l: Restarting GrpcLogAppender for omNode-3@group-523986131536->omNode-2
> 2020-12-15 15:42:01,575 [omNode-3@group-523986131536->omNode-2-GrpcLogAppender-LogAppenderDaemon] INFO  metrics.RatisMetrics (RatisMetrics.java:unregister(43)) - Unregistering Metrics Registry : ratis_grpc.log_a
> ppender.omNode-3@group-523986131536
> 2020-12-15 15:42:01,575 [omNode-3@group-523986131536->omNode-2-GrpcLogAppender-LogAppenderDaemon] INFO  server.RaftServerConfigKeys (ConfUtils.java:logGet(44)) - raft.server.log.appender.snapshot.chunk.size.max
> = 16MB (=16777216) (default)
> 2020-12-15 15:42:01,575 [omNode-3@group-523986131536->omNode-2-GrpcLogAppender-LogAppenderDaemon] INFO  server.RaftServerConfigKeys (ConfUtils.java:logGet(44)) - raft.server.log.appender.buffer.byte-limit = 3355
> 4432 (custom)
> 2020-12-15 15:42:01,576 [omNode-3@group-523986131536->omNode-2-GrpcLogAppender-LogAppenderDaemon] INFO  server.RaftServerConfigKeys (ConfUtils.java:logGet(44)) - raft.server.log.appender.buffer.element-limit = 1
> 024 (custom)
> 2020-12-15 15:42:01,576 [omNode-3@group-523986131536->omNode-2-GrpcLogAppender-LogAppenderDaemon] INFO  grpc.GrpcConfigKeys (ConfUtils.java:logGet(44)) - raft.grpc.server.leader.outstanding.appends.max = 128 (de
> fault)
> 2020-12-15 15:42:01,576 [omNode-3@group-523986131536->omNode-2-GrpcLogAppender-LogAppenderDaemon] INFO  server.RaftServerConfigKeys (ConfUtils.java:logGet(44)) - raft.server.rpc.request.timeout = 3000ms (default
> )
> 2020-12-15 15:42:01,576 [omNode-3@group-523986131536->omNode-2-GrpcLogAppender-LogAppenderDaemon] INFO  server.RaftServerConfigKeys (ConfUtils.java:logGet(44)) - raft.server.log.appender.install.snapshot.enabled
>  = false (custom)
> 2020-12-15 15:42:01,576 [omNode-3@group-523986131536->omNode-2-GrpcLogAppender-LogAppenderDaemon] INFO  metrics.RatisMetrics (RatisMetrics.java:lambda$create$0(36)) - Creating Metrics Registry : ratis_grpc.log_a
> ppender.omNode-3@group-523986131536
> 2020-12-15 15:42:01,577 [omNode-3@group-523986131536->omNode-2-GrpcLogAppender-LogAppenderDaemon] ERROR leader.LogAppenderDaemon (LogAppenderDaemon.java:run(86)) - omNode-3@group-523986131536->omNode-2-GrpcLogAp
> pender-LogAppenderDaemon failed
> org.apache.ratis.server.raftlog.RaftLogIOException: Log entry not found: index = 0
>         at org.apache.ratis.server.raftlog.segmented.SegmentedRaftLog.getEntryWithData(SegmentedRaftLog.java:283)
>         at org.apache.ratis.server.leader.LogAppenderBase.newAppendEntriesRequest(LogAppenderBase.java:143)
>         at org.apache.ratis.grpc.server.GrpcLogAppender.appendLog(GrpcLogAppender.java:210)
>         at org.apache.ratis.grpc.server.GrpcLogAppender.run(GrpcLogAppender.java:144)
>         at org.apache.ratis.server.leader.LogAppenderDaemon.run(LogAppenderDaemon.java:77)
>         at java.lang.Thread.run(Thread.java:748)
> 2020-12-15 15:42:01,578 [omNode-3@group-523986131536->omNode-2-GrpcLogAppender-LogAppenderDaemon] INFO  server.RaftServer$Division (LeaderStateImpl.java:restart(497)) - omNode-3@group-523986131536-LeaderStateImpl: Restarting GrpcLogAppender for omNode-3@group-523986131536->omNode-2
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)