Posted to dev@kafka.apache.org by "Jun Rao (JIRA)" <ji...@apache.org> on 2015/09/04 02:41:47 UTC

[jira] [Commented] (KAFKA-2143) Replicas get ahead of leader and fail

    [ https://issues.apache.org/jira/browse/KAFKA-2143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14730109#comment-14730109 ] 

Jun Rao commented on KAFKA-2143:
--------------------------------

[~becket_qin], since both A and B are in the ISR before step (3), the last committed offset on A can't be larger than 3000. So in step (3), if A becomes a follower, it first has to truncate its log to its last committed offset before fetching. At that point, A's fetch offset can't be larger than 3000 and therefore won't be out of range.
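
A minimal sketch of that become-follower step, assuming simplified state (the names below are illustrative, not the actual ReplicaManager code):

{noformat}
// Illustrative sketch only, not the real broker code: a replica that
// becomes a follower truncates to its high watermark (last committed
// offset) and resumes fetching from there.
object BecomeFollowerSketch {
  final case class ReplicaState(logEndOffset: Long, highWatermark: Long)

  // Returns the offset the new follower fetches from.
  def becomeFollower(replica: ReplicaState): Long = {
    // Everything above the high watermark is uncommitted and may not
    // exist on the new leader, so it is truncated away first.
    math.min(replica.logEndOffset, replica.highWatermark)
  }
}
{noformat}

In that scenario, A's high watermark can't exceed 3000 while B (at 3000) is still in the ISR, so the fetch offset after truncation is at most 3000.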

The following is an alternative scenario that can cause this.

1) Broker A (the leader) receives messages up to offset 5000.
2) Broker B (a follower) receives messages up to offset 3000 (it is still in the ISR because of replica.lag.max.messages).
3) For some reason, B drops out of the ISR.
4) Broker A (the only replica in the ISR) commits messages up to 5000.
5) For some reason, Broker A is considered dead and Broker B is considered live.
6) Broker B is selected as the new leader (unclean leader election) and is the only replica in the ISR.
7) Broker A is considered live again, starts fetching from 5000 (its last committed offset), and gets an OffsetOutOfRangeException.
8) In the meantime, B receives more messages, up to offset 6000.
9) Broker A handles the OffsetOutOfRangeException, finds that leader B's log end offset is now larger than its own log end offset, and truncates its entire log.

Your patch reduces the amount of data that Broker A needs to replicate in step 9, which is probably fine. However, we should probably first verify that this is indeed what's happening, since it seems it should happen rarely. Also, KAFKA-2477 reports a similar issue without any leadership change, so maybe something else can cause this.
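
To make step 9 concrete, here is a minimal sketch of the follower's out-of-range handling (simplified: the offsets are passed in directly rather than fetched from the leader, and the names do not match the actual ReplicaFetcherThread.handleOffsetOutOfRange code):

{noformat}
// Simplified sketch of the decision in step 9; not the actual
// ReplicaFetcherThread.handleOffsetOutOfRange implementation.
object OutOfRangeSketch {
  // Returns the offset the follower resumes fetching from.
  def handleOffsetOutOfRange(followerLogEndOffset: Long,
                             leaderLogEndOffset: Long,
                             leaderLogStartOffset: Long): Long = {
    if (leaderLogEndOffset < followerLogEndOffset) {
      // Follower is ahead of the leader: truncate down to the leader's
      // log end offset and continue from there.
      leaderLogEndOffset
    } else {
      // Leader has moved past the follower (step 8): the follower discards
      // its entire local log and restarts from the leader's log start
      // offset, i.e. the full re-replication seen in this ticket.
      leaderLogStartOffset
    }
  }
}
{noformat}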

> Replicas get ahead of leader and fail
> -------------------------------------
>
>                 Key: KAFKA-2143
>                 URL: https://issues.apache.org/jira/browse/KAFKA-2143
>             Project: Kafka
>          Issue Type: Bug
>          Components: replication
>    Affects Versions: 0.8.2.1
>            Reporter: Evan Huus
>            Assignee: Jiangjie Qin
>
> On a cluster of 6 nodes, we recently saw a case where a single under-replicated partition suddenly appeared, replication lag spiked, and network IO spiked. The cluster appeared to recover eventually on its own.
> Looking at the logs, the thing which failed was partition 7 of the topic {{background_queue}}. It had an ISR of 1,4,3 and its leader at the time was 3. Here are the interesting log lines:
> On node 3 (the leader):
> {noformat}
> [2015-04-23 16:50:05,879] ERROR [Replica Manager on Broker 3]: Error when processing fetch request for partition [background_queue,7] offset 3722949957 from follower with correlation id 148185816. Possible cause: Request for offset 3722949957 but we only have log segments in the range 3648049863 to 3722949955. (kafka.server.ReplicaManager)
> [2015-04-23 16:50:05,879] ERROR [Replica Manager on Broker 3]: Error when processing fetch request for partition [background_queue,7] offset 3722949957 from follower with correlation id 156007054. Possible cause: Request for offset 3722949957 but we only have log segments in the range 3648049863 to 3722949955. (kafka.server.ReplicaManager)
> [2015-04-23 16:50:13,960] INFO Partition [background_queue,7] on broker 3: Shrinking ISR for partition [background_queue,7] from 1,4,3 to 3 (kafka.cluster.Partition)
> {noformat}
> Note that both replicas suddenly asked for an offset *ahead* of the available offsets.
> And on nodes 1 and 4 (the replicas) many occurrences of the following:
> {noformat}
> [2015-04-23 16:50:05,935] INFO Scheduling log segment 3648049863 for log background_queue-7 for deletion. (kafka.log.Log) (edited)
> {noformat}
> Based on my reading, this looks like the replicas somehow got *ahead* of the leader, asked for an invalid offset, got confused, and re-replicated the entire topic from scratch to recover (this matches our network graphs, which show 3 sending a bunch of data to 1 and 4).
> Taking a stab in the dark at the cause: is there a race condition where replicas can receive a new offset before the leader has committed it and is ready to replicate?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

RE: [jira] [Commented] (KAFKA-2143) Replicas get ahead of leader and fail

Posted by chenlax <la...@hotmail.com>.
I'm hitting the same issue. The error log:

Error when processing fetch request for partition [To_S3_comm_V0_10,2] offset 456234794 from follower with correlation id 254117341. Possible cause: Request for offset 456234794 but we only have log segments in the range 432322850 to 456234793. (kafka.server.ReplicaManager)

Looking at handleOffsetOutOfRange, I find that it only checks (leaderEndOffset < replica.logEndOffset.messageOffset); if that is not the case, it deletes the entire local log:

INFO Scheduling log segment 432322850 for log To_S3_comm_V0_10-2 for deletion. (kafka.log.Log)
...........................
INFO Deleting segment 434379909 from log To_S3_comm_V0_10-2. (kafka.log.Log)

I think it should also check (log.logEndOffset < leaderStartOffset) when leaderEndOffset is not smaller than replica.logEndOffset.messageOffset, and only delete the local log in that case; see the sketch below.
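
A rough sketch of that proposed check (illustrative names only; the offsets are assumed to be already known, and this is not the actual patch or the real ReplicaFetcherThread API):

// Illustrative sketch only: not the committed fix for KAFKA-2143.
// Decides where a follower should resume fetching after an
// OffsetOutOfRangeException, given the leader's retained offset range.
object ProposedOutOfRangeCheck {
  def resumeOffset(followerLogEndOffset: Long,
                   leaderLogEndOffset: Long,
                   leaderLogStartOffset: Long): Long = {
    if (leaderLogEndOffset < followerLogEndOffset) {
      // Follower is ahead of the leader: truncate down to the leader's end offset.
      leaderLogEndOffset
    } else if (followerLogEndOffset < leaderLogStartOffset) {
      // Follower's whole log is older than anything the leader still retains:
      // only here is deleting the local log and refetching from the start unavoidable.
      leaderLogStartOffset
    } else {
      // Follower is behind but still inside the leader's retained range:
      // keep the local log and continue from the follower's own log end offset.
      followerLogEndOffset
    }
  }
}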


Thanks,
Lax

