Posted to jira@kafka.apache.org by "Zhang Jianguo (Jira)" <ji...@apache.org> on 2021/03/19 02:43:00 UTC

[jira] [Comment Edited] (KAFKA-8608) Broker shows WARN on reassignment partitions on new brokers: Replica LEO, follower position & Cache truncation

    [ https://issues.apache.org/jira/browse/KAFKA-8608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17304601#comment-17304601 ] 

Zhang Jianguo edited comment on KAFKA-8608 at 3/19/21, 2:42 AM:
----------------------------------------------------------------

[~LillianY]

I hit the same issue.

[2021-03-18 17:37:30,799] WARN [ReplicaManager broker=15] Leader 15 failed to record follower 18's position 308399, and last sent HW since the replica is not recognized to be one of the assigned replicas 15,16 for partition Collect-gnodeb-24. Empty records will be returned for this partition. (kafka.server.ReplicaManager)

 

After the controller switched from broker 14 to broker 15, Kafka started behaving abnormally. Restarting the brokers did not help.

*Logs of Broker 14*

!image-2021-03-19-10-36-04-328.png!

!image-2021-03-19-10-41-44-952.png!

!image-2021-03-19-10-42-16-296.png!

 

!image-2021-03-19-10-42-32-759.png!

 

*Producer log*

!image-2021-03-19-10-41-03-203.png!

 

*Consumer got timeout exception:*

!image-2021-03-19-10-39-24-728.png!

 



> Broker shows WARN on reassignment partitions on new brokers: Replica LEO, follower position & Cache truncation
> --------------------------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-8608
>                 URL: https://issues.apache.org/jira/browse/KAFKA-8608
>             Project: Kafka
>          Issue Type: Improvement
>          Components: core
>    Affects Versions: 2.1.1
>         Environment: Kafka 2.1.1
>            Reporter: Di Campo
>            Priority: Minor
>              Labels: broker, reassign, repartition
>         Attachments: image-2021-03-19-10-36-04-328.png, image-2021-03-19-10-39-24-728.png, image-2021-03-19-10-41-03-203.png, image-2021-03-19-10-41-44-952.png, image-2021-03-19-10-42-16-296.png, image-2021-03-19-10-42-32-759.png
>
>
> I added two brokers (brokerId 4,5) to a 3-node cluster (brokerId 1,2,3) that had 32 topics with 64 partitions each, replication factor 3.
> I then ran a partition reassignment.
> On each run I see the following WARN messages, but when the reassignment finishes everything seems OK. The ISR is OK (count is 3 in every partition).
> I get the following message types, one per partition:
>  
> {code:java}
> [2019-06-27 12:42:03,946] WARN [LeaderEpochCache visitors-0.0.1-10] New epoch entry EpochEntry(epoch=24, startOffset=51540) caused truncation of conflicting entries ListBuffer(EpochEntry(epoch=22, startOffset=51540)). Cache now contains 5 entries. (kafka.server.epoch.LeaderEpochFileCache) {code}
> -> This relates to the cache, so I suppose it's pretty safe.
> {code:java}
> [2019-06-27 12:42:04,250] WARN [ReplicaManager broker=1] Leader 1 failed to record follower 3's position 47981 since the replica is not recognized to be one of the assigned replicas 1,2,5 for partition visitors-0.0.1-28. Empty records will be returned for this partition. (kafka.server.ReplicaManager){code}
> -> This is scary. I'm not sure how severe it is, but it looks like records may be missing?
> {code:java}
> [2019-06-27 12:42:03,709] WARN [ReplicaManager broker=1] While recording the replica LEO, the partition visitors-0.0.1-58 hasn't been created. (kafka.server.ReplicaManager){code}
> -> Here, these partitions are created. 
> First of all: am I losing data here? I am assuming I am not, and so far I see no sign of data loss.
> If I am not losing data, I'm not sure what these messages are trying to say. Should they really be at WARN level? If so, should the messages state the risks involved more clearly?
>  
>  
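
When triaging the "failed to record follower" warning, it can help to extract the follower ID and the assigned replica list from each log line and confirm that the follower really is outside the assignment. The following is a minimal sketch, not part of Kafka: the function name is hypothetical, and the regular expression is inferred only from the two WARN lines quoted in this thread, so other Kafka versions may word the message differently.

```python
import re

# Pattern inferred from the two WARN lines quoted in this thread (assumption);
# it tolerates the optional ", and last sent HW" wording between the position
# and the replica list.
WARN_RE = re.compile(
    r"Leader (?P<leader>\d+) failed to record follower (?P<follower>\d+)'s "
    r"position (?P<position>\d+).*? assigned replicas (?P<assigned>[\d,]+) "
    r"for partition (?P<partition>\S+)\."
)


def parse_unrecognized_follower(line):
    """Return the fields of a 'failed to record follower' WARN line, or None."""
    m = WARN_RE.search(line)
    if m is None:
        return None
    assigned = [int(b) for b in m.group("assigned").split(",")]
    follower = int(m.group("follower"))
    return {
        "leader": int(m.group("leader")),
        "follower": follower,
        "position": int(m.group("position")),
        "assigned": assigned,
        "partition": m.group("partition"),
        # False means the leader does not consider this broker a replica.
        "follower_is_assigned": follower in assigned,
    }


if __name__ == "__main__":
    sample = (
        "[2019-06-27 12:42:04,250] WARN [ReplicaManager broker=1] Leader 1 "
        "failed to record follower 3's position 47981 since the replica is "
        "not recognized to be one of the assigned replicas 1,2,5 for "
        "partition visitors-0.0.1-28. Empty records will be returned for "
        "this partition. (kafka.server.ReplicaManager)"
    )
    print(parse_unrecognized_follower(sample))
```

Grouping the parsed results by partition shows which followers were still fetching after being dropped from the assignment. That is the situation the warning text describes, but the sketch does not by itself show whether any data was lost.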


