Posted to jira@kafka.apache.org by "Youssef BOUZAIENNE (Jira)" <ji...@apache.org> on 2020/06/09 14:00:00 UTC

[jira] [Updated] (KAFKA-10127) kafka cluster not recovering - Shrinking ISR continuously

     [ https://issues.apache.org/jira/browse/KAFKA-10127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Youssef BOUZAIENNE updated KAFKA-10127:
---------------------------------------
    Description: 
We are facing an issue from time to time where our Kafka cluster goes into a state it does not recover from, with the following log lines repeating:

[2020-06-06 08:35:48,117] INFO [Partition test broker=1002] Cached zkVersion 620 not equal to that in zookeeper, skip updating ISR (kafka.cluster.Partition)
[2020-06-06 08:35:48,117] INFO [Partition test broker=1002] Shrinking ISR from 1006,1002 to 1002. Leader: (highWatermark: 3222733572, endOffset: 3222741893). Out of sync replicas: (brokerId: 1006, endOffset: 3222733572). (kafka.cluster.Partition)

 

Before that, our ZooKeeper session expired, which led us to this state.

 

After we increased these two values, we encounter the issue less frequently, but it still appears from time to time, and the only fix is restarting the Kafka service on all brokers:

zookeeper.session.timeout.ms=18000

replica.lag.time.max.ms=30000
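
For reference, both settings are broker-side properties; below is a sketch of the relevant server.properties fragment with the values reported above (a rolling restart of the brokers is needed for the changes to take effect):

```properties
# How long a broker may lose contact with ZooKeeper before its session
# expires (Kafka 2.4 default: 6000 ms). Raised here to ride out longer
# ZooKeeper hiccups without triggering session expiry.
zookeeper.session.timeout.ms=18000

# How far a follower may lag behind the leader before it is removed
# from the ISR (Kafka 2.4 default: 10000 ms). Raised here to make ISR
# shrinking less aggressive.
replica.lag.time.max.ms=30000
```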

 

Any help on this would be appreciated.


> kafka cluster not recovering - Shrinking ISR continuously
> ---------------------------------------------------------
>
>                 Key: KAFKA-10127
>                 URL: https://issues.apache.org/jira/browse/KAFKA-10127
>             Project: Kafka
>          Issue Type: Bug
>          Components: replication, zkclient
>    Affects Versions: 2.4.1
>         Environment: using kafka version 2.4.1 and zookeeper version 3.5.7
>            Reporter: Youssef BOUZAIENNE
>            Priority: Major
>



--
This message was sent by Atlassian Jira
(v8.3.4#803005)