Posted to users@kafka.apache.org by "maverick m." <ma...@hotmail.com> on 2016/04/05 18:30:07 UTC

data not replicated to followers by leader

We saw strange behavior with Kafka 0.8.2 brokers today.

Scenario:
We have 3 Kafka brokers in dev, and each topic has a replication factor of 3. There are about 30 topics on the cluster. One of them, topic X, has 10 partitions, and for topic X alone we saw that 1 partition had not been replicated for at least the last few weeks. Brokers are not flapping in and out of the ISR. Also, very little data was added to topic X.

Topic:X    PartitionCount:10       ReplicationFactor:3     Configs:max.message.bytes=8000000
        Topic: X   Partition: 0    Leader: 1       Replicas: 1,0,2 Isr: 2,1,0
        Topic: X   Partition: 1    Leader: 2       Replicas: 2,1,0 Isr: 2,1,0
        Topic: X   Partition: 2    Leader: 2       Replicas: 0,2,1 Isr: 2,1,0
        Topic: X   Partition: 3    Leader: 1       Replicas: 1,2,0 Isr: 2,1,0
        Topic: X   Partition: 4    Leader: 2       Replicas: 2,0,1 Isr: 2
        Topic: X   Partition: 5    Leader: 2       Replicas: 0,1,2 Isr: 2,1,0
        Topic: X   Partition: 6    Leader: 1       Replicas: 1,0,2 Isr: 2,1,0
        Topic: X   Partition: 7    Leader: 2       Replicas: 2,1,0 Isr: 2,1,0
        Topic: X   Partition: 8    Leader: 2       Replicas: 0,2,1 Isr: 2,1,0
        Topic: X   Partition: 9    Leader: 1       Replicas: 1,2,0 Isr: 2,1,0

Note: Partition 4 has had Isr: 2 for at least 7 days. Not much data has been added to topic X, as it is a retry topic.
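To spot this kind of partition without reading the listing by eye, the describe output can be scanned for partitions whose Isr is shorter than the Replicas list. The following is only a minimal sketch in Python: the function name `under_replicated` and the regular expression are my own and assume the output format pasted above. (The `kafka-topics.sh --describe` tool also has an `--under-replicated-partitions` option that should give a similar answer.)

```python
import re

# Matches one partition line of the describe output in the format pasted above.
# The summary line ("PartitionCount:...") does not match and is skipped.
PARTITION_RE = re.compile(
    r"Topic:\s*(?P<topic>\S+)\s+Partition:\s*(?P<partition>\d+)\s+"
    r"Leader:\s*(?P<leader>-?\d+)\s+Replicas:\s*(?P<replicas>[\d,]+)\s+"
    r"Isr:\s*(?P<isr>[\d,]*)"
)


def under_replicated(describe_output):
    """Return (topic, partition, missing_replicas) for each partition
    whose ISR does not contain every assigned replica."""
    result = []
    for line in describe_output.splitlines():
        m = PARTITION_RE.search(line)
        if not m:
            continue
        replicas = [int(r) for r in m.group("replicas").split(",") if r]
        isr = {int(r) for r in m.group("isr").split(",") if r}
        missing = [r for r in replicas if r not in isr]
        if missing:
            result.append((m.group("topic"), int(m.group("partition")), missing))
    return result


# Three lines copied from the listing above.
SAMPLE = """Topic:X    PartitionCount:10       ReplicationFactor:3     Configs:max.message.bytes=8000000
        Topic: X   Partition: 3    Leader: 1       Replicas: 1,2,0 Isr: 2,1,0
        Topic: X   Partition: 4    Leader: 2       Replicas: 2,0,1 Isr: 2
"""

print(under_replicated(SAMPLE))  # [('X', 4, [0, 1])]
```

For partition 4 this reports brokers 0 and 1 as missing from the ISR, which matches what we saw.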

We bounced all the brokers and the under-replication issue was resolved, but I think we have data loss (more details below).

Questions
1) Why is data not replicated from the leader to the followers? I could understand it if the data volume were high, but this topic does not get much data: a few thousand messages per day.
2) When we restarted all the brokers, we saw that the leader became a follower and rolled its offset back to an older offset when it did so. I don't understand how data loss can happen here. If broker 2 dies, shouldn't the ISR list be empty, with no leader selected for that partition?

[2016-04-05 13:59:24,059] INFO Partition [X,4] on broker 0: Expanding ISR for partition [X,4] from 0 to 0,1 (kafka.cluster.Partition)
[2016-04-05 13:59:24,244] INFO [ReplicaFetcherThread-8-2], Stopped  (kafka.server.ReplicaFetcherThread)
[2016-04-05 14:00:14,279] ERROR [Replica Manager on Broker 0]: Error when processing fetch request for partition [X,4] offset 187185 from follower with correlation id 0. Possible cause: Request for offset 187185 but we only have log segments in the range 166211 to 166211. (kafka.server.ReplicaManager)
[2016-04-05 14:00:14,352] INFO Partition [X,4] on broker 0: Expanding ISR for partition [X,4] from 0,1 to 0,1,2 (kafka.cluster.Partition)

When broker 2 was restarted, the following seems to have happened, judging from the log:
1) Broker 0 became the leader with offset 166211
2) Broker 1 joined as a follower with offset 166211
3) Broker 2 joined as a follower with offset 187185 but was then reset to 166211.
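To put a number on the suspected loss, here is a small sketch of the arithmetic in Python. It rests on assumptions I have not confirmed: that the offset in broker 2's fetch request (187185) was its log end offset, and that broker 2 then truncated its log to the new leader's log end offset (166211). The function name `offsets_discarded` is just for illustration.

```python
def offsets_discarded(follower_log_end, leader_log_end):
    """Number of offsets a follower drops if it truncates its log
    to the leader's log end offset (assumed behavior, see above)."""
    return max(0, follower_log_end - leader_log_end)


# Values taken from the broker 0 log lines above.
leader_end = 166211   # "we only have log segments in the range 166211 to 166211"
broker2_end = 187185  # offset requested by the follower (broker 2)

print(offsets_discarded(broker2_end, leader_end))  # 20974
```

If those assumptions hold, roughly 20974 offsets that existed only on broker 2 would have been discarded.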

Thanks in advance for any insight.