Posted to dev@kafka.apache.org by "Neha Narkhede (JIRA)" <ji...@apache.org> on 2015/02/07 21:02:34 UTC
[jira] [Commented] (KAFKA-1908) Split brain
[ https://issues.apache.org/jira/browse/KAFKA-1908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14310901#comment-14310901 ]
Neha Narkhede commented on KAFKA-1908:
--------------------------------------
Thanks for sharing this test case!
bq. A consumer can read data from replica-1 or replica-2. When it reads from replica-1, it resets the offsets and then can read duplicates from replica-2.
When the consumer wants to consume, it first issues a metadata request asking one of the brokers who the leader for partition 0 is. In your test, only brokers 2 and 3 can serve that metadata request, and they will tell the consumer to consume from broker 2 since it is the new leader. I don't understand how the consumer ends up consuming from broker 1 while its port is blocked. Could you clarify?
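To make the lookup described above concrete, here is a minimal toy model in Python. It is not Kafka client code: the `Broker` class, `find_leader`, and `leader_views` are hypothetical names introduced only for illustration, and the model assumes a client simply asks the first reachable broker for its local view of the leader. The leader views are taken from the issue description and from the comment above (broker 1 still believes it is the leader; brokers 2 and 3 report broker 2).

```python
from dataclasses import dataclass


@dataclass
class Broker:
    """Hypothetical stand-in for a broker's local metadata (not a Kafka API)."""
    broker_id: int
    leader_view: int        # leader of [partition,0] as this broker believes
    reachable: bool = True  # False while port 9092 is blocked by iptables


def find_leader(brokers):
    """Return the leader reported by the first reachable broker, or None."""
    for broker in brokers:
        if broker.reachable:
            return broker.leader_view
    return None


def leader_views(brokers):
    """Return the set of distinct leaders reported by reachable brokers."""
    return {b.leader_view for b in brokers if b.reachable}


# State after the election, while port 9092 on broker 1 is still blocked.
brokers = [
    Broker(1, leader_view=1, reachable=False),
    Broker(2, leader_view=2),
    Broker(3, leader_view=2),
]
blocked_leader = find_leader(brokers)
blocked_views = leader_views(brokers)

# After the iptables rules on broker 1 are flushed, it is reachable again
# but (per the issue description) still holds its stale view.
brokers[0].reachable = True
flushed_views = leader_views(brokers)

print(blocked_leader, blocked_views, flushed_views)
```

In this model, every metadata request is answered with leader 2 while broker 1 is blocked. More than one answer becomes possible only once broker 1 is reachable again with its stale view, and whether a real consumer would ever be directed to broker 1 at that point is exactly the open question here.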
> Split brain
> -----------
>
> Key: KAFKA-1908
> URL: https://issues.apache.org/jira/browse/KAFKA-1908
> Project: Kafka
> Issue Type: Bug
> Components: core
> Affects Versions: 0.8.2
> Reporter: Alexey Ozeritskiy
>
> In some cases, there may be two leaders for one partition.
> Steps to reproduce:
> # We have 3 brokers, 1 partition with 3 replicas:
> {code}
> TopicAndPartition: [partition,0] Leader: 1 Replicas: [2,1,3] ISR: [1,2,3]
> {code}
> # The controller runs on broker 3
> # Let the Kafka port be 9092. On broker 1 we run:
> {code}
> iptables -A INPUT -p tcp --dport 9092 -j REJECT
> {code}
> # Initiate replica election
> # As a result:
> Broker 1:
> {code}
> TopicAndPartition: [partition,0] Leader: 1 Replicas: [2,1,3] ISR: [1,2,3]
> {code}
> Broker 2:
> {code}
> TopicAndPartition: [partition,0] Leader: 2 Replicas: [2,1,3] ISR: [1,2,3]
> {code}
> # Flush the iptables rules on broker 1
> Now we can produce messages to {code}[partition,0]{code}. Replica-1 will not receive new data. A consumer can read data from replica-1 or replica-2. When it reads from replica-1, it resets the offsets and then can read duplicates from replica-2.
> We saw this situation in our production cluster when it had network problems.