Posted to users@kafka.apache.org by florent valdelievre <fl...@gmail.com> on 2014/09/25 22:52:32 UTC

Consumers don't get any data if broker leader is down

Zookeeper version: 3.4.6-1569965, built on 02/20/2014 09:09 GMT

Kafka version: kafka_2.8.0-0.8.1.1

I have the following architecture/configuration

staging2.mtl.shopmedia.com (broker.id=1)

zookeeper:2181

kafka:9092

staging3.mtl.shopmedia.com (broker.id=2)

zookeeper:2181

kafka:9092

centos.mtl.shopmedia.com (broker.id=3)

zookeeper:2181

kafka:9092

Each Kafka server has the same configuration except for broker.id and log.dirs:

broker.id=XXX

port=9092

num.network.threads=2

num.io.threads=8

socket.send.buffer.bytes=1048576

socket.receive.buffer.bytes=1048576

socket.request.max.bytes=104857600

log.dirs=/home/shopmedia/nfs/logs/XXX/kafka

num.partitions=1

log.retention.hours=1

log.segment.bytes=536870912

log.retention.check.interval.ms=60000

log.cleaner.enable=false

zookeeper.connect=staging2.mtl.shopmedia.com:2181,staging3.mtl.shopmedia.com:2181,centos.mtl.shopmedia.com:2181

zookeeper.connection.timeout.ms=1000000

auto.create.topics.enable=true

default.replication.factor=3

Zookeeper configuration is also the same on all servers:

dataDir=/home/shopmedia/apps/zookeeper/data

clientPort=2181

maxClientCnxns=0

I have only 1 topic with 1 partition.

I have 3 servers (staging2, staging3 and centos) for failover. Each
partition should be replicated to all Kafka brokers (since the replication
factor is 3).

I have created my topic like this:

kafka-topics.sh --create --zookeeper staging2.mtl.shopmedia.com:2181,staging3.mtl.shopmedia.com:2181,centos.mtl.shopmedia.com:2181 --topic hibe-user-server-event --partitions 1 --replication-factor 3

Then I check the topic configuration:

[shopmedia@staging3:~] $ kafka-topics.sh --describe --zookeeper staging2.mtl.shopmedia.com:2181,staging3.mtl.shopmedia.com:2181,centos.mtl.shopmedia.com:2181 --topic hibe-user-server-event

Topic:hibe-user-server-event    PartitionCount:1        ReplicationFactor:3     Configs:

  Topic: hibe-user-server-event   Partition: 0    Leader: 2       Replicas: 1,2,3 Isr: 2

According to the describe, my broker leader is 2 (staging3)
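
For readers following along, here is a minimal Python sketch (not part of the original post) of how the partition line in the describe output can be checked. It parses the line and lists the replicas that are assigned but absent from the ISR. The function names are hypothetical, and the regular expression assumes the output format shown above.

```python
import re

# Matches the per-partition line printed by `kafka-topics.sh --describe`,
# assuming the field layout shown in the post.
PARTITION_LINE = re.compile(
    r"Partition:\s*(\d+)\s+Leader:\s*(-?\d+)\s+Replicas:\s*([\d,]*)\s+Isr:\s*([\d,]*)"
)


def to_ids(text):
    """Turn a comma-separated list such as '1,2,3' into [1, 2, 3]."""
    return [int(part) for part in text.split(",") if part]


def parse_partition_line(line):
    """Extract partition, leader, replicas and ISR from one describe line."""
    match = PARTITION_LINE.search(line)
    if match is None:
        raise ValueError("not a partition line: %r" % line)
    return {
        "partition": int(match.group(1)),
        "leader": int(match.group(2)),
        "replicas": to_ids(match.group(3)),
        "isr": to_ids(match.group(4)),
    }


def out_of_sync(info):
    """Return the broker ids that hold a replica but are not in the ISR."""
    return [broker for broker in info["replicas"] if broker not in info["isr"]]


line = ("Topic: hibe-user-server-event   Partition: 0    Leader: 2"
        "       Replicas: 1,2,3 Isr: 2")
info = parse_partition_line(line)
print("leader:", info["leader"], "out of sync:", out_of_sync(info))
```

For the line above, the out-of-sync list contains brokers 1 and 3, which is exactly the situation the first question below asks about.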

QUESTIONS)

1) Why is the ISR (in-sync replica set) only 2 and not 1,2,3? This way, if
leader 2 crashes, the other brokers won't have any data.

2)

I am running a consumer on each machine (staging2, staging3 and centos)
with the following command:

kafka-console-consumer.sh --zookeeper staging2.mtl.shopmedia.com:2181,staging3.mtl.shopmedia.com:2181,centos.mtl.shopmedia.com:2181 --topic hibe-user-server-event

All my servers are up and running (ZooKeeper + Kafka).

I start a producer from staging2:

kafka-console-producer.sh --topic hibe-user-server-event --broker-list=staging2.mtl.shopmedia.com:9092,staging3.mtl.shopmedia.com:9092,centos.mtl.shopmedia.com:9092

All my consumers receive the messages properly.

I shut down brokers 1 and 3 (staging2 and centos).

My consumers still receive the messages from the producer (good!).

I restart 1 and 3 (so all servers are running as before).

I shut down broker 2 only (the leader becomes 1, ISR: 1). My consumers no
longer receive any messages, and stdout shows the following:

Staging2

[2014-09-25 04:23:57,602] ERROR [ConsumerFetcherThread-console-consumer-4903_staging2.hibe.com-1411630863195-cbe7a1e8-0-1], Error for partition [hibe-user-server-event,0] to broker 1:class kafka.common.UnknownTopicOrPartitionException (kafka.consumer.ConsumerFetcherThread)

Staging3

[2014-09-25 04:23:58,459] ERROR [ConsumerFetcherThread-console-consumer-99699_staging3.hibe.com-1411630877045-98f884fa-0-1], Error for partition [hibe-user-server-event,0] to broker 1:class kafka.common.NotLeaderForPartitionException (kafka.consumer.ConsumerFetcherThread)

Centos

[2014-09-25 04:21:42,393] ERROR [ConsumerFetcherThread-console-consumer-38882_centos.mtl.shopmedia.com-1411630833934-e6ceffde-0-1], Error for partition [hibe-user-server-event,0] to broker 1:class kafka.common.NotLeaderForPartitionException (kafka.consumer.ConsumerFetcherThread)
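
When several consumers log errors like these, it can help to tabulate which exception each one saw and against which broker. The following is a rough Python sketch, not something from the original post: the function name is hypothetical, and the pattern is only assumed to fit the three messages quoted above (it tolerates the missing spaces around "to" that appear in some copies of them).

```python
import re
from collections import Counter

# Pattern for the fetcher error message quoted in the post.
FETCH_ERROR = re.compile(
    r"Error for partition \[([^,\]]+),(\d+)\]\s*to\s*broker (\d+):\s*class (\S+)"
)


def parse_fetch_error(line):
    """Return (topic, partition, broker, exception name), or None if no match."""
    match = FETCH_ERROR.search(line)
    if match is None:
        return None
    topic, partition, broker, exception_class = match.groups()
    # Keep only the short class name, e.g. NotLeaderForPartitionException.
    return topic, int(partition), int(broker), exception_class.rsplit(".", 1)[-1]


# Message parts copied from the three consumer logs in the post.
lines = [
    "Error for partition [hibe-user-server-event,0] to broker 1:class "
    "kafka.common.UnknownTopicOrPartitionException",
    "Error for partition [hibe-user-server-event,0] to broker 1:class "
    "kafka.common.NotLeaderForPartitionException",
    "Error for partition [hibe-user-server-event,0] to broker 1:class "
    "kafka.common.NotLeaderForPartitionException",
]

parsed = [parse_fetch_error(line) for line in lines]
counts = Counter(entry[3] for entry in parsed if entry is not None)
for name, count in counts.items():
    print(name, count)
```

All three errors are fetches sent to broker 1, so the tally points at how broker 1 sees the partition rather than at any single consumer.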

Conclusion: when I shut down the leader broker, my consumers can't catch up
(I suspect this is because the ISR is not up to date).

Any ideas?

Re: Consumers don't get any data if broker leader is down

Posted by Jun Rao <ju...@gmail.com>.

It could be that brokers 1 and 3 can't communicate with broker 2 and the
consumer client. You may want to read
https://cwiki.apache.org/confluence/display/KAFKA/FAQ#FAQ-Whycan'tmyconsumers/producersconnecttothebrokers?

Thanks,

Jun
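
One rough way to test the connectivity suggested in this reply is to try a plain TCP connection to every broker from every machine. The Python sketch below is an illustration only, not something from the thread: the function names are hypothetical, and the host names and port 9092 are simply taken from the configuration in the original post.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS resolution failures.
        return False


# Broker addresses as listed in the original post (port=9092).
BROKERS = [
    ("staging2.mtl.shopmedia.com", 9092),
    ("staging3.mtl.shopmedia.com", 9092),
    ("centos.mtl.shopmedia.com", 9092),
]


def report(brokers, timeout=3.0):
    """Map each 'host:port' string to whether it is reachable from this machine."""
    return {"%s:%d" % (host, port): can_connect(host, port, timeout)
            for host, port in brokers}
```

Running `report(BROKERS)` on staging2, staging3 and centos in turn shows which broker pairs can reach each other. A successful connection only proves that the port is open. Clients connect to the host name each broker registers in ZooKeeper, so if that name differs from the ones above, it is worth testing the registered name as well.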
