Posted to users@kafka.apache.org by Luciano Afranllie <li...@gmail.com> on 2016/04/26 18:19:26 UTC

Producer and consumer awareness after adding partitions

Hi

I am doing some tests to understand how Kafka behaves when adding
partitions to a topic while producing and consuming.

My test is like this:

I launch 3 brokers
I create a topic with 3 partitions and replication factor = 2
$ ./bin/kafka-topics.sh --zookeeper zookeeper-1 --create --topic topic1
--partitions 3 --replication-factor 2
$ ./bin/kafka-topics.sh --zookeeper zookeeper-1 --describe --topic topic1
Topic:topic1 PartitionCount:3 ReplicationFactor:2 Configs:
 Topic: topic1 Partition: 0 Leader: 2 Replicas: 2,3 Isr: 2,3
 Topic: topic1 Partition: 1 Leader: 3 Replicas: 3,1 Isr: 3,1
 Topic: topic1 Partition: 2 Leader: 1 Replicas: 1,2 Isr: 1,2

I start a single producer and 3 consumers in a single consumer group.
The producer is using default partitioning.
The consumers are using automatic rebalancing; for test 1,
AUTO_OFFSET_RESET_CONFIG = earliest, and for test 2, AUTO_OFFSET_RESET_CONFIG
= latest.
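
Roughly, each of the 3 consumer instances runs something like the sketch
below (Java new consumer; kafka-1:9092, group1 and topic1 are from my setup,
the class name, the 100 ms poll timeout and the rest of the boilerplate are
just placeholders):

import java.util.Collections;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class TestConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "kafka-1:9092");
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "group1");
        // "earliest" for test 1, "latest" for test 2
        props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
        // subscribe() (not assign()) so the group coordinator rebalances partitions automatically
        consumer.subscribe(Collections.singletonList("topic1"));
        try {
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(100);
                for (ConsumerRecord<String, String> record : records) {
                    System.out.printf("partition=%d offset=%d value=%s%n",
                            record.partition(), record.offset(), record.value());
                }
            }
        } finally {
            consumer.close();
        }
    }
}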

While the producer and consumers are running, I modify the topic using
kafka-topics.sh:

$ ./bin/kafka-topics.sh --zookeeper zookeeper-1 --alter --topic topic1
--partitions 6; date
WARNING: If partitions are increased for a topic that has a key, the
partition logic or ordering of the messages will be affected
Adding partitions succeeded!
*Tue Apr 26 10:34:46 ART 2016*

Now, what I am observing is that the producer does not "see" the new
partitions and the consumer group does not rebalance until approximately 4
minutes later. Is this the expected behavior? Is this time configurable? I
could not find a property to change this.

$ date; ./bin/kafka-topics.sh --zookeeper zookeeper-1 --describe --topic
topic1
Tue Apr 26 10:37:54 ART 2016
Topic:topic1 PartitionCount:6 ReplicationFactor:2 Configs:
 Topic: topic1 Partition: 0 Leader: 2 Replicas: 2,3 Isr: 2,3
 Topic: topic1 Partition: 1 Leader: 3 Replicas: 3,1 Isr: 3,1
 Topic: topic1 Partition: 2 Leader: 1 Replicas: 1,2 Isr: 1,2
 Topic: topic1 Partition: 3 Leader: 1 Replicas: 1,2 Isr: 1,2
 Topic: topic1 Partition: 4 Leader: 2 Replicas: 2,3 Isr: 2,3
 Topic: topic1 Partition: 5 Leader: 3 Replicas: 3,1 Isr: 3,1
$ date; ./bin/kafka-consumer-groups.sh --new-consumer --bootstrap-server
kafka-1:9092 --describe --group group1
*Tue Apr 26 10:38:04 ART 2016*
GROUP, TOPIC, PARTITION, CURRENT OFFSET, LOG END OFFSET, LAG, OWNER
group1, topic1, 2, 80, 80, 0, consumer-3_/127.0.0.1
group1, topic1, 0, 91, 91, 0, consumer-1_/127.0.0.1
group1, topic1, 1, 88, 88, 0, consumer-2_/127.0.0.1

$ date; ./bin/kafka-consumer-groups.sh --new-consumer --bootstrap-server
kafka-1:9092 --describe --group group1
*Tue Apr 26 10:39:40 ART 2016*
GROUP, TOPIC, PARTITION, CURRENT OFFSET, LOG END OFFSET, LAG, OWNER
group1, topic1, 4, 15, 16, 1, consumer-3_/127.0.0.1
group1, topic1, 5, 9, 9, 0, consumer-3_/127.0.0.1
group1, topic1, 0, 108, 108, 0, consumer-1_/127.0.0.1
group1, topic1, 1, 117, 117, 0, consumer-1_/127.0.0.1
group1, topic1, 2, 99, 99, 0, consumer-2_/127.0.0.1
group1, topic1, 3, 6, 6, 0, consumer-2_/127.0.0.1

Another observation is that when consumers use AUTO_OFFSET_RESET_CONFIG =
latest, some messages are never received by the group. I understand this is
expected behavior, because the producer "sees" the new partitions before the
consumer group rebalance completes, so the producer is writing to some
partitions not yet assigned to the group.
When consumers use AUTO_OFFSET_RESET_CONFIG = earliest, all messages are
received (with no duplicates so far in my tests).
So if consumers use latest they will lose messages, and if they use earliest
they can handle rebalances, but what happens when they crash and are
restarted? They will get all the messages in the topic. Is there any
recommendation regarding this?
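
For reference, the producer side of the test is just unkeyed sends with the
default partitioner, so messages get spread over whatever partitions the
producer currently knows about from its cached metadata. Roughly something
like this (again kafka-1:9092 and topic1 are from my setup; the class name,
message contents and the sleep are placeholders):

import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class TestProducer {
    public static void main(String[] args) throws InterruptedException {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "kafka-1:9092");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

        KafkaProducer<String, String> producer = new KafkaProducer<>(props);
        try {
            for (int i = 0; ; i++) {
                // no key, so the default partitioner spreads records over the
                // partitions the producer currently knows about from its metadata
                producer.send(new ProducerRecord<String, String>("topic1", "message-" + i));
                Thread.sleep(100);
            }
        } finally {
            producer.close();
        }
    }
}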

Regards
Luciano

Re: Producer and consumer awareness after adding partitions

Posted by tao xiao <xi...@gmail.com>.
The time is controlled by metadata.max.age.ms
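
The default is 300000 ms (5 minutes), so it can take up to that long for the
clients to refresh their metadata and notice the new partitions. If you want
them to react faster you can lower it on both the producer and the new
consumer, e.g. added to the Properties from the sketches earlier in the
thread (30000 is just an example value):

// on the producer
producerProps.put("metadata.max.age.ms", "30000");
// and on the new consumer, so the group notices the new partition count sooner
consumerProps.put("metadata.max.age.ms", "30000");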
