Posted to users@kafka.apache.org by Josh Maidana <jo...@gmail.com> on 2017/10/05 08:37:17 UTC

Topics and Partitions

Hello

I am quite new to Kafka and come from a JMS/messaging background. Reading
through the documentation, I gather that, using partitions and consumer groups,
Kafka achieves both P2P and pub/sub. I have a few questions on partitions,
though, and I was wondering if someone could kindly point me in the right
direction.

1. In a multi-server scenario, how does Kafka decide how many partitions of
a given topic are assigned to a given node?
2. When a topic is created dynamically by a consumer or a producer, how is
the number of partitions specified?
3. If it is not or can't be specified, how does Kafka decide the number of
partitions to create?
4. If a producer doesn't specify a partition, how does Kafka decide to
which partition the message is allocated?
5. On consumption, do I need to explicitly create multiple consumers to
attain parallelism?
6. If yes, would Kafka allocate different partitions to different consumers
that are part of the same consumer group?
7. If one of those consumers exits, would Kafka reallocate its partitions to
the remaining consumers?
8. How are the offsets propagated from an exited consumer to the new
consumer to which the partition is reallocated?
9. Is there a listener-based API for consumption instead of a blocking poll?

Kind regards
Josh

Re: Topics and Partitions

Posted by Josh Meraj Maidana <jo...@gmail.com>.
Thank you.

I would have expected the topic to be created by either the producer or the
consumer, since it is non-deterministic whether the consumer or the producer
comes up first.

Kind regards


(In reply to Michal Michalski's message of Fri, Oct 6, 2017 at 12:53 PM,
which appears in full below.)



-- 
Kind regards
*Josh Meraj Maidana*

Re: Topics and Partitions

Posted by Michal Michalski <mi...@zalando.ie>.
Hey Josh,

Consuming from a non-existent topic will end up with "LEADER_NOT_AVAILABLE".

However (!) I just tested it locally (Kafka 0.11) and it seems that
consuming from a topic that doesn't exist, with auto.create.topics.enable
set to true, *will create it* as well (I checked under ZooKeeper's
/brokers/topics path).

I'm a bit surprised this works. The documentation states:

"You have the option of either adding topics manually *or having them be
created automatically when data is first published to a non-existent topic.*"

This (pretty old) email thread confirms that the documented behaviour is
intentional:
http://grokbase.com/t/kafka/users/14a2rgj2h2/auto-topic-creation-not-working-for-attempts-to-consume-non-existing-topic
(Jun Rao: "In general, *only writers should trigger auto topic creation,
but not the readers*. So, a topic can be auto created by the producer, but
not the consumer.")

So I'm not sure whether this is a regression or a later change that's not
reflected in the docs, but it looks like you *can* currently create topics
using a consumer. I wouldn't rely on this "feature" though - to me,
personally, it seems wrong and I'm guessing it might be a bug.
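
One way to picture the behaviour observed above, offered as an assumption rather than something this thread confirms: if the broker creates a missing topic whenever any client asks for its metadata, a consumer would trigger creation just as a producer does. The toy model below is not Kafka code; `ToyBroker` and its `metadata` method are hypothetical names, and the two constructor arguments merely stand in for auto.create.topics.enable and num.partitions.

```python
class ToyBroker:
    """Toy model (not Kafka code): a missing topic is created on first metadata lookup."""

    def __init__(self, auto_create=True, num_partitions=1):
        self.auto_create = auto_create        # stands in for auto.create.topics.enable
        self.num_partitions = num_partitions  # stands in for num.partitions
        self.topics = {}                      # topic name -> partition count

    def metadata(self, topic):
        """Assumed to be called by any client, producer or consumer alike."""
        if topic not in self.topics:
            if not self.auto_create:
                raise LookupError(f"unknown topic: {topic}")
            self.topics[topic] = self.num_partitions
        return self.topics[topic]


broker = ToyBroker(auto_create=True, num_partitions=3)
broker.metadata("orders")  # e.g. a consumer asking about a topic that does not exist yet
print(broker.topics)       # {'orders': 3}
```

With `auto_create=False` the same lookup raises instead of creating the topic, which is the safer setup if topics are meant to be created explicitly.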

Please correct me if I'm wrong / missing something :-)

Michał



(In reply to Josh Maidana's message of 6 October 2017 at 04:37, which
appears in full below.)

Re: Topics and Partitions

Posted by Josh Maidana <jo...@gmail.com>.
Michal,

You mentioned that topics are only created dynamically by producers. Does that
mean that if a consumer starts on a non-existent topic, it throws an error?

Kind regards
Meeraj

(In reply to Josh Maidana's message of Thu, Oct 5, 2017 at 9:20 PM, which
appears in full below.)

Re: Topics and Partitions

Posted by Josh Maidana <jo...@gmail.com>.
Thank you, Michal.

That answers all my questions, many thanks.

Josh

(In reply to Michal Michalski's message of Thu, Oct 5, 2017 at 1:21 PM,
which appears in full below.)

Re: Topics and Partitions

Posted by Michal Michalski <mi...@zalando.ie>.
Hi Josh,

1. I don't know for sure (haven't seen the code that does it), but it's
probably the most "even" split possible for a given number of brokers and
partitions. So for 8 partitions and 3 brokers it would be [3, 3, 2].
2. See "num.partitions" in the broker config. BTW, only a producer can create
a topic dynamically, not a consumer.
3. See 2. The value has to be non-zero, so it's always specified.
4. Based on the ProducerRecord (message) key: the producer hashes the key to
pick a partition, and records without a key are spread across partitions
round-robin. See:
https://kafka.apache.org/0110/javadoc/index.html?org/apache/kafka/clients/producer/KafkaProducer.html
5. Yes - you need to create multiple consumers with the same group.id.
6. Yes, there'll be at most one consumer (within a consumer group) handling
a given partition at a given time.
7. Yes, it's a process called "rebalancing" - it reassigns partitions to
consumers when the number of consumers changes.
8. Your consumer will commit the last processed offset to a special Kafka
topic (or ZooKeeper, but that's not the default) every so often (periodically
or "on demand", when you tell it to), so for each partition and consumer
group you know what was and wasn't processed yet. The new consumer will
pick up from the place where the dead one left off.
9. If I understand your question correctly - no, Kafka is pull-based and
not push-based by design.
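
To make points 1 and 4 a bit more concrete, here is a rough Python sketch, not the actual Kafka code. It assumes a plain round-robin placement of partitions over brokers, and it uses `zlib.crc32` as a stand-in hash for the record key (the real client uses its own hash function). The function names and broker names are made up for illustration.

```python
import zlib


def spread_partitions(num_partitions, brokers):
    """Place partitions on brokers round-robin (assumed policy, for illustration)."""
    assignment = {broker: [] for broker in brokers}
    for partition in range(num_partitions):
        assignment[brokers[partition % len(brokers)]].append(partition)
    return assignment


def choose_partition(key, num_partitions):
    """Map a record key (bytes) to a partition; crc32 is a stand-in hash."""
    return zlib.crc32(key) % num_partitions


layout = spread_partitions(8, ["broker-0", "broker-1", "broker-2"])
counts = [len(parts) for parts in layout.values()]
print(counts)  # [3, 3, 2]

# The same key always maps to the same partition, which preserves per-key ordering.
first = choose_partition(b"order-42", 8)
second = choose_partition(b"order-42", 8)
```

The point of hashing the key is that all records for one key end up in one partition, so they are consumed in the order they were produced.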

Kind regards,
Michał
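
PS. The consumer-side answers (5 to 9) can be sketched the same way. This is a toy model under stated assumptions, not the Kafka client API: `assign` uses round-robin purely for illustration, `run_listener` is a hypothetical helper showing how a listener style can be layered on top of a pull loop, and the `committed` dict stands in for the offsets Kafka stores per consumer group and partition. It stores the next offset to read, i.e. last processed + 1.

```python
def assign(partitions, consumers):
    """Give each partition to exactly one live consumer of the group (round-robin)."""
    owners = {consumer: [] for consumer in consumers}
    for i, partition in enumerate(partitions):
        owners[consumers[i % len(consumers)]].append(partition)
    return owners


def run_listener(poll, on_record, committed, group):
    """Listener-style wrapper: the loop still pulls, then hands records to a callback."""
    while True:
        batch = poll()
        if not batch:  # an empty batch ends this toy loop
            break
        for partition, offset, value in batch:
            on_record(value)
            # Remember where the next owner of this partition should resume.
            committed[(group, partition)] = offset + 1


partitions = [0, 1, 2, 3]
before = assign(partitions, ["c1", "c2"])  # {'c1': [0, 2], 'c2': [1, 3]}
after = assign(partitions, ["c1"])         # c2 exited: {'c1': [0, 1, 2, 3]}

committed = {}
batches = [[(1, 0, "a"), (1, 1, "b")], []]
seen = []
run_listener(lambda: batches.pop(0), seen.append, committed, "billing")
print(committed)  # {('billing', 1): 2}
```

Because the offset is recorded per (group, partition) rather than per consumer, whichever consumer is handed partition 1 after a rebalance can resume at offset 2.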

(In reply to Josh Maidana's original message of 5 October 2017 at 09:37, at
the top of this thread.)