Posted to users@kafka.apache.org by "Rodenburg, Jeff" <je...@teamaol.com> on 2012/06/12 18:58:15 UTC

Consumer group concept

Hi all -

Just getting familiar with Kafka, and learning about consumer groups. Hoping someone can provide some context here.

As I understand it, consumers register with the broker and consume a topic. Multiple consumers can consume a single topic, as a consumer group. Each consumer actually gets a partition of messages, so there is no overlap -- a single consumer within a group will receive a message on its topic/partition.  Consumer rebalancing is the process whereby members of a consumer group are added and/or dropped from the group, and partitions are sorted/reassigned to the current consumer group members.

Some questions:

  *   Is this accurate? What am I missing?
  *   Operationally, is consumer "failover" basically service monitoring at the consumer process level?
  *   How much coordination is required between producers and consumers around partitioning? (Automated, configuration, etc.)
  *   How are topics monitored for SLA on throughput/load, i.e. spinning up consumers as needed for topic message spikes?

Appreciate any further information and/or context anyone can share.

cheers,
Jeff

Re: Consumer group concept

Posted by "Rodenburg, Jeff" <je...@teamaol.com>.
Yeah, been there, done that.  Design documents can sometimes struggle to express certain key concepts.

The intersection of consumers, groups and partitions is really central to understanding how Kafka scales (which is driving our interest.)  The partition information is discussed in multiple sections of the document, but isn't given first-class treatment (like consumer state, for example.)  Understanding how the producer can drive partitioning logic, to storage, to consumer retrieval is all in the document -- it's just not front-and-center. A graphical message flow diagram would help explain the concept immensely.

But really, that's just interpretation on my end -- the documentation is really good on the design end. My standard disclaimer for others is "your mileage may vary."

Thanks for the education, it's been super helpful.

-jeff







Re: Consumer group concept

Posted by Jay Kreps <ja...@gmail.com>.
Hey Jeff,

One goal we have is to make the docs understandable to people just trying
to learn the system. It sounds like it didn't quite work in this case, and
you aren't alone--somehow consumer groups confuse everyone. Of course they
seem very intuitive to us after some years of working on this project! If
you see areas that are not explained clearly, that could use an example, or
that aren't covered, could you send us that feedback? It really helps to have
someone coming at this with a fresh eye help flesh out areas that can be
improved.

-Jay


Re: Consumer group concept

Posted by "Rodenburg, Jeff" <je...@teamaol.com>.
Thanks Joel, this makes sense now. I've been walking through the design document most of the day and was trying to reconcile the difference between consumer streams and partitions (not sure why I tied the two together previously.)

Pretty sure I'm following this now, thank you for the comments.

-j





Re: Consumer group concept

Posted by Joel Koshy <jj...@gmail.com>.
Hi Jeff,

Load balancing is done by range partitioning the available partitions for
the topic across the consumer processes (streams). The algorithm is given
at the very end of the design document:
http://incubator.apache.org/kafka/design.html - but here's a quick example.
If you have four nodes, and two message streams per node (i.e., each node's
consumer config is "foo":2) this means there are eight consumer streams in
total. The available partitions for "foo" are allocated to these eight
streams using the rebalancing algorithm. For example, if there are eight
available partitions on the brokers then each consumer stream will get one
partition. If there are fewer than eight, some of the consumer streams will
not get any data. If there are more than eight, then some streams will get
more than one partition (if # partitions % # streams == 0 then it will be
evenly spread, and skewed otherwise).
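
To make the arithmetic concrete, here is a minimal Python sketch of range assignment. It is an illustration under stated assumptions, not Kafka's actual rebalancing code: the function name `range_assign`, the stream names, and the choice to hand the leftover partitions to the first streams in sorted order are all hypothetical.

```python
def range_assign(partitions, streams):
    """Split the sorted partitions into contiguous ranges, one per sorted stream.

    Assumption for illustration: when the division is uneven, the first
    (len(partitions) % len(streams)) streams each take one extra partition.
    """
    partitions = sorted(partitions)
    streams = sorted(streams)
    base, extra = divmod(len(partitions), len(streams))
    assignment = {}
    start = 0
    for i, stream in enumerate(streams):
        count = base + (1 if i < extra else 0)
        assignment[stream] = partitions[start:start + count]
        start += count
    return assignment


# Four nodes with two streams each ("foo": 2) gives eight consumer streams.
streams = [f"node{n}-stream{s}" for n in range(1, 5) for s in range(2)]

for num_partitions in (8, 6, 10):
    result = range_assign(range(num_partitions), streams)
    sizes = [len(owned) for owned in result.values()]
    print(num_partitions, "partitions ->", sizes)
# 8 partitions -> [1, 1, 1, 1, 1, 1, 1, 1]
# 6 partitions -> [1, 1, 1, 1, 1, 1, 0, 0]
# 10 partitions -> [2, 2, 1, 1, 1, 1, 1, 1]
```

With eight streams, 8 partitions give one partition each, 6 partitions leave two streams without data, and 10 partitions give two streams an extra partition.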

Thanks,

Joel

On Tue, Jun 12, 2012 at 1:55 PM, Rodenburg, Jeff <jeff.rodenburg@teamaol.com> wrote:

> Great, I'm running the quick start and can see that in operation.
>
> Ok, last question on this thread:
>
> > So if you have two consumer groups consuming a topic, and each consumer
> > group has 4 machines in it, then a message published to this topic would be
> > delivered to one machine in each of the two groups.
>
> How is topic load-balancing for consumers handled?  For example, if a
> consumer group has 4 machines in it (consumer per machine), in reality only
> one machine in the group is actually working.  If I want multiple machines
> handling items in a topic, how is that approach handled? I could see
> producers generating more topics, and consumers subscribing to those
> (making a high-volume topic more granular).  What's best practice when
> consumer tasks on topic messages need to be handled by multiple consumers?
>
> -Jeff
>
>
>
>
>
> On Jun 12, 2012, at 11:46 AM, Jay Kreps wrote:
>
> > Basically the rule is this "every message sent to the topic is delivered
> > to one machine/process in each consumer group". So if you have two consumer
> > groups consuming a topic, and each consumer group has 4 machines in it,
> > then a message published to this topic would be delivered to one machine
> > in each of the two groups.
> >
> > -Jay
> >
> > On Tue, Jun 12, 2012 at 11:34 AM, Rodenburg, Jeff <
> > jeff.rodenburg@teamaol.com> wrote:
> >
> >> Thanks for the info, Jun.
> >>
> >>> if you just want each message to be consumed by a consumer, not a
> >>> particular one
> >>
> >> What is intended to be a particular consumer? Something on the order of
> >> Consumer #3 within a group needs message #123?
> >>
> >> Ok, next question:
> >>
> >> What is the relationship between topics and consumer groups? More to the
> >> point, can I have multiple consumer groups that all consume the same topic?
> >> For example, assume a set of producers are publishing to the topic "ABC".
> >> Suppose I have multiple processes that take action on a given ABC message
> >> -- process 1 handles billing, process 2 handles file management, process 3
> >> handles history/archiving, etc.  Can I structure multiple groups that
> >> consume the same topic? How does partitioning work at that point?
> >>
> >>
> >>
> >>
> >> On Jun 12, 2012, at 10:11 AM, Jun Rao wrote:
> >>
> >>> Jeff,
> >>>
> >>> Your understanding is correct. Operationally, we expose some consumer
> >>> stats per topic via JMX. There is also a tool, CheckOffsetLag, that
> >>> tells you how far behind a consumer is. For coordination between producers
> >>> and consumers, if you just want each message to be consumed by a consumer,
> >>> not a particular one, there is no coordination needed.
> >>>
> >>> Thanks,
> >>>
> >>> Jun
> >>>
> >>> On Tue, Jun 12, 2012 at 9:58 AM, Rodenburg, Jeff <jeff.rodenburg@teamaol.com> wrote:
> >>>
> >>>> Hi all -
> >>>>
> >>>> Just getting familiar with Kafka, and learning about consumer groups.
> >>>> Hoping someone can provide some context here.
> >>>>
> >>>> As I understand it, consumers register with the broker and consume a
> >>>> topic. Multiple consumers can consume a single topic, as a consumer group.
> >>>> Each consumer actually gets a partition of messages, so there is no overlap
> >>>> -- a single consumer within a group will receive a message on its
> >>>> topic/partition.  Consumer rebalancing is the process whereby members of a
> >>>> consumer group are added and/or dropped from the group, and partitions are
> >>>> sorted/reassigned to the current consumer group members.
> >>>>
> >>>> Some questions:
> >>>>
> >>>> *   Is this accurate? What am I missing?
> >>>> *   Operationally, is consumer "failover" basically service monitoring at
> >>>> the consumer process level?
> >>>> *   How much coordination is required between producers and consumers
> >>>> around partitioning? (Automated, configuration, etc.)
> >>>> *   How are topics monitored for SLA on throughput/load, i.e. spinning up
> >>>> consumers as needed for topic message spikes?
> >>>>
> >>>> Appreciate any further information and/or context anyone can share.
> >>>>
> >>>> cheers,
> >>>> Jeff
> >>>>
> >>
> >>
>
>

Re: Consumer group concept

Posted by "Rodenburg, Jeff" <je...@teamaol.com>.
Thanks Jay, think I was mismatching the concept of partition to (consumer) stream. This makes much more sense now.

-j






Re: Consumer group concept

Posted by Jay Kreps <ja...@gmail.com>.
I think a lot of these details are in the design doc, which you may find
helpful (http://incubator.apache.org/kafka/design.html).

To answer your question, it isn't the case that only one machine is
consuming. All machines in the group will consume. The way it works is that
each broker has some number of partitions. These partitions are divided up
over the consumer machines in the group. The data in a partition is delivered
in order to whichever consumer is currently consuming that partition.
ZooKeeper is used to balance the mapping of consumers to partitions. One
consumer can have many partitions, but if you have more consumers than
partitions, some will not have any work to do.
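
As a rough illustration of the point above (not taken from the Kafka code
base), here is a minimal Python sketch of dividing partitions over the
consumers in one group. The function name assign_partitions, the partition
labels, and the contiguous-range split are assumptions made for this example;
the real rebalancing logic lives in the consumer and is coordinated through
ZooKeeper.

```python
def assign_partitions(partitions, consumers):
    """Split partitions into contiguous ranges, one range per consumer.

    Hypothetical helper for illustration only; it mimics the idea of
    dividing partitions over group members, not Kafka's actual code.
    """
    if not consumers:
        raise ValueError("a consumer group needs at least one consumer")
    partitions = sorted(partitions)
    consumers = sorted(consumers)
    # Every consumer gets `base` partitions; the first `extra` get one more.
    base, extra = divmod(len(partitions), len(consumers))
    assignment = {}
    start = 0
    for i, consumer in enumerate(consumers):
        count = base + (1 if i < extra else 0)
        assignment[consumer] = partitions[start:start + count]
        start += count
    return assignment


if __name__ == "__main__":
    # Four partitions over two consumers: each consumer owns two.
    print(assign_partitions(["b1-0", "b1-1", "b2-0", "b2-1"], ["c1", "c2"]))
    # Two partitions over three consumers: one consumer is left idle.
    print(assign_partitions(["b1-0", "b1-1"], ["c1", "c2", "c3"]))
```

The second call shows the idle-consumer case: with more consumers than
partitions, at least one consumer ends up with an empty list.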

-Jay

On Tue, Jun 12, 2012 at 1:55 PM, Rodenburg, Jeff <jeff.rodenburg@teamaol.com> wrote:

> Great, I'm running the quick start and can see that in operation.
>
> Ok, last question on this thread:
>
> > So if you have two consumer groups consuming a topic, and each consumer
> > group has 4 machines in it, then a message published to this topic would be
> > delivered to one machine in each of the two groups.
>
> How is topic load-balancing for consumers handled?  For example, if a
> consumer group has 4 machines in it (consumer per machine), in reality only
> one machine in the group is actually working.  If I want multiple machines
> handling items in a topic, how is that approach handled? I could see
> producers generating more topics, and consumers subscribing to those
> (making a high-volume topic more granular).  What's best practice when
> consumer tasks on topic messages need to be handled by multiple consumers?
>
> -Jeff

Re: Consumer group concept

Posted by "Rodenburg, Jeff" <je...@teamaol.com>.
Great, I'm running the quick start and can see that in operation.

Ok, last question on this thread:

> So if you have two consumer groups consuming a topic, and each consumer group has 4 machines in it, then a message published to this topic would be delivered to one machine in each of the two groups.

How is topic load-balancing for consumers handled? For example, if a consumer group has 4 machines in it (one consumer per machine), in reality only one machine in the group is actually working. If I want multiple machines handling items in a topic, how is that approach handled? I could see producers generating more topics, and consumers subscribing to those (making a high-volume topic more granular). What's the best practice when consumer tasks on topic messages need to be handled by multiple consumers?

-Jeff





On Jun 12, 2012, at 11:46 AM, Jay Kreps wrote:

> Basically the rule is this "every message sent to the topic is delivered to
> one machine/process in each consumer group". So if you have two consumer
> groups consuming a topic, and each consumer group has 4 machines in it,
> then a message published to this topic would be delivered to one machine in
> each of the two groups.
> 
> -Jay


Re: Consumer group concept

Posted by Jay Kreps <ja...@gmail.com>.
Basically the rule is this: "every message sent to the topic is delivered to
one machine/process in each consumer group". So if you have two consumer
groups consuming a topic, and each consumer group has 4 machines in it,
then a message published to this topic would be delivered to one machine in
each of the two groups.
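
To make the rule concrete, here is a small Python sketch (an illustration
only, not Kafka code). The function deliver, the group names, and the
modulo-based choice of owner are all hypothetical stand-ins; in practice the
owner of a partition within a group is decided by consumer rebalancing.

```python
def deliver(partition, groups):
    """Return {group_name: consumer} for a message stored in `partition`.

    Each group independently maps the partition to exactly one of its
    members, so the message reaches one consumer per group.
    The modulo rule is a placeholder for the real partition assignment.
    """
    receivers = {}
    for group_name, members in groups.items():
        members = sorted(members)
        receivers[group_name] = members[partition % len(members)]
    return receivers


if __name__ == "__main__":
    # Two hypothetical groups of four machines each, both consuming one topic.
    groups = {
        "billing": ["b1", "b2", "b3", "b4"],
        "archiving": ["a1", "a2", "a3", "a4"],
    }
    # A message in partition 2 goes to one machine in each group.
    print(deliver(2, groups))
```

Adding a third group would add a third receiver for the same message, while
adding machines to an existing group only changes which single machine in
that group receives it.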

-Jay

On Tue, Jun 12, 2012 at 11:34 AM, Rodenburg, Jeff <
jeff.rodenburg@teamaol.com> wrote:

> Thanks for the info, Jun.
>
> > if you just want each message to be consumed by a consumer, not a
> > particular one
>
> What is intended to be a particular consumer? Something on the order of
> Consumer #3 within a group needs message #123?
>
> Ok, next question:
>
> What is the relationship between topics and consumer groups? More to the
> point, can I have multiple consumer groups that all consume the same topic?
>  For example, assume a set of producers are publishing to the topic "ABC".
>  Suppose I have multiple processes that take action on a given ABC message
> -- process 1 handles billing, process 2 handles file management, process 3
> handles history/archiving, etc.  Can I structure multiple groups that
> consume the same topic? How does partitioning work at that point?

Re: Consumer group concept

Posted by "Rodenburg, Jeff" <je...@teamaol.com>.
Thanks for the info, Jun.

> if you just want each message to be consumed by a consumer, not a particular one

What is intended to be a particular consumer? Something on the order of Consumer #3 within a group needs message #123?

Ok, next question:

What is the relationship between topics and consumer groups? More to the point, can I have multiple consumer groups that all consume the same topic?  For example, assume a set of producers are publishing to the topic "ABC".  Suppose I have multiple processes that take action on a given ABC message -- process 1 handles billing, process 2 handles file management, process 3 handles history/archiving, etc.  Can I structure multiple groups that consume the same topic? How does partitioning work at that point?




On Jun 12, 2012, at 10:11 AM, Jun Rao wrote:

> Jeff,
> 
> Your understanding is correct. Operationally, we have some JMX metrics that
> give consumer stats per topic. There is also a tool, CheckOffsetLag, that
> tells you how far behind a consumer is. For coordination between producers
> and consumers, if you just want each message to be consumed by a consumer,
> not a particular one, there is no coordination needed.
> 
> Thanks,
> 
> Jun


Re: Consumer group concept

Posted by Jun Rao <ju...@gmail.com>.
Jeff,

Your understanding is correct. Operationally, we have some JMX metrics that
give consumer stats per topic. There is also a tool, CheckOffsetLag, that
tells you how far behind a consumer is. For coordination between producers
and consumers, if you just want each message to be consumed by a consumer,
not a particular one, there is no coordination needed.
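
For readers wondering what "how far behind" means in practice, the idea can
be sketched in a few lines of Python. This is not the CheckOffsetLag
implementation or its output format; the function consumer_lag, the partition
keys, and the offset values are made up for illustration. Lag is taken here
as the latest offset in each partition's log minus the offset the consumer
has reached.

```python
def consumer_lag(log_end_offsets, consumed_offsets):
    """Return per-partition lag: latest log offset minus consumed offset.

    Hypothetical helper; a partition the consumer has not read yet is
    treated as consumed up to offset 0.
    """
    lag = {}
    for partition, end_offset in log_end_offsets.items():
        consumed = consumed_offsets.get(partition, 0)
        # Clamp at zero in case the two readings were taken at different times.
        lag[partition] = max(end_offset - consumed, 0)
    return lag


if __name__ == "__main__":
    latest = {"topicABC-0": 1500, "topicABC-1": 900}
    consumed = {"topicABC-0": 1200, "topicABC-1": 900}
    print(consumer_lag(latest, consumed))
```

A lag that keeps growing on some partitions is one signal that the group
needs more consumers, up to the number of partitions.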

Thanks,

Jun

On Tue, Jun 12, 2012 at 9:58 AM, Rodenburg, Jeff <jeff.rodenburg@teamaol.com> wrote:

> Hi all -
>
> Just getting familiar with Kafka, and learning about consumer groups.
> Hoping someone can provide some context here.
>
> As I understand it, consumers register with the broker and consume a
> topic. Multiple consumers can consume a single topic, as a consumer group.
> Each consumer actually gets a partition of messages, so there is no overlap
> -- a single consumer within a group will receive a message on its
> topic/partition.  Consumer rebalancing is the process whereby members of a
> consumer group are added and/or dropped from the group, and partitions are
> sorted/reassigned to the current consumer group members.
>
> Some questions:
>
>  *   Is this accurate? What am I missing?
>  *   Operationally, is consumer "failover" basically service monitoring at
> the consumer process level?
>  *   How much coordination is required between producers and consumers
> around partitioning? (Automated, configuration, etc.)
>  *   How are topics monitored for SLA on throughput/load, i.e. spinning up
> consumers as needed for topic message spikes?
>
> Appreciate any further information and/or context anyone can share.
>
> cheers,
> Jeff
>