Posted to users@kafka.apache.org by Ross Black <ro...@gmail.com> on 2012/05/22 15:00:53 UTC

partitioning stateful consumers

Hi,

I am evaluating Kafka for use in our application and have some questions
about how partitions are allocated to consumers.
We want to partition messages across a set of consumers so that, ideally,
each consumer handles a fixed set of ids (contained within the messages).
Each of our consumers maintains state for the set of ids it processes.

As I understand it, a custom Partitioner allows messages to be allocated to
partitions, and each consumer is then allocated one or more partitions to
process. After a change to the number of brokers or consumers, the
allocation of partitions to consumers changes, so each consumer may end up
processing a different subset of messages.
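The two-step mapping described above (id to partition, then partition to consumer) can be sketched in a few lines of Python. This is not Kafka code: `partition_for` is a hypothetical stand-in for a custom Partitioner that hashes each id with CRC32, and `range_assign` is a hypothetical range-style split of partitions across sorted consumers, assumed here for illustration; the rebalance algorithm of any given Kafka version may differ.

```python
from __future__ import annotations

import zlib


def partition_for(entity_id: str, num_partitions: int) -> int:
    """Hypothetical custom partitioner: map an id to a fixed partition."""
    # CRC32 is stable across processes and runs; Python's built-in hash()
    # for strings is salted per process, so it would not be.
    return zlib.crc32(entity_id.encode("utf-8")) % num_partitions


def range_assign(num_partitions: int, consumers: list[str]) -> dict[str, list[int]]:
    """Hypothetical range-style split of partitions across consumers."""
    ordered = sorted(consumers)
    per_consumer, extra = divmod(num_partitions, len(ordered))
    assignment: dict[str, list[int]] = {}
    start = 0
    for i, consumer in enumerate(ordered):
        # The first `extra` consumers each take one additional partition.
        count = per_consumer + (1 if i < extra else 0)
        assignment[consumer] = list(range(start, start + count))
        start += count
    return assignment


if __name__ == "__main__":
    ids = ["user-1", "user-2", "user-3", "user-4"]
    # The id -> partition step is fixed while num_partitions is fixed...
    print({i: partition_for(i, 4) for i in ids})
    # ...but the partition -> consumer step changes when a consumer joins.
    print(range_assign(4, ["c1", "c2"]))
    print(range_assign(4, ["c1", "c2", "c3"]))
```

With 4 partitions, consumers c1 and c2 get [0, 1] and [2, 3]. Adding c3 yields c1: [0, 1], c2: [2], c3: [3], so c2 loses partition 3, and every id that hashes to it, even though no id changed partition. Only the first step is under the producer's control.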

Is there some facility within Kafka that would allow the set of ids to
remain fixed for a particular consumer? (From what I have read, I assume
that this is not possible.)

Alternatively, is there any callback or other notification mechanism that
would allow our consumers to know when the partitioning changes?
Since each of our consumers maintains state for the set of ids it is
processing, we could then dump and refresh that state when the set of ids
changes.
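A consumer that reacts to such a notification only needs its old and new partition sets. The minimal sketch below assumes a hypothetical `on_assignment_changed` hook and hypothetical `load_state`/`dump_state` callbacks supplied by the application; it is not a Kafka 0.7 API. (Later Kafka releases added `ConsumerRebalanceListener` to the Java consumer, with `onPartitionsRevoked` and `onPartitionsAssigned` callbacks, which is the usual home for this kind of logic.)

```python
from __future__ import annotations


class StatefulConsumer:
    """Hypothetical consumer that keeps per-id state grouped by partition."""

    def __init__(self, load_state, dump_state):
        # load_state(partition) -> dict and dump_state(partition, state) are
        # hypothetical hooks, e.g. backed by local files or a database.
        self._load_state = load_state
        self._dump_state = dump_state
        self.owned: set[int] = set()
        self.state: dict[int, dict] = {}

    def on_assignment_changed(self, new_partitions) -> None:
        new_set = set(new_partitions)
        revoked = self.owned - new_set
        added = new_set - self.owned
        # Flush and drop state for partitions this consumer no longer owns...
        for p in sorted(revoked):
            self._dump_state(p, self.state.pop(p, {}))
        # ...then rebuild state for partitions it has just been given.
        for p in sorted(added):
            self.state[p] = self._load_state(p)
        self.owned = new_set

    def handle(self, partition: int, entity_id: str, value) -> None:
        # Assumes the partition is currently owned; otherwise raises KeyError.
        self.state[partition][entity_id] = value
```

Only the partitions that actually moved are dumped or reloaded, so a rebalance that leaves a consumer's set unchanged costs nothing. The sketch does not coordinate between consumers: a safe handoff still requires the old owner to finish dumping before the new owner loads.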

Thanks,
Ross

Re: partitioning stateful consumers

Posted by Ross Black <ro...@gmail.com>.
Hi,

Thanks, that info covers exactly what I was looking for.

Are KAFKA-345 and KAFKA-346 intended to be released as part of 0.8, or
perhaps earlier?

I appreciate the help.
Ross



On 23 May 2012 08:24, Peter Romianowski <ho...@googlemail.com> wrote:

> Hi Ross,
>
> please have a look at KAFKA-346, too. In combination with KAFKA-345, it
> covers our scenario, which should be a lot like yours. Both patches are
> applied to the GitHub branch hmb mentioned.
>
> Greetings
>
> Peter

Re: partitioning stateful consumers

Posted by Peter Romianowski <ho...@googlemail.com>.
Hi Ross,

please have a look at KAFKA-346, too. In combination with KAFKA-345, it
covers our scenario, which should be a lot like yours. Both patches are
applied to the GitHub branch hmb mentioned.

Greetings

Peter
On 22.05.2012 17:45, "Hisham Mardam-Bey" <hi...@mate1inc.com> wrote:

> Hi Ross,
>
> A similar thread[1] was just discussed on the list here and resulted in:
>
> https://issues.apache.org/jira/browse/KAFKA-345
>
> and
>
>
> https://github.com/optivo-org/kafka/commit/c4b2647101ab857dda4cb831863dd37e5cb4df55
>
> [1]
> http://mail-archives.apache.org/mod_mbox/incubator-kafka-users/201205.mbox/browser
>
> Hope this sheds some light on your question,
>
> Best,
>
> hmb.

Re: partitioning stateful consumers

Posted by Hisham Mardam-Bey <hi...@mate1inc.com>.
Hi Ross,

A similar thread[1] was just discussed on the list here and resulted in:

https://issues.apache.org/jira/browse/KAFKA-345

and

https://github.com/optivo-org/kafka/commit/c4b2647101ab857dda4cb831863dd37e5cb4df55

[1] http://mail-archives.apache.org/mod_mbox/incubator-kafka-users/201205.mbox/browser

Hope this sheds some light on your question,

Best,

hmb.

On Tue, May 22, 2012 at 9:00 AM, Ross Black <ro...@gmail.com> wrote:
> Hi,
>
> I am evaluating Kafka for use in our application and have some questions
> about how partitions are allocated to consumers.
> We want to partition messages across a set of consumers so that, ideally,
> each consumer handles a fixed set of ids (contained within the messages).
> Each of our consumers maintains state for the set of ids it processes.
>
> As I understand it, a custom Partitioner allows messages to be allocated to
> partitions, and each consumer is then allocated one or more partitions to
> process. After a change to the number of brokers or consumers, the
> allocation of partitions to consumers changes, so each consumer may end up
> processing a different subset of messages.
>
> Is there some facility within Kafka that would allow the set of ids to
> remain fixed for a particular consumer? (From what I have read, I assume
> that this is not possible.)
>
> Alternatively, is there any callback or other notification mechanism that
> would allow our consumers to know when the partitioning changes?
> Since each of our consumers maintains state for the set of ids it is
> processing, we could then dump and refresh that state when the set of ids
> changes.
>
> Thanks,
> Ross



-- 
Hisham Mardam-Bey
[ Director of Engineering ] [ Mate1 Inc. ]

A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?
A: Top-posting.
Q: What is the most annoying thing in e-mail?

-=[ Codito Ergo Sum ]=-