Posted to users@kafka.apache.org by Markus Roder <ro...@gmail.com> on 2013/10/08 11:11:49 UTC

Partition election on consumer

Hi,

We currently face a "problem" on our consumer cluster, which may have a
simple solution. Nevertheless, I have not found it yet.

Description of problem:
1 Kafka topic with 24 partitions (Kafka version 0.8 Beta1)
2 or more consumers in the same consumer group. Each consumer processes its
partitions by aggregating topic data into a relational database. Each
consumer keeps the aggregated data in a local hash before committing it to
the relational database. After the commit to the database, the
ConsumerConnector commits the offsets to Kafka.

The problem is: if I connect a new consumer, the ConsumerConnector
recalculates the partitions each consumer instance reads from. Because of
the local aggregation within the consumers, this currently causes our
system to process topic data twice.

Is there any way to catch the partition reassignment event in the
ConsumerConnector, so that we can commit the offsets and data before it
reconnects to the new partitions?

Thanks in advance
Markus

-- 
Markus Roder
Distelweg 4
97318 Kitzingen
Mail: roder.markus80@gmail.com
Profil: http://gplus.to/markusroder
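The ordering described above (aggregate locally, write the aggregates to the database, and only then commit the offsets) can be simulated in a few lines to show where the double processing comes from. This is a minimal sketch under stated assumptions, not the Kafka 0.8 consumer API: `PartitionAggregator`, `RebalanceDemo` and `flushAndCommit` are hypothetical names, and two plain maps stand in for the relational database and for the offsets the consumer commits to Kafka.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical per-consumer aggregator (not a Kafka class): buffers sums per key in a
// local hash map and remembers the next offset to read for each partition it has seen.
final class PartitionAggregator {
    private final Map<String, Long> pending = new HashMap<>();
    private final Map<Integer, Long> nextOffset = new HashMap<>();

    // Aggregate one message read from `partition` at `offset`.
    void consume(int partition, long offset, String key, long value) {
        pending.merge(key, value, Long::sum);
        nextOffset.put(partition, offset + 1);
    }

    // Write the buffered aggregates to the "database" first, then record the offsets,
    // mirroring the order described in the question. Clears the local buffer.
    void flushAndCommit(Map<String, Long> database, Map<Integer, Long> committedOffsets) {
        pending.forEach((k, v) -> database.merge(k, v, Long::sum));
        pending.clear();
        committedOffsets.putAll(nextOffset);
    }
}

// Hypothetical scenario: ten messages (key "x", value 1) on partition 0, which moves
// from consumer A to consumer B. Returns the final database total for "x".
final class RebalanceDemo {
    static long simulate(boolean flushBeforeRebalance) {
        Map<String, Long> database = new HashMap<>();
        Map<Integer, Long> committed = new HashMap<>();

        PartitionAggregator a = new PartitionAggregator();
        for (long off = 0; off < 10; off++) {
            a.consume(0, off, "x", 1);
        }
        if (flushBeforeRebalance) {
            a.flushAndCommit(database, committed); // the "before rebalance" hook asked about
        }

        // Rebalance: B takes over partition 0 and resumes from the last committed offset.
        PartitionAggregator b = new PartitionAggregator();
        for (long off = committed.getOrDefault(0, 0L); off < 10; off++) {
            b.consume(0, off, "x", 1);
        }

        if (!flushBeforeRebalance) {
            a.flushAndCommit(database, committed); // A still flushes what it had buffered
        }
        b.flushAndCommit(database, committed);
        return database.getOrDefault("x", 0L);
    }
}
```

`RebalanceDemo.simulate(false)` ends with a total of 20 for ten messages of value 1, because the new owner resumes from the last committed offset and re-aggregates data that the old owner still flushes later; `simulate(true)` ends with 10. A hook that runs the flush before the partitions are handed over is exactly what the question asks for.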

Re: Partition election on consumer

Posted by Markus Roder <ro...@gmail.com>.
Thanks Neha, I really appreciate your assistance.


2013/10/9 Neha Narkhede <ne...@gmail.com>

> Kafka's consumer rebalancing strategy is explained in detail here -
> http://kafka.apache.org/documentation.html#distributionimpl
> Hope that helps!
>
> -Neha

Re: Partition election on consumer

Posted by Neha Narkhede <ne...@gmail.com>.
Kafka's consumer rebalancing strategy is explained in detail here -
http://kafka.apache.org/documentation.html#distributionimpl
Hope that helps!

-Neha
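The page linked above describes range-style assignment: the sorted partitions of a topic are divided into contiguous ranges over the sorted consumers of the group. A minimal sketch of that computation for a single topic follows; `RangeAssignment.assign` is a hypothetical helper, and spreading the leftover partitions (partitions % consumers) over the first consumers is an assumption of this sketch rather than a quote of Kafka's code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper sketching range-style assignment for one topic.
final class RangeAssignment {
    // Partitions owned by the consumer at position `consumerIndex` in the sorted
    // consumer list, assuming partitions are numbered 0..numPartitions-1.
    static List<Integer> assign(int numPartitions, int numConsumers, int consumerIndex) {
        if (numConsumers <= 0 || consumerIndex < 0 || consumerIndex >= numConsumers) {
            throw new IllegalArgumentException("consumerIndex must be in [0, numConsumers)");
        }
        int perConsumer = numPartitions / numConsumers;
        int withExtra = numPartitions % numConsumers; // first `withExtra` consumers get one more
        int start = perConsumer * consumerIndex + Math.min(consumerIndex, withExtra);
        int count = perConsumer + (consumerIndex < withExtra ? 1 : 0);

        List<Integer> owned = new ArrayList<>();
        for (int p = start; p < start + count; p++) {
            owned.add(p);
        }
        return owned;
    }
}
```

For the 24 partitions in the question, `assign(24, 2, i)` gives consumer 0 partitions 0-11 and consumer 1 partitions 12-23; with a third consumer, `assign(24, 3, i)` gives 0-7, 8-15 and 16-23. Adding one consumer therefore moves 12 of the 24 partitions (8-11 and 16-23) to a new owner, which is why the existing instances are affected by the rebalance, not only the new one.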


On Tue, Oct 8, 2013 at 11:42 PM, Markus Roder <ro...@gmail.com> wrote:

> Hi Neha,
>
> Thanks for this information.
> Can you give me a hint for implementing my own rebalancing strategy?
>
> Thanks in advance
> Markus

Re: Partition election on consumer

Posted by Markus Roder <ro...@gmail.com>.
Hi Neha,

Thanks for this information.
Can you give me a hint for implementing my own rebalancing strategy?

Thanks in advance
Markus
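One possible "own rebalancing strategy" on top of SimpleConsumer is to avoid dynamic rebalancing altogether and give each instance a fixed set of partitions from its configuration. A minimal sketch of that assignment, assuming each instance is started with a hypothetical `instanceIndex` and the total `instanceCount`; fetching from the partition leader and storing offsets (which SimpleConsumer leaves to the application) are not shown.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical static assignment: instance i of n reads partition p iff p % n == i.
// There is no coordination between instances, so no automatic rebalance happens.
final class StaticAssignment {
    static List<Integer> partitionsFor(int numPartitions, int instanceCount, int instanceIndex) {
        if (instanceCount <= 0 || instanceIndex < 0 || instanceIndex >= instanceCount) {
            throw new IllegalArgumentException("instanceIndex must be in [0, instanceCount)");
        }
        List<Integer> owned = new ArrayList<>();
        for (int p = instanceIndex; p < numPartitions; p += instanceCount) {
            owned.add(p);
        }
        return owned;
    }
}
```

With 24 partitions and 3 instances, instance 1 reads partitions 1, 4, 7, 10, 13, 16, 19 and 22. Changing the instance count then becomes an operational step: stop all instances, let each flush its aggregates and offsets, and restart them with the new configuration, so nothing is reassigned while data is still buffered. The trade-off is that a crashed instance's partitions are not picked up by the others until it comes back.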


2013/10/8 Neha Narkhede <ne...@gmail.com>

> Currently there is no way to invoke a callback on the rebalance operation.
> But this is certainly something to consider for Kafka 0.9 since we are
> planning a client rewrite for that release. You can find the proposal in
> progress here -
>
> https://cwiki.apache.org/confluence/display/KAFKA/Client+Rewrite#ClientRewrite-ConsumerAPI
>
> For now your best bet is to use the SimpleConsumer and implement your own
> rebalancing strategy. Another hacky approach is to register ZooKeeper
> watches on the /consumers/<group>/owners path, which reflects partition
> ownership changes.
>
> Thanks,
> Neha

Re: Partition election on consumer

Posted by Neha Narkhede <ne...@gmail.com>.
Currently there is no way to invoke a callback on the rebalance operation.
But this is certainly something to consider for Kafka 0.9 since we are
planning a client rewrite for that release. You can find the proposal in
progress here -
https://cwiki.apache.org/confluence/display/KAFKA/Client+Rewrite#ClientRewrite-ConsumerAPI

For now your best bet is to use the SimpleConsumer and implement your own
rebalancing strategy. Another hacky approach is to register ZooKeeper
watches on the /consumers/<group>/owners path, which reflects partition
ownership changes.

Thanks,
Neha
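For the ZooKeeper-watch approach, the part worth getting right is deciding which partitions this consumer has just lost, so that only their aggregates and offsets are flushed. A minimal sketch of that comparison follows. It assumes the owner registry has already been read into a map from partition id to owner id; the layout of the children under /consumers/<group>/owners and the owner-id format are assumptions, and the ZooKeeper client calls are left out so the example stays self-contained. `OwnershipDiff` is a hypothetical helper.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper comparing two snapshots of partition -> owner id.
final class OwnershipDiff {
    // Partitions owned by `me` in the earlier snapshot but not in the later one
    // (either owned by someone else now, or currently unowned during a rebalance).
    static Set<Integer> revoked(Map<Integer, String> before, Map<Integer, String> after, String me) {
        Set<Integer> lost = new TreeSet<>();
        for (Map.Entry<Integer, String> e : before.entrySet()) {
            if (me.equals(e.getValue()) && !me.equals(after.get(e.getKey()))) {
                lost.add(e.getKey());
            }
        }
        return lost;
    }

    // Partitions owned by `me` now that it did not own before.
    static Set<Integer> acquired(Map<Integer, String> before, Map<Integer, String> after, String me) {
        return revoked(after, before, me);
    }
}
```

Keep in mind that ZooKeeper watches are one-time triggers, so the callback has to re-register the watch every time it fires, and that the notification arrives after the ownership has already changed. A flush triggered this way can therefore still race with the new owner starting to fetch, which is consistent with this approach being described as hacky.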