Posted to users@kafka.apache.org by "Sybrandy, Casey" <Ca...@noviidesign.com> on 2011/11/17 19:11:53 UTC

Kafka Consumer Questions

Hello,

I have a couple of questions about consumers.

1) What's the preferred method of writing a consumer: the SimpleConsumer or the Zookeeper consumer? I'm guessing the Zookeeper one allows the consumer to handle failures within the Kafka cluster, e.g., if one node goes down, the consumer will then pull from the replicated node.

2) What's the proper way to track which messages have been processed by a consumer? The scenario I'm looking at is a consumer that dies and is later restarted. What we don't want is for the consumer to re-process records that have already been processed.

What I'm basically looking for are best practices for setting up a system that has to handle a high volume of traffic.

Thanks.

Casey

Re: Kafka Consumer Questions

Posted by Joel Koshy <jj...@gmail.com>.
Casey,

You are right about the high-level (Zookeeper) consumer dealing with
broker failures: in that case a "rebalance" is triggered to evenly
re-allocate the consumption of partitions across the consumers in the
consumer group. Furthermore, if you set autocommit.enable to true,
then it will commit the consumed offsets for each partition into
zookeeper at a configurable interval (autocommit.interval.ms). That
should address your second concern. If you use the SimpleConsumer on
the other hand, you will need to manually manage consumed offsets and
deal with broker outages yourself.

Thanks,

Joel
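
To make the offset-tracking point concrete, here is a minimal sketch of what committing offsets at an interval means for a consumer that dies and is restarted. It is a toy model, not Kafka client code: OffsetStore, consume, commit_every and crash_after are hypothetical names invented for illustration, and the in-memory list stands in for one partition's log. The same bookkeeping is what you would have to do yourself with the SimpleConsumer.

```python
# Toy model only: no Kafka client is used and every name here is hypothetical.

class OffsetStore:
    """Stands in for wherever committed offsets live (ZooKeeper, a file, a database)."""

    def __init__(self):
        self._committed = {}  # partition -> next offset to read

    def commit(self, partition, next_offset):
        self._committed[partition] = next_offset

    def fetch(self, partition):
        # A partition with no committed offset starts from the beginning.
        return self._committed.get(partition, 0)


def consume(log, store, partition, commit_every, crash_after=None):
    """Process messages starting at the last committed offset.

    commit_every: commit after this many processed messages
                  (a stand-in for a time-based commit interval).
    crash_after:  stop abruptly after this many messages, with no final commit.
    Returns the messages processed during this run.
    """
    processed = []
    offset = store.fetch(partition)
    while offset < len(log):
        processed.append(log[offset])  # "processing" is just recording the message
        offset += 1
        if len(processed) % commit_every == 0:
            store.commit(partition, offset)
        if crash_after is not None and len(processed) == crash_after:
            return processed  # simulated crash: uncommitted progress is lost
    store.commit(partition, offset)  # clean shutdown commits the final position
    return processed


log = ["m0", "m1", "m2", "m3", "m4", "m5", "m6"]

store = OffsetStore()
first_run = consume(log, store, partition=0, commit_every=3, crash_after=5)
second_run = consume(log, store, partition=0, commit_every=3)

print(first_run)   # messages handled before the simulated crash
print(second_run)  # messages handled after the restart
```

In this model the first run handles m0 through m4 but only commits up to m2, so the restarted run begins again at m3 and handles m3 and m4 a second time. Committing more often shrinks that replay window at the cost of more writes to the offset store. If re-processing must have no effect at all, one option is to make the processing step itself tolerate repeats.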
