Posted to users@kafka.apache.org by Sandon Jacobs <sj...@appia.com> on 2014/03/06 03:27:34 UTC
Consumer Multi-Fetch
I understand replication uses a multi-fetch concept to maintain the replicas of each partition. I have a use case where it might be beneficial to grab a “batch” of messages from a kafka topic and process them as one unit into a source system – in my use case, sending the messages to a Flume source.
My questions:
* Is it possible to fetch a batch of messages in which you may not know the exact message size?
* If so, how are the offsets managed?
I am trying to avoid queuing them in memory and batching in my process for several reasons.
Thanks in advance…
Re: Consumer Multi-Fetch
Posted by Joel Koshy <jj...@gmail.com>.
On Thu, Mar 06, 2014 at 02:27:34AM +0000, Sandon Jacobs wrote:
> I understand replication uses a multi-fetch concept to maintain the replicas of each partition. I have a use case where it might be beneficial to grab a “batch” of messages from a kafka topic and process them as one unit into a source system – in my use case, sending the messages to a Flume source.
>
> My questions:
>
> * Is it possible to fetch a batch of messages in which you may not know the exact message size?
The high-level consumer actually uses multi-fetch. You will need to
have some idea of the max message size and set your fetch size
accordingly. Unfortunately, if you are consuming a very large number
of topics, this can increase the memory requirements of the consumer.
We intend to address this in the consumer re-write - there is a
separate design review thread on that.
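For reference, the fetch size is a consumer property; the name below is the one used by the 0.8.x-era high-level consumer (assumed here - check the config docs for your exact release):

```properties
# Max bytes fetched per topic-partition per request. Must be at least as
# large as the largest message the broker will accept, or consumption of
# oversized messages will stall. Note this buffer is allocated per
# partition being consumed, which is where the memory cost comes from.
fetch.message.max.bytes=1048576
```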
> * If so, how are the offsets managed?
The consumer essentially pre-fetches and queues the chunks in memory,
and the offsets are not incremented/check-pointed until the
application thread actually iterates over the messages.
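A toy sketch of those mechanics (plain Python, not the Kafka client - class and field names here are made up for illustration): chunks sit pre-fetched in a queue, but the consumed offset only advances as the application iterates, so an unprocessed chunk is never checkpointed.

```python
from collections import deque

class ToyConsumerIterator:
    """Toy model of the high-level consumer's iterator: message chunks
    are pre-fetched into an in-memory queue, but the consumed offset is
    only advanced as the application iterates over each message."""

    def __init__(self, chunks):
        # Each chunk is a list of (offset, message) pairs, as fetched.
        self.queue = deque(chunks)   # pre-fetched chunks held in memory
        self.current = iter([])      # iterator over the active chunk
        self.consumed_offset = -1    # last offset handed to the app;
                                     # this is what auto-commit checkpoints

    def __iter__(self):
        return self

    def __next__(self):
        while True:
            try:
                offset, msg = next(self.current)
                self.consumed_offset = offset  # advance only on iteration
                return msg
            except StopIteration:
                if not self.queue:
                    raise          # fully drained
                self.current = iter(self.queue.popleft())

chunks = [[(0, "a"), (1, "b")], [(2, "c")]]
it = ToyConsumerIterator(chunks)
next(it)                   # consume "a"
print(it.consumed_offset)  # -> 0: both chunks are already in memory,
                           # but only the iterated offset is checkpointable
```

The point of the model: fetching and offset progress are decoupled, so a crash before iteration re-delivers the queued-but-unprocessed messages.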
> I am trying to avoid queuing them in memory and batching in my process for several reasons.
The high-level consumer does queuing as described above, but you can
reduce the number of queued chunks.
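The knob for that in the 0.8.x-era consumer (property name assumed - verify against your release's config docs) is:

```properties
# Number of pre-fetched message chunks buffered per consumer stream;
# lowering it shrinks the consumer's memory footprint at the cost of
# less read-ahead.
queued.max.message.chunks=2
```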
Joel