Posted to users@kafka.apache.org by Sandon Jacobs <sj...@appia.com> on 2014/03/06 03:27:34 UTC

Consumer Multi-Fetch

I understand replication uses a multi-fetch concept to maintain the replicas of each partition. I have a use case where it might be beneficial to grab a “batch” of messages from a Kafka topic and process them as one unit into a source system – in my use case, sending the messages to a Flume source.

My questions:

  *   Is it possible to fetch a batch of messages when you may not know the exact message size?
  *   If so, how are the offsets managed?

I am trying to avoid queuing them in memory and batching in my process for several reasons.

Thanks in advance…

Re: Consumer Multi-Fetch

Posted by Joel Koshy <jj...@gmail.com>.
On Thu, Mar 06, 2014 at 02:27:34AM +0000, Sandon Jacobs wrote:
> I understand replication uses a multi-fetch concept to maintain the replicas of each partition. I have a use case where it might be beneficial to grab a “batch” of messages from a Kafka topic and process them as one unit into a source system – in my use case, sending the messages to a Flume source.
> 
> My questions:
> 
>   *   Is it possible to fetch a batch of messages when you may not know the exact message size?

The high-level consumer actually uses multi-fetch. You will need to
have some idea of the max message size and set your fetch size
accordingly. Unfortunately, if you are consuming a very large number of
topics, this can increase the memory requirements of the consumer. We
intend to address this in the consumer rewrite - there is a separate
design review thread on that.
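
For what it's worth, here is a minimal sketch of sizing the fetch with
the 0.8 high-level (Java) consumer. It is a fragment, not a complete
program, and the ZooKeeper connect string, group id, and byte sizes are
placeholders, not recommendations:

  import java.util.Properties;
  import kafka.consumer.Consumer;
  import kafka.consumer.ConsumerConfig;
  import kafka.javaapi.consumer.ConsumerConnector;

  Properties props = new Properties();
  props.put("zookeeper.connect", "zk1:2181");   // placeholder ZooKeeper connect string
  props.put("group.id", "flume-bridge");        // placeholder consumer group
  // fetch.message.max.bytes must be at least as large as the biggest message
  // you expect the broker to return for any partition you consume.
  props.put("fetch.message.max.bytes", String.valueOf(2 * 1024 * 1024));
  props.put("auto.commit.enable", "false");     // commit manually after each batch (see below)

  ConsumerConfig config = new ConsumerConfig(props);
  ConsumerConnector connector = Consumer.createJavaConsumerConnector(config);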

>   *   If so, how are the offsets managed?

The consumer essentially pre-fetches and queues the chunks in memory,
and the offsets are not incremented/checkpointed until the application
thread actually iterates over the messages.
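
To make the offset behaviour concrete, continuing the fragment above: a
rough sketch of batching with manual commits, assuming a topic named
"my-topic"; the batch size of 500 and the sendToFlume() helper are
hypothetical stand-ins for your own delivery logic:

  import java.util.ArrayList;
  import java.util.HashMap;
  import java.util.List;
  import java.util.Map;
  import kafka.consumer.ConsumerIterator;
  import kafka.consumer.KafkaStream;
  import kafka.message.MessageAndMetadata;

  Map<String, Integer> topicCounts = new HashMap<String, Integer>();
  topicCounts.put("my-topic", 1);               // one stream for the topic
  Map<String, List<KafkaStream<byte[], byte[]>>> streams =
      connector.createMessageStreams(topicCounts);
  ConsumerIterator<byte[], byte[]> it = streams.get("my-topic").get(0).iterator();

  List<byte[]> batch = new ArrayList<byte[]>();
  while (it.hasNext()) {                        // blocks until a pre-fetched chunk is available
      MessageAndMetadata<byte[], byte[]> mm = it.next(); // advances the consumed offset in memory
      batch.add(mm.message());
      if (batch.size() >= 500) {                // placeholder batch size
          sendToFlume(batch);                   // hypothetical helper feeding your Flume source
          connector.commitOffsets();            // checkpoint only after the whole unit is processed
          batch.clear();
      }
  }

With auto-commit disabled, nothing is checkpointed until commitOffsets()
is called, so a crash mid-batch replays that batch rather than losing it.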

> I am trying to avoid queuing them in memory and batching in my process for several reasons.

The high-level consumer does queuing as described above, but you can
reduce the number of queued chunks.
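
The prefetch depth is itself a config knob; a minimal sketch, set
alongside the other properties before creating the ConsumerConfig (the
value here is a placeholder):

  // Each buffered chunk can be up to fetch.message.max.bytes, so lowering the
  // chunk count shrinks the per-stream prefetch buffer at the cost of more
  // frequent fetches.
  props.put("queued.max.message.chunks", "2");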

Joel