Posted to user@storm.apache.org by Kashyap Mhaisekar <ka...@gmail.com> on 2014/03/24 04:11:28 UTC

Kafka high level consumer in storm

Hi,
Is there any downside to using the Kafka high-level consumer as a spout? I
plan to spawn threads to read from the various partitions of the Kafka topic.
Regards,
Kashyap

Re: Kafka high level consumer in storm

Posted by Kashyap Mhaisekar <ka...@gmail.com>.
Thanks Mattijs. I understand from your mail that there is no issue with
using the high-level (HL) consumer apart from the loss of flexibility.

Regards,
Kashyap


On Mon, Mar 24, 2014 at 3:20 AM, Mattijs Ugen <ma...@holmes.nl> wrote:

>> Is there any downside to using Kafka high level consumer as spout?
>
> The main downside of the high-level consumer is that you can't control
> exactly when it asks a broker for more data, and that it always commits
> the latest offset you have read from the stream it provides. With a
> fairly continuous stream of messages, the first point won't matter much:
> you can tune the consumer with the client properties listed on the Kafka
> site. The second point gets complicated when you need to replay messages
> that fail within your topology, assuming you don't want your client to
> commit anything that hasn't been ack()'d to the spout.
>
> There are upsides and downsides to both the low-level and the high-level
> client. Feel free to compare
> https://github.com/wurstmeister/storm-kafka-0.8-plus (low-level client)
> and https://github.com/HolmesNL/kafka-spout/ (high-level client).
>
>> I plan to spawn threads to read from various partitions of the topic in
>> Kafka.
>
> I reckon that won't be necessary with Storm: Storm manages the
> parallelism for you by running multiple spout instances across your
> cluster. As long as you put the spouts in the same consumer group,
> together they'll consume all partitions (even a single client will switch
> between partitions now and then to make sure all are read).
>
> Kind regards,
>
> Mattijs
>
>

Re: Kafka high level consumer in storm

Posted by Mattijs Ugen <ma...@holmes.nl>.
> Is there any downside to using Kafka high level consumer as spout?
The main downside of the high-level consumer is that you can't control 
exactly when it asks a broker for more data, and that it always commits 
the latest offset you have read from the stream it provides. With a 
fairly continuous stream of messages, the first point won't matter much: 
you can tune the consumer with the client properties listed on the Kafka 
site. The second point gets complicated when you need to replay messages 
that fail within your topology, assuming you don't want your client to 
commit anything that hasn't been ack()'d to the spout.
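
To make the replay concern concrete, here is a minimal sketch in plain 
Java (no Kafka or Storm dependencies) of the bookkeeping a spout would 
need in order to commit only offsets whose tuples have been acked. The 
class AckedOffsetTracker and its method names are hypothetical and not 
part of either project. The sketch assumes a single partition and the 
Kafka convention that a committed offset is the next offset to read.

```java
import java.util.TreeSet;

/** Hypothetical helper: tracks in-flight offsets for one partition. */
final class AckedOffsetTracker {
    // Offsets that were emitted into the topology but not yet acked.
    private final TreeSet<Long> pending = new TreeSet<>();
    // Offset following the highest one emitted so far.
    private long nextOffset;

    AckedOffsetTracker(long startOffset) {
        this.nextOffset = startOffset;
    }

    /** Call when the spout emits the message stored at this offset. */
    void emitted(long offset) {
        pending.add(offset);
        nextOffset = Math.max(nextOffset, offset + 1);
    }

    /** Call from the spout's ack() callback for this offset. */
    void acked(long offset) {
        pending.remove(offset);
    }

    /** Everything below the returned offset has been acked. */
    long committableOffset() {
        return pending.isEmpty() ? nextOffset : pending.first();
    }
}
```

If offsets 0, 1 and 2 are emitted and only 1 and 2 are acked, 
committableOffset() stays at 0, so a restart would replay from the 
message that is still outstanding. A consumer that commits whatever was 
last read cannot express this distinction.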

There are upsides and downsides to both the low-level and the high-level 
client. Feel free to compare 
https://github.com/wurstmeister/storm-kafka-0.8-plus (low-level client) 
and https://github.com/HolmesNL/kafka-spout/ (high-level client).

> I plan to spawn threads to read from various partitions of the topic in Kafka.
I reckon that won't be necessary with Storm: Storm manages the 
parallelism for you by running multiple spout instances across your 
cluster. As long as you put the spouts in the same consumer group, 
together they'll consume all partitions (even a single client will switch 
between partitions now and then to make sure all are read).
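
As a rough illustration of how partitions end up spread over the spout 
instances of one consumer group, the following sketch divides partition 
ids into contiguous ranges. It is a simplified model written for this 
thread, not Kafka's actual rebalancing code; the class 
GroupAssignmentSketch and the method assign are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of range-style partition assignment in one group. */
final class GroupAssignmentSketch {
    /** Returns, for each consumer index, the partition ids it would own. */
    static List<List<Integer>> assign(int partitions, int consumers) {
        List<List<Integer>> result = new ArrayList<>();
        int base = partitions / consumers;   // every consumer gets at least this many
        int extra = partitions % consumers;  // the first 'extra' consumers get one more
        int next = 0;
        for (int c = 0; c < consumers; c++) {
            int count = base + (c < extra ? 1 : 0);
            List<Integer> owned = new ArrayList<>();
            for (int i = 0; i < count; i++) {
                owned.add(next++);
            }
            result.add(owned);
        }
        return result;
    }
}
```

With 4 partitions, a single spout instance owns all of them, while 3 
instances split them as [0, 1], [2] and [3]. In this model, instances 
beyond the partition count receive nothing, so spout parallelism above 
the number of partitions would add no read throughput.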

Kind regards,

Mattijs