Posted to user@storm.apache.org by Kashyap Mhaisekar <ka...@gmail.com> on 2014/03/24 04:11:28 UTC
Kafka high level consumer in storm
Hi,
Is there any downside to using Kafka high level consumer as spout? I plan
to spawn threads to read from various partitions of the topic in Kafka.
Regards,
Kashyap
Re: Kafka high level consumer in storm
Posted by Kashyap Mhaisekar <ka...@gmail.com>.
Thanks Mattijs. I understand from your mail that there is no issue with
using the high level (HL) consumer except for the loss of flexibility.
Regards,
Kashyap
On Mon, Mar 24, 2014 at 3:20 AM, Mattijs Ugen <ma...@holmes.nl> wrote:
>> Is there any downside to using Kafka high level consumer as spout?
>
> The main downside of the high level consumer is that you can't control
> exactly when it asks a broker for more data, and that it always commits
> the latest offset you have read from the stream it provides. In a
> somewhat continuous stream of messages the first part won't matter
> much; you can tweak all of the client properties listed on the Kafka
> site. The second part becomes somewhat complicated when you need to
> replay messages that fail within your topology, assuming you don't want
> your client to commit anything that hasn't been ack()'d to the spout.
>
> There are upsides and downsides to using either the low or high level
> client; feel free to examine the differences between
> https://github.com/wurstmeister/storm-kafka-0.8-plus (low level client)
> and https://github.com/HolmesNL/kafka-spout/ (high level client).
>
>
>> I plan to spawn threads to read from various partitions of the topic in
>> Kafka.
>
> I reckon that won't be necessary with Storm; Storm manages the
> parallelism for you by running multiple spout instances across your
> cluster. As long as you put the spouts in the same consumer group,
> together they'll consume all partitions (even a single client will
> switch partitions now and then to ensure all are read).
>
> Kind regards,
>
> Mattijs
>
>
Re: Kafka high level consumer in storm
Posted by Mattijs Ugen <ma...@holmes.nl>.
> Is there any downside to using Kafka high level consumer as spout?
The main downside of the high level consumer is that you can't control
exactly when it asks a broker for more data, and that it always commits
the latest offset you have read from the stream it provides. In a
somewhat continuous stream of messages the first part won't matter
much; you can tweak all of the client properties listed on the Kafka
site. The second part becomes somewhat complicated when you need to
replay messages that fail within your topology, assuming you don't want
your client to commit anything that hasn't been ack()'d to the spout.
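
To make the replay concern concrete, here is a minimal sketch of the
bookkeeping a spout needs if it must never commit past a tuple that
has not been ack()'d. It uses no Storm or Kafka classes.
AckedOffsetTracker and its methods are hypothetical names made up for
this illustration, and it assumes the convention that a committed
offset means "the next offset to read" for a single partition.

```java
import java.util.TreeSet;

/**
 * Hypothetical helper (not part of Storm or Kafka): tracks offsets that
 * were emitted into the topology but not yet acked, for one partition.
 */
final class AckedOffsetTracker {
    // Offsets emitted but not yet acked, kept sorted.
    private final TreeSet<Long> pending = new TreeSet<>();
    // One past the highest offset emitted so far.
    private long nextToEmit;

    AckedOffsetTracker(long startOffset) {
        this.nextToEmit = startOffset;
    }

    /** Call when the spout emits the message at this offset. */
    void emitted(long offset) {
        pending.add(offset);
        nextToEmit = Math.max(nextToEmit, offset + 1);
    }

    /** Call from the spout's ack() callback. */
    void acked(long offset) {
        pending.remove(offset);
    }

    /**
     * Highest offset that is safe to commit: the oldest unacked offset,
     * or one past the last emitted offset if nothing is outstanding.
     * A failed tuple simply stays pending until it is replayed and acked.
     */
    long safeCommitOffset() {
        return pending.isEmpty() ? nextToEmit : pending.first();
    }

    public static void main(String[] args) {
        AckedOffsetTracker tracker = new AckedOffsetTracker(0L);
        tracker.emitted(0L);
        tracker.emitted(1L);
        tracker.emitted(2L);
        tracker.acked(1L);
        tracker.acked(2L);
        // Offset 0 is still outstanding, so nothing may be committed yet.
        System.out.println(tracker.safeCommitOffset());
        tracker.acked(0L);
        // Everything emitted so far is acked.
        System.out.println(tracker.safeCommitOffset());
    }
}
```

A consumer that always commits the latest offset read cannot follow
this rule directly, since it commits whatever has been read rather
than an offset the spout chooses. That is the complication described
above.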
There are upsides and downsides to using either the low or high level
client; feel free to examine the differences between
https://github.com/wurstmeister/storm-kafka-0.8-plus (low level client)
and https://github.com/HolmesNL/kafka-spout/ (high level client).
> I plan to spawn threads to read from various partitions of the topic in Kafka.
I reckon that won't be necessary with Storm; Storm manages the
parallelism for you by running multiple spout instances across your
cluster. As long as you put the spouts in the same consumer group,
together they'll consume all partitions (even a single client will
switch partitions now and then to ensure all are read).
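
As a rough illustration of how a consumer group shares partitions,
the sketch below models a simple range-style split of N partitions
over C consumers. It is a simplified model only: the real rebalancing
is done inside the Kafka client, and PartitionAssignmentSketch is a
hypothetical name invented for this example.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Simplified model (not Kafka code) of splitting partitions across the
 * consumers of one group in contiguous ranges.
 */
final class PartitionAssignmentSketch {

    /** Returns, for each consumer index, the partition ids it would own. */
    static List<List<Integer>> assign(int partitions, int consumers) {
        if (partitions < 0 || consumers <= 0) {
            throw new IllegalArgumentException("need partitions >= 0 and consumers > 0");
        }
        int base = partitions / consumers;   // every consumer gets at least this many
        int extra = partitions % consumers;  // the first 'extra' consumers get one more
        List<List<Integer>> result = new ArrayList<>();
        int next = 0;
        for (int c = 0; c < consumers; c++) {
            int count = base + (c < extra ? 1 : 0);
            List<Integer> owned = new ArrayList<>();
            for (int i = 0; i < count; i++) {
                owned.add(next++);
            }
            result.add(owned);
        }
        return result;
    }

    public static void main(String[] args) {
        // One spout instance owns every partition.
        System.out.println(assign(4, 1));
        // Two spout instances split five partitions between them.
        System.out.println(assign(5, 2));
    }
}
```

In this model a single consumer owns every partition, and adding
consumers to the group spreads the partitions out, which is why extra
reader threads inside the spout should not be needed.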
Kind regards,
Mattijs