Posted to dev@flume.apache.org by Roshan Naik <ro...@hortonworks.com> on 2015/08/28 01:11:49 UTC

New Flafka component - "kafka consumer" channel

Wanted to give a heads-up on this idea I have been working on ...

Using Flume as a Kafka producer or consumer has been gaining popularity thanks to the Flafka components that were recently introduced.

For the use case of Flume as a Kafka consumer, it appears we can sidestep the compromise between memory channel (which is fast but can lose data) and File channel (which is slow but won't lose data) and get the best of both worlds.

I have a prototype of this idea for a "Kafka Consumer" channel. It is designed to enable the use of Flume as a really lightweight and very fast Kafka consumer without the data loss potential of memory channel. My measurements indicate it easily outperforms memory channel.

Additional info here  ...
https://github.com/roshannaik/kafka-consumer-channel

I think the same idea could be applied to a "Kafka producer channel".

-roshan

Re: New Flafka component - "kafka consumer" channel

Posted by Gonzalo Herreros <gh...@gmail.com>.
With this patch: https://issues.apache.org/jira/browse/FLUME-2781
I'm using a Kafka channel as a regular Flume channel with a sink, but in
addition any Kafka client can tap into the messages without the hassle of
multiplexing.
This is very convenient for providing a Flume HTTP interface to clients I
don't control, so they don't have to worry about updating the Kafka
libraries or about changes when Kafka gets Kerberized.
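
For readers who want to picture this setup, here is a rough sketch of such
an agent. It assumes the Flume 1.6 Kafka channel property names (brokerList,
zookeeperConnect, topic) and assumes that, with the patch applied, setting
parseAsFlumeEvent to false makes the channel write plain event bodies that
other Kafka clients can read. That last point is a reading of the message
above and is not confirmed in this thread. The agent, component and topic
names, the port and the host addresses are all placeholders.

```properties
# Hypothetical agent "a1": HTTP source -> Kafka channel -> logger sink.
# All names, ports and addresses below are placeholders.
a1.sources = r1
a1.channels = c1
a1.sinks = k1

# Clients post events over HTTP and never touch the Kafka libraries.
a1.sources.r1.type = http
a1.sources.r1.port = 8080
a1.sources.r1.channels = c1

# Kafka channel (Flume 1.6 property names assumed).
a1.channels.c1.type = org.apache.flume.channel.kafka.KafkaChannel
a1.channels.c1.brokerList = localhost:9092
a1.channels.c1.zookeeperConnect = localhost:2181
a1.channels.c1.topic = http-events
# Assumption: with the patch, false means raw bodies are written to the
# topic, so ordinary Kafka consumers can read the same messages.
a1.channels.c1.parseAsFlumeEvent = false

# The regular Flume sink that drains the channel.
a1.sinks.k1.type = logger
a1.sinks.k1.channel = c1
```

Later Flume releases renamed the channel properties (for example
kafka.bootstrap.servers and kafka.topic), so the names need to match the
version in use.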

Regards,
Gonzalo
On Oct 3, 2015 2:21 AM, "Roshan Naik" <ro...@hortonworks.com> wrote:

> Hari,
>
>   Got some time to try out the 'parseAsFlumeEvent' option in the Kafka
> channel. Basically I used it as a Kafka consumer.
>
>   I am seeing about *140 MB/sec* with 1 Null Sink on a VM setup.
>   I used 1000-byte events and the Kafka broker was local.
>
> This number is indeed promising and IMO makes Kafka channel a much more
> performant alternative to KafkaSource + File channel.
>
> Have not yet tried to use KafkaChannel as a producer (i.e. as an
> alternative to File channel + Kafka sink).
>
> I don't see a 'parseAsFlumeEvent' equivalent to enable Kafka channel to
> write to Kafka without wrapping the data in a FlumeEvent object.
>
> -roshan

Re: New Flafka component - "kafka consumer" channel

Posted by Roshan Naik <ro...@hortonworks.com>.
Hari,

  Got some time to try out the 'parseAsFlumeEvent' option in the Kafka
channel. Basically I used it as a Kafka consumer.

  I am seeing about *140 MB/sec* with 1 Null Sink on a VM setup.
  I used 1000-byte events and the Kafka broker was local.

This number is indeed promising and IMO makes Kafka channel a much more
performant alternative to KafkaSource + File channel.

Have not yet tried to use KafkaChannel as a producer (i.e. as an
alternative to File channel + Kafka sink).

I don't see a 'parseAsFlumeEvent' equivalent to enable Kafka channel to
write to Kafka without wrapping the data in a FlumeEvent object.

-roshan




On 8/28/15 2:56 PM, "Roshan Naik" <ro...@hortonworks.com> wrote:

>OK that's really good to know. We won't need an additional component if it
>can function that way. Also in that case I would expect it to be quite
>fast.
>
>Will try to get some numbers next week. Glad I only spent a couple
>evenings on that prototype.
>
>-roshan


Re: New Flafka component - "kafka consumer" channel

Posted by Roshan Naik <ro...@hortonworks.com>.
OK that's really good to know. We won't need an additional component if it
can function that way. Also in that case I would expect it to be quite
fast.

Will try to get some numbers next week. Glad I only spent a couple
evenings on that prototype.

-roshan


On 8/27/15 5:56 PM, "Hari Shreedharan" <hs...@cloudera.com> wrote:

>Nope. You can put anything you want, just set parseAsFlumeEvent to false
>and the channel won't attempt to convert it into a Flume event. It just
>stashes the whole thing into the body of the returned event.
>
>
>Thanks,
>Hari


Re: New Flafka component - "kafka consumer" channel

Posted by Hari Shreedharan <hs...@cloudera.com>.
Nope. You can put anything you want, just set parseAsFlumeEvent to false
and the channel won't attempt to convert it into a Flume event. It just
stashes the whole thing into the body of the returned event.
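
As an illustration of the source-less setup described here, a minimal
agent configuration might look like the sketch below. It assumes the Flume
1.6 Kafka channel property names (brokerList, zookeeperConnect, topic);
the agent name, component names, topic name and host addresses are
placeholders and do not come from this thread.

```properties
# Hypothetical agent "a1": no source, a Kafka channel drained by a null sink.
# All names and addresses below are placeholders.
a1.channels = c1
a1.sinks = k1

# Kafka channel (Flume 1.6 property names assumed).
a1.channels.c1.type = org.apache.flume.channel.kafka.KafkaChannel
a1.channels.c1.brokerList = localhost:9092
a1.channels.c1.zookeeperConnect = localhost:2181
a1.channels.c1.topic = my-topic
# The topic is written by non-Flume producers, so do not try to parse
# messages as Flume events; each message lands in the event body as-is.
a1.channels.c1.parseAsFlumeEvent = false

# Null sink discards events; swap in a real sink for actual use.
a1.sinks.k1.type = null
a1.sinks.k1.channel = c1
```

Later Flume releases renamed these channel properties (for example
kafka.bootstrap.servers and kafka.topic), so check the user guide for the
version in use.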


Thanks,
Hari

On Thu, Aug 27, 2015 at 5:53 PM, Roshan Naik <ro...@hortonworks.com> wrote:

> My understanding is that the Kafka channel expects "Flume Event" objects
> to be stored in the Kafka topic.
> Isn't that right?
> -roshan

Re: New Flafka component - "kafka consumer" channel

Posted by Roshan Naik <ro...@hortonworks.com>.
My understanding is that the Kafka channel expects "Flume Event" objects
to be stored in the Kafka topic.
Isn't that right?
-roshan


On 8/27/15 5:47 PM, "Hari Shreedharan" <hs...@cloudera.com> wrote:

>So one of the things that the already existing Kafka channel can do is to
>run without a source. Does this outperform that as well? I have already
>seen people use it this way.
>
>
>Thanks,
>Hari


Re: New Flafka component - "kafka consumer" channel

Posted by Hari Shreedharan <hs...@cloudera.com>.
So one of the things that the already existing Kafka channel can do is to
run without a source. Does this outperform that as well? I have already
seen people use it this way.


Thanks,
Hari

On Thu, Aug 27, 2015 at 4:11 PM, Roshan Naik <ro...@hortonworks.com> wrote:

> Wanted to give a heads-up on this idea I have been working on …
>
> Using Flume as a Kafka producer or consumer has been gaining popularity
> thanks to the Flafka components that were recently introduced.
>
> For the use case of Flume as a Kafka consumer, it appears we can sidestep
> the compromise between memory channel (which is fast but can lose data) and
> File channel (which is slow but won't lose data) and get the best of both
> worlds.
>
> I have a prototype of this idea for a "Kafka Consumer" channel. It is
> designed to enable the use of Flume as a really lightweight and very fast
> Kafka consumer without the data loss potential of memory channel. My
> measurements indicate it easily outperforms memory channel.
>
> Additional info here  …
> https://github.com/roshannaik/kafka-consumer-channel
>
> I think the same idea could be applied to a "Kafka producer channel".
>
> -roshan
>
