Posted to dev@flume.apache.org by Roshan Naik <ro...@hortonworks.com> on 2015/10/03 03:20:43 UTC

Re: New Flafka component - "kafka consumer" channel

Hari,

  Got some time to try out the 'parseAsFlumeEvent' option in the Kafka
channel. Basically I used it as a Kafka consumer.

  I am seeing about *140 MB/sec* with 1 NullSink on a VM setup.
  I used 1000-byte events and the Kafka broker was local.

This number is indeed promising and IMO makes the Kafka channel a much more
performant alternative to Kafka Source + File channel.

Have not yet tried to use the Kafka channel as a producer (i.e. an
alternative to File channel + Kafka sink).

I don't see a 'parseAsFlumeEvent' equivalent that lets the Kafka channel
write to Kafka without wrapping each message in a FlumeEvent object.
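For reference, the sourceless consumer setup benchmarked above can be sketched roughly as the following agent config. This is an illustrative sketch based on the Flume 1.6-era Kafka channel properties; the broker, ZooKeeper, and topic values are placeholders, not the ones used in the actual test.

```properties
# Sourceless agent: Kafka channel acts as the consumer, Null Sink drains it.
a1.channels = c1
a1.sinks = k1

a1.channels.c1.type = org.apache.flume.channel.kafka.KafkaChannel
a1.channels.c1.brokerList = localhost:9092
a1.channels.c1.zookeeperConnect = localhost:2181
a1.channels.c1.topic = events
# Treat raw Kafka messages as event bodies instead of serialized FlumeEvents.
a1.channels.c1.parseAsFlumeEvent = false

a1.sinks.k1.type = null
a1.sinks.k1.channel = c1
```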

-roshan
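As context for the quoted discussion below, here is an illustrative sketch of what parseAsFlumeEvent=false means for a consumed message: the raw Kafka payload is stashed whole into the event body. The function and dict shape here are hypothetical, not Flume's actual internal API.

```python
def to_flume_event(raw_message: bytes, parse_as_flume_event: bool = False) -> dict:
    """Model of the Kafka channel's consume path (illustrative only)."""
    if parse_as_flume_event:
        # Real Flume would Avro-deserialize a wrapped FlumeEvent here.
        raise NotImplementedError("FlumeEvent deserialization not modeled")
    # Pass-through path: the whole payload becomes the body, headers stay empty.
    return {"headers": {}, "body": raw_message}

event = to_flume_event(b'{"user": "alice"}')
```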




On 8/28/15 2:56 PM, "Roshan Naik" <ro...@hortonworks.com> wrote:

>OK that's really good to know. We won't need an additional component if it
>can function that way. Also in that case I would expect it to be quite
>fast.
>
>Will try to get some numbers next week. Glad I only spent a couple
>evenings on that prototype.
>
>-roshan
>
>
>On 8/27/15 5:56 PM, "Hari Shreedharan" <hs...@cloudera.com> wrote:
>
>>Nope. You can put anything you want, just set parseAsFlumeEvent to false
>>and the channel won't attempt to convert it into a Flume event. It just
>>stashes the whole thing into the body of the returned event.
>>
>>
>>Thanks,
>>Hari
>>
>>On Thu, Aug 27, 2015 at 5:53 PM, Roshan Naik <ro...@hortonworks.com>
>>wrote:
>>
>>> My understanding is that the Kafka channel expects "Flume Event"
>>>objects
>>> to be stored in the Kafka topic.
>>> Isn't that right ?
>>> -roshan
>>>
>>>
>>> On 8/27/15 5:47 PM, "Hari Shreedharan" <hs...@cloudera.com>
>>>wrote:
>>>
>>> >So one of the things that the already existing Kafka channel can do is
>>>to
>>> >run without a source. Does this outperform that as well? I have
>>>already
>>> >seen people use it this way.
>>> >
>>> >
>>> >Thanks,
>>> >Hari
>>> >
>>> >On Thu, Aug 27, 2015 at 4:11 PM, Roshan Naik <ro...@hortonworks.com>
>>> >wrote:
>>> >
>>> >> Wanted to give a heads-up on this idea I have been working on ...
>>> >>
>>> >> Using Flume as a Kafka producer or consumer has been gaining
>>>popularity
>>> >> thanks to the Flafka components that were recently introduced.
>>> >>
>>> >> For the use case of Flume as a Kafka consumer, it appears we can
>>> >>sidestep
>>> >> the compromise between Mem channel (which is fast but can lose data)
>>>and
>>> >>  File channel (which is slow but won't lose data) and get the best
>>>of
>>> >>both
>>> >> worlds.
>>> >>
>>> >> I have a prototype of this idea  for a "Kafka Consumer" channel.  It
>>>is
>>> >> designed to enable the use of Flume as a really light weight and
>>>very
>>> >>fast
>>> >> Kafka consumer without the data loss potential of mem channel.  My
>>> >> measurements indicate it easily outperforms memory channel.
>>> >>
>>> >> Additional info here ...
>>> >> https://github.com/roshannaik/kafka-consumer-channel
>>> >>
>>> >> I think the same idea could be applied for "Kafka producer channel".
>>> >>
>>> >> -roshan
>>> >>
>>>
>>>
>
>


Re: New Flafka component - "kafka consumer" channel

Posted by Gonzalo Herreros <gh...@gmail.com>.
With this patch: https://issues.apache.org/jira/browse/FLUME-2781
I'm using a Kafka channel as a regular Flume channel with a sink, but in
addition any Kafka client can tap into the messages without the hassle of
multiplexing.
This is very convenient for providing a Flume HTTP interface to clients I
don't control, so they don't have to worry about updating the Kafka
libraries or about what happens when the cluster gets kerberized.
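The setup described above can be sketched roughly as the following agent config. This is an illustrative sketch only; the port, topic, and sink choice are placeholder assumptions, not Gonzalo's actual deployment.

```properties
# HTTP source feeds a Kafka channel; a Flume sink drains it, while outside
# Kafka consumers can read the same topic directly (no multiplexing needed).
a1.sources = http1
a1.channels = kc1
a1.sinks = s1

a1.sources.http1.type = http
a1.sources.http1.port = 8080
a1.sources.http1.channels = kc1

a1.channels.kc1.type = org.apache.flume.channel.kafka.KafkaChannel
a1.channels.kc1.brokerList = localhost:9092
a1.channels.kc1.topic = shared-topic

a1.sinks.s1.type = logger
a1.sinks.s1.channel = kc1
```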

Regards,
Gonzalo