Posted to dev@spark.apache.org by Mario Ds Briggs <ma...@in.ibm.com> on 2015/12/03 18:30:38 UTC

Spark Streaming Kafka - DirectKafkaInputDStream: Using the new Kafka Consumer API


Hi,

Wanted to pick Cody's mind on what he thinks about
DirectKafkaInputDStream/KafkaRDD internally using the new Kafka consumer
API. I know the latter is documented as beta quality, but I still wanted to
know whether he sees any blockers to going there shortly. On my side the
consideration is that Kafka 0.9.0.0 introduced authentication and
encryption (beta again) between clients and brokers, but these are
available only in the new consumer API and not in the older
low-level/high-level APIs.
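For concreteness, the new consumer takes the 0.9 security settings as plain
config properties. A minimal sketch (the broker address, truststore path,
and password here are placeholders):

  import java.util.Properties

  val props = new Properties()
  props.put("bootstrap.servers", "broker1:9093")
  // New-consumer security settings introduced in Kafka 0.9
  props.put("security.protocol", "SSL")
  props.put("ssl.truststore.location", "/path/to/client.truststore.jks")
  props.put("ssl.truststore.password", "changeit")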

From briefly studying the implementation of
DirectKafkaInputDStream/KafkaRDD and the new consumer API, my thinking is
that it is possible to support the exact semantics of the current
implementation using the new API.
One area that isn't so straightforward: the constructor of KafkaRDD fixes
the offsetRange (I did read about the deterministic behaviour you were
after), and I couldn't find a direct method in the new consumer API to get
the current 'latest' offset. However, one can do a consumer.seekToEnd() and
then call consumer.position(), along the lines of the sketch below.
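A minimal, untested sketch of that workaround against the 0.9 client (the
broker address, topic, and partition are placeholders):

  import java.util.{Arrays, Properties}
  import org.apache.kafka.clients.consumer.KafkaConsumer
  import org.apache.kafka.common.TopicPartition
  import org.apache.kafka.common.serialization.StringDeserializer

  val props = new Properties()
  props.put("bootstrap.servers", "broker1:9092")
  props.put("key.deserializer", classOf[StringDeserializer].getName)
  props.put("value.deserializer", classOf[StringDeserializer].getName)

  val consumer = new KafkaConsumer[String, String](props)
  val tp = new TopicPartition("events", 0)
  consumer.assign(Arrays.asList(tp))

  // seekToEnd() moves the fetch position to the log end, so position()
  // then reports the 'latest' offset without fetching any records.
  consumer.seekToEnd(tp)
  val latestOffset: Long = consumer.position(tp)
  consumer.close()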
Of course, one other benefit is that the new consumer API abstracts away
having to deal with finding the leader for a partition, so we can get rid
of that code.
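For contrast, this is roughly the boilerplate the old kafka.javaapi classes
require just to find a partition's leader (host, port, topic, and partition
are placeholders; error handling omitted):

  import scala.collection.JavaConverters._
  import kafka.javaapi.TopicMetadataRequest
  import kafka.javaapi.consumer.SimpleConsumer

  val consumer =
    new SimpleConsumer("broker1", 9092, 10000, 64 * 1024, "leader-lookup")
  val response = consumer.send(new TopicMetadataRequest(List("events").asJava))
  // Walk the returned metadata to find the leader broker for partition 0
  val leader = response.topicsMetadata.asScala
    .flatMap(_.partitionsMetadata.asScala)
    .find(_.partitionId == 0)
    .map(_.leader)
  consumer.close()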

Would be great to get your thoughts.

thanks in advance
Mario

Re: Spark Streaming Kafka - DirectKafkaInputDStream: Using the new Kafka Consumer API

Posted by Mario Ds Briggs <ma...@in.ibm.com>.

Sounds sane for a first cut.

Since all the creation methods take a kafkaParams map, I was thinking along
the lines of a temporary property in there that triggers usage of the new
consumer.
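Something like the following; the "use.new.consumer" key is made up purely
for illustration, and nothing like it exists today:

  import kafka.serializer.StringDecoder
  import org.apache.spark.SparkConf
  import org.apache.spark.streaming.{Seconds, StreamingContext}
  import org.apache.spark.streaming.kafka.KafkaUtils

  val conf = new SparkConf().setAppName("direct-kafka").setMaster("local[2]")
  val ssc = new StreamingContext(conf, Seconds(5))

  val kafkaParams = Map(
    "metadata.broker.list" -> "broker1:9092",
    "use.new.consumer" -> "true")  // hypothetical temporary switch

  // createDirectStream could branch internally on the flag above
  val stream = KafkaUtils.createDirectStream[String, String, StringDecoder,
    StringDecoder](ssc, kafkaParams, Set("events"))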

thanks
Mario



From:	Cody Koeninger <co...@koeninger.org>
To:	Mario Ds Briggs/India/IBM@IBMIN
Cc:	"dev@spark.apache.org" <de...@spark.apache.org>
Date:	04/12/2015 08:45 pm
Subject:	Re: Spark Streaming Kafka - DirectKafkaInputDStream: Using the
            new Kafka Consumer API



A brute-force way to do it might be to just have a separate
streaming-kafka-new-consumer subproject, or something along those lines.


Re: Spark Streaming Kafka - DirectKafkaInputDStream: Using the new Kafka Consumer API

Posted by Cody Koeninger <co...@koeninger.org>.
A brute-force way to do it might be to just have a separate
streaming-kafka-new-consumer subproject, or something along those lines.
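If that route were taken, the build side might look roughly like this sbt
sketch (illustrative only; the module and directory names are made up, and
Spark's real build may differ):

  // A sibling module that depends on spark-streaming and pulls in
  // the 0.9 client library by itself.
  lazy val streaming = project in file("streaming")

  lazy val streamingKafkaNewConsumer =
    (project in file("external/kafka-new-consumer"))
      .dependsOn(streaming)
      .settings(
        name := "spark-streaming-kafka-new-consumer",
        libraryDependencies += "org.apache.kafka" % "kafka-clients" % "0.9.0.0")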

On Fri, Dec 4, 2015 at 3:12 AM, Mario Ds Briggs <ma...@in.ibm.com>
wrote:

> >>
> forcing people on Kafka 0.8.x to upgrade their brokers is questionable.
> <<
>
> I agree, and I was thinking more that maybe there is a way to support
> both for a period of time (which of course means some more code to
> maintain :-)).
>
>
> thanks
> Mario

Re: Spark Streaming Kafka - DirectKafkaInputDStream: Using the new Kafka Consumer API

Posted by Mario Ds Briggs <ma...@in.ibm.com>.
>>
forcing people on Kafka 0.8.x to upgrade their brokers is questionable.
<<

I agree, and I was thinking more that maybe there is a way to support both
for a period of time (which of course means some more code to maintain :-)).
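One way that dual support could be structured (a sketch only; nothing like
this exists in Spark today, and the names are made up): hide the two
clients behind a small seam and pick the implementation from kafkaParams.

  import org.apache.kafka.common.TopicPartition

  trait LatestOffsetFetcher {
    def latestOffset(tp: TopicPartition): Long
  }

  class NewConsumerFetcher(
      consumer: org.apache.kafka.clients.consumer.KafkaConsumer[_, _])
    extends LatestOffsetFetcher {
    def latestOffset(tp: TopicPartition): Long = {
      consumer.assign(java.util.Arrays.asList(tp))
      consumer.seekToEnd(tp)
      consumer.position(tp)
    }
  }
  // An OldConsumerFetcher would wrap the existing SimpleConsumer-based code.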


thanks
Mario



From:	Cody Koeninger <co...@koeninger.org>
To:	Mario Ds Briggs/India/IBM@IBMIN
Cc:	"dev@spark.apache.org" <de...@spark.apache.org>
Date:	04/12/2015 12:15 am
Subject:	Re: Spark Streaming Kafka - DirectKafkaInputDStream: Using the
            new Kafka Consumer API



Honestly, my feeling on any new API is to wait for a point release before
taking it seriously :)

Auth and encryption seem like the only compelling reasons to move, but
forcing people on Kafka 0.8.x to upgrade their brokers is questionable.


Re: Spark Streaming Kafka - DirectKafkaInputDStream: Using the new Kafka Consumer API

Posted by Cody Koeninger <co...@koeninger.org>.
Honestly, my feeling on any new API is to wait for a point release before
taking it seriously :)

Auth and encryption seem like the only compelling reasons to move, but
forcing people on Kafka 0.8.x to upgrade their brokers is questionable.
