Posted to user@spark.apache.org by Mukesh Jha <me...@gmail.com> on 2016/09/13 23:46:01 UTC

Spark kafka integration issues

Hello fellow sparkers,

I'm using Spark to consume messages from Kafka in a non-streaming fashion.
I'm using spark-streaming-kafka-0-8_2.10 with Spark v2.0 to do so.

I have a few questions about this; please get back to me if you have any
pointers.

1) Is there any way to get the topic, partition & offset information for
each item from the KafkaRDD? I'm using *KafkaUtils.createRDD[String,
String, StringDecoder, StringDecoder]* to create my Kafka RDD (a minimal
sketch of this call is below the list).
2) How do I pass my custom Decoder instead of the String or Byte decoder?
Are there any examples for the same?
3) Is there a newer version to consume from kafka-0.10 & kafka-0.9 clusters?
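
For reference, a minimal sketch of that createRDD call; the broker
address, topic name, and offsets are placeholders, and sc is an existing
SparkContext:

import kafka.serializer.StringDecoder
import org.apache.spark.streaming.kafka.{KafkaUtils, OffsetRange}

val kafkaParams = Map("metadata.broker.list" -> "broker1:9092") // placeholder broker
val ranges = Array(OffsetRange("my-topic", 0, 0L, 100L)) // topic, partition, from, until

// Returns an RDD[(String, String)] of (key, message) pairs.
val rdd = KafkaUtils.createRDD[String, String, StringDecoder, StringDecoder](
  sc, kafkaParams, ranges)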

-- 
Thanks & Regards,

*Mukesh Jha <me...@gmail.com>*

Re: Spark kafka integration issues

Posted by Cody Koeninger <co...@koeninger.org>.
Yeah, an updated version of that blog post is available at

https://github.com/koeninger/kafka-exactly-once

On Wed, Sep 14, 2016 at 11:35 AM, Mukesh Jha <me...@gmail.com> wrote:
> Thanks for the reply, Cody.
>
> I found the article below very helpful. Thanks for the details, much
> appreciated.
>
> http://blog.cloudera.com/blog/2015/03/exactly-once-spark-streaming-from-apache-kafka/



Re: Spark kafka integration issues

Posted by Mukesh Jha <me...@gmail.com>.
Thanks for the reply, Cody.

I found the article below very helpful. Thanks for the details, much
appreciated.

http://blog.cloudera.com/blog/2015/03/exactly-once-spark-streaming-from-apache-kafka/




-- 


Thanks & Regards,

*Mukesh Jha <me...@gmail.com>*

Re: Spark kafka integration issues

Posted by Cody Koeninger <co...@koeninger.org>.
1. See
http://spark.apache.org/docs/latest/streaming-kafka-integration.html#approach-2-direct-approach-no-receivers
and look for HasOffsetRanges. If you really want the info per message
rather than per partition, createRDD has an overload that takes a
messageHandler from MessageAndMetadata to whatever you need; see the
sketch below.
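
To make both options concrete, here is a sketch against the 0-8 direct
API; the broker address, topic, and offsets are placeholders, and sc is
an existing SparkContext:

import kafka.common.TopicAndPartition
import kafka.message.MessageAndMetadata
import kafka.serializer.StringDecoder
import org.apache.spark.streaming.kafka.{Broker, HasOffsetRanges, KafkaUtils, OffsetRange}

val kafkaParams = Map("metadata.broker.list" -> "broker1:9092") // placeholder
val ranges = Array(OffsetRange("my-topic", 0, 0L, 100L)) // placeholder

// Per-partition info: RDDs from the direct API implement HasOffsetRanges.
val rdd = KafkaUtils.createRDD[String, String, StringDecoder, StringDecoder](
  sc, kafkaParams, ranges)
for (or <- rdd.asInstanceOf[HasOffsetRanges].offsetRanges)
  println(s"${or.topic} ${or.partition}: ${or.fromOffset} -> ${or.untilOffset}")

// Per-message info: the messageHandler overload maps each MessageAndMetadata
// to whatever shape you need, here (topic, partition, offset, message).
val perMessage = KafkaUtils.createRDD[String, String, StringDecoder, StringDecoder,
    (String, Int, Long, String)](
  sc, kafkaParams, ranges,
  Map[TopicAndPartition, Broker](), // empty map: leaders are looked up on the driver
  (mmd: MessageAndMetadata[String, String]) =>
    (mmd.topic, mmd.partition, mmd.offset, mmd.message()))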

2. createRDD takes type parameters for the key and value decoders, so
specify your own there.
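
As a sketch (MyEvent is a hypothetical message type), a custom decoder
extends kafka.serializer.Decoder; keep the VerifiableProperties
constructor parameter, since Kafka instantiates decoders reflectively:

import kafka.serializer.{Decoder, StringDecoder}
import kafka.utils.VerifiableProperties
import org.apache.spark.streaming.kafka.KafkaUtils

case class MyEvent(payload: String) // hypothetical message type

class MyEventDecoder(props: VerifiableProperties = null) extends Decoder[MyEvent] {
  // Kafka calls this for every raw message value.
  override def fromBytes(bytes: Array[Byte]): MyEvent =
    MyEvent(new String(bytes, "UTF-8"))
}

// Plug it in via the type parameters (kafkaParams, ranges, sc as in the
// sketch under point 1).
val events = KafkaUtils.createRDD[String, MyEvent, StringDecoder, MyEventDecoder](
  sc, kafkaParams, ranges)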

3. You can use spark-streaming-kafka-0-8 against 0.9 or 0.10 brokers.
There is a spark-streaming-kafka-0-10 package with additional features
that only works on brokers 0.10 or higher. A pull request documenting
it has been merged, but the updated docs have not been deployed yet.
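
For reference, the 0-10 artifact can be pulled in like this (sbt shown;
match the version to your Spark build, 2.0.0 here is an assumption):

libraryDependencies += "org.apache.spark" %% "spark-streaming-kafka-0-10" % "2.0.0"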

