You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@spark.apache.org by Nipun Arora <ni...@gmail.com> on 2016/02/10 20:28:15 UTC

Kafka + Spark 1.3 Integration

Hi,

I am trying some basic integration and was going through the manual.

I would like to read from a topic, and get a JavaReceiverInputDStream
<String> for messages in that topic. However the example is of
JavaPairReceiverInputDStream<>. How do I get a stream for only a single
topic in Java?

Reference Page:
https://spark.apache.org/docs/1.3.0/streaming-kafka-integration.html

 import org.apache.spark.streaming.kafka.*;

 JavaPairReceiverInputDStream<String, String> kafkaStream =
     KafkaUtils.createStream(streamingContext,
     [ZK quorum], [consumer group id], [per-topic number of Kafka
partitions to consume]);


Also in the example above what does <String,String> signify?

Thanks
Nipun

Re: Kafka + Spark 1.3 Integration

Posted by Cody Koeninger <co...@koeninger.org>.
That's what the kafkaParams argument is for.  Not all of the kafka
configuration parameters will be relevant, though.

On Thu, Feb 11, 2016 at 12:07 PM, Nipun Arora <ni...@gmail.com>
wrote:

> Hi ,
>
> Thanks for the explanation and the example link. Got it working.
> A follow up question. In Kafka one can define properties as follows:
>
> Properties props = new Properties();
> props.put("zookeeper.connect", zookeeper);
> props.put("group.id", groupId);
> props.put("zookeeper.session.timeout.ms", "500");
> props.put("zookeeper.sync.time.ms", "250");
> props.put("auto.commit.interval.ms", "1000");
>
>
> How can I do the same for the receiver inside spark-streaming for Spark V1.3.1
>
>
> Thanks
>
> Nipun
>
>
>
> On Wed, Feb 10, 2016 at 3:59 PM Cody Koeninger <co...@koeninger.org> wrote:
>
>> It's a pair because there's a key and value for each message.
>>
>> If you just want a single topic, put a single topic in the map of topic
>> -> number of partitions.
>>
>> See
>>
>>
>> https://github.com/apache/spark/blob/master/examples/src/main/java/org/apache/spark/examples/streaming/JavaKafkaWordCount.java
>>
>>
>> On Wed, Feb 10, 2016 at 1:28 PM, Nipun Arora <ni...@gmail.com>
>> wrote:
>>
>>> Hi,
>>>
>>> I am trying some basic integration and was going through the manual.
>>>
>>> I would like to read from a topic, and get a JavaReceiverInputDStream
>>> <String> for messages in that topic. However the example is of
>>> JavaPairReceiverInputDStream<>. How do I get a stream for only a single
>>> topic in Java?
>>>
>>> Reference Page:
>>> https://spark.apache.org/docs/1.3.0/streaming-kafka-integration.html
>>>
>>>  import org.apache.spark.streaming.kafka.*;
>>>
>>>  JavaPairReceiverInputDStream<String, String> kafkaStream =
>>>      KafkaUtils.createStream(streamingContext,
>>>      [ZK quorum], [consumer group id], [per-topic number of Kafka partitions to consume]);
>>>
>>>
>>> Also in the example above what does <String,String> signify?
>>>
>>> Thanks
>>> Nipun
>>>
>>
>>

Re: Kafka + Spark 1.3 Integration

Posted by Nipun Arora <ni...@gmail.com>.
Hi ,

Thanks for the explanation and the example link. Got it working.
A follow up question. In Kafka one can define properties as follows:

Properties props = new Properties();
props.put("zookeeper.connect", zookeeper);
props.put("group.id", groupId);
props.put("zookeeper.session.timeout.ms", "500");
props.put("zookeeper.sync.time.ms", "250");
props.put("auto.commit.interval.ms", "1000");


How can I do the same for the receiver inside spark-streaming for Spark V1.3.1


Thanks

Nipun



On Wed, Feb 10, 2016 at 3:59 PM Cody Koeninger <co...@koeninger.org> wrote:

> It's a pair because there's a key and value for each message.
>
> If you just want a single topic, put a single topic in the map of topic ->
> number of partitions.
>
> See
>
>
> https://github.com/apache/spark/blob/master/examples/src/main/java/org/apache/spark/examples/streaming/JavaKafkaWordCount.java
>
>
> On Wed, Feb 10, 2016 at 1:28 PM, Nipun Arora <ni...@gmail.com>
> wrote:
>
>> Hi,
>>
>> I am trying some basic integration and was going through the manual.
>>
>> I would like to read from a topic, and get a JavaReceiverInputDStream
>> <String> for messages in that topic. However the example is of
>> JavaPairReceiverInputDStream<>. How do I get a stream for only a single
>> topic in Java?
>>
>> Reference Page:
>> https://spark.apache.org/docs/1.3.0/streaming-kafka-integration.html
>>
>>  import org.apache.spark.streaming.kafka.*;
>>
>>  JavaPairReceiverInputDStream<String, String> kafkaStream =
>>      KafkaUtils.createStream(streamingContext,
>>      [ZK quorum], [consumer group id], [per-topic number of Kafka partitions to consume]);
>>
>>
>> Also in the example above what does <String,String> signify?
>>
>> Thanks
>> Nipun
>>
>
>

Re: Kafka + Spark 1.3 Integration

Posted by Cody Koeninger <co...@koeninger.org>.
It's a pair because there's a key and value for each message.

If you just want a single topic, put a single topic in the map of topic ->
number of partitions.

See

https://github.com/apache/spark/blob/master/examples/src/main/java/org/apache/spark/examples/streaming/JavaKafkaWordCount.java


On Wed, Feb 10, 2016 at 1:28 PM, Nipun Arora <ni...@gmail.com>
wrote:

> Hi,
>
> I am trying some basic integration and was going through the manual.
>
> I would like to read from a topic, and get a JavaReceiverInputDStream
> <String> for messages in that topic. However the example is of
> JavaPairReceiverInputDStream<>. How do I get a stream for only a single
> topic in Java?
>
> Reference Page:
> https://spark.apache.org/docs/1.3.0/streaming-kafka-integration.html
>
>  import org.apache.spark.streaming.kafka.*;
>
>  JavaPairReceiverInputDStream<String, String> kafkaStream =
>      KafkaUtils.createStream(streamingContext,
>      [ZK quorum], [consumer group id], [per-topic number of Kafka partitions to consume]);
>
>
> Also in the example above what does <String,String> signify?
>
> Thanks
> Nipun
>