Posted to user@spark.apache.org by "Karim, Md. Rezaul" <re...@insight-centre.org> on 2016/11/16 18:18:21 UTC

Need guidelines in Spark Streaming and Kafka integration

Hi All,

I am completely new to Kafka. I was wondering if somebody could provide
me with some guidelines on how to develop real-time streaming applications
using the Spark Streaming API with Kafka.

I am aware of the Spark Streaming and Kafka integration guide [1]. However,
a real-life example would be a better starting point.



1. http://spark.apache.org/docs/latest/streaming-kafka-0-10-integration.html





Regards,
_________________________________
*Md. Rezaul Karim* BSc, MSc
PhD Researcher, INSIGHT Centre for Data Analytics
National University of Ireland, Galway
IDA Business Park, Dangan, Galway, Ireland
Web: http://www.reza-analytics.eu/index.html
<http://139.59.184.114/index.html>

Re: Need guidelines in Spark Streaming and Kafka integration

Posted by "Karim, Md. Rezaul" <re...@insight-centre.org>.
Hi Tariq and Jon,

First of all, thanks for the quick response. I really appreciate it.

Well, I would like to start from the very beginning of using Kafka with
Spark. For example, in the Spark distribution, I found an example using
Kafka with Spark Streaming that demonstrates a direct Kafka word count.
In that example, the main class *JavaDirectKafkaWordCount.java* (under the
spark-2.0.0-bin-hadoop2.7\examples\src\main\java\org\apache\spark\examples\streaming
directory) contains the following code segment:


---------------------------------------*-------------------------------------------------
    String brokers = args[0];
    String topics = args[1];

    // Create context with a 20-second batch interval
    SparkConf sparkConf = new SparkConf()
        .setAppName("JavaDirectKafkaWordCount")
        .setMaster("local[*]");
    JavaStreamingContext jssc =
        new JavaStreamingContext(sparkConf, Durations.seconds(20));

    Set<String> topicsSet = new HashSet<>(Arrays.asList(topics.split(",")));
    Map<String, String> kafkaParams = new HashMap<>();
    kafkaParams.put("metadata.broker.list", brokers);
---------------------------------------*-------------------------------------------------

In this code block, the confusing part is setting the values of the two
command-line arguments (i.e., *brokers* and *topics*). I tried to set them as
follows:

    String brokers = "localhost:8890,localhost:8892";
    String topics = " topic1,topic2";

However, I know this is not the right way to do it. There must be a
correct way of setting the values of brokers and topics.

So I need help with how to set/configure these two parameters so that I can
run this hello-world-like example successfully. Any kind of help would be
highly appreciated.
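
For reference, here is a minimal sketch of one way the two values could be
supplied and cleaned up before they reach the streaming code. It is not part of
the Spark example: the class name KafkaArgs and both helper methods are
hypothetical, and the fallback values assume a single local broker on Kafka's
default port 9092 and topics named topic1 and topic2 that already exist. The
brokers value is a comma-separated list of Kafka broker host:port pairs, and
the topics value is a comma-separated list of topic names. Stray spaces, such
as the one before topic1 above, would otherwise become part of the name.

```java
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

public class KafkaArgs {

    // Split a comma-separated topic list, trimming whitespace
    // and dropping empty entries.
    public static Set<String> parseTopics(String topics) {
        Set<String> result = new LinkedHashSet<>();
        for (String topic : topics.split(",")) {
            String trimmed = topic.trim();
            if (!trimmed.isEmpty()) {
                result.add(trimmed);
            }
        }
        return result;
    }

    // Build the parameter map with the same key the word count example uses.
    // Whitespace is removed so "host1:9092, host2:9092" is also accepted.
    public static Map<String, String> kafkaParams(String brokers) {
        Map<String, String> params = new HashMap<>();
        params.put("metadata.broker.list", brokers.replaceAll("\\s+", ""));
        return params;
    }

    public static void main(String[] args) {
        // Assumed fallbacks for a local test setup; replace with real values.
        String brokers = args.length > 0 ? args[0] : "localhost:9092";
        String topics = args.length > 1 ? args[1] : "topic1,topic2";

        System.out.println(parseTopics(topics));
        System.out.println(kafkaParams(brokers));
    }
}
```

With helpers like these, the values can either be passed on the command line
(brokers first, topics second, as the example expects) or hard-coded while
testing locally.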




Regards,
_________________________________
*Md. Rezaul Karim* BSc, MSc
PhD Researcher, INSIGHT Centre for Data Analytics
National University of Ireland, Galway
IDA Business Park, Dangan, Galway, Ireland
Web: http://www.reza-analytics.eu/index.html
<http://139.59.184.114/index.html>

On 17 November 2016 at 03:08, Jon Gregg <jo...@gmail.com> wrote:

> Since you're completely new to Kafka, I would start with the Kafka docs (
> https://kafka.apache.org/documentation).  You should be able to get
> through the Getting Started part easily and there are some examples for
> setting up a basic Kafka server.
>
> You don't need Kafka to start working with Spark Streaming (there are
> examples online to pull directly from Twitter, for example).  But at a high
> level if you're sending data from one server to another, it can be
> beneficial to send the messages to a distributed queue first for durable
> storage (so data doesn't get lost in transmission) and other benefits.

Re: Need guidelines in Spark Streaming and Kafka integration

Posted by Jon Gregg <jo...@gmail.com>.
Since you're completely new to Kafka, I would start with the Kafka docs (
https://kafka.apache.org/documentation).  You should be able to get through
the Getting Started part easily and there are some examples for setting up
a basic Kafka server.

You don't need Kafka to start working with Spark Streaming (there are
examples online to pull directly from Twitter, for example).  But at a high
level if you're sending data from one server to another, it can be
beneficial to send the messages to a distributed queue first for durable
storage (so data doesn't get lost in transmission) and other benefits.

On Wed, Nov 16, 2016 at 2:12 PM, Mohammad Tariq <do...@gmail.com> wrote:

> Hi Karim,
>
> Are you looking for something specific? Some information about your
> use case would be really helpful in order to answer your question.

Re: Need guidelines in Spark Streaming and Kafka integration

Posted by Mohammad Tariq <do...@gmail.com>.
Hi Karim,

Are you looking for something specific? Some information about your use case
would be really helpful in order to answer your question.



-- 


Tariq, Mohammad
about.me/mti